Re: traversing a variable with regex instead of a file

angie ahl Fri, 10 Oct 2003 08:53:15 -0700

on 2003-10-10 James Edward Gray II said:

>Keep your replies on the list, so you can get help from all the people 
>smarter than me.  ;)


If there are people smarter than you out there I must be an amoeba ;)

>Okay, why put this inside an if block.  If it doesn't find a match it 
>will fail and do nothing, which is what you want, right?  I don't think 
>you need the if.

Good point.

>Why don't we work on your Regular Expression a little and see if we can 
>do it all in one move.  We want to find all occurrences of the keyword, 
>as long as they're not on a line beginning with qz, right?  This seems 
>to do that for me:
>
>$content =~ s/^([^\n]*)($kw)/substr($1, 0, 2) ne 'qz' ? "$1\n$2\n" : "$1$2"/mge;
>

Ok. I had to stop to pick myself up off the floor then. WOW.

This has actually made it possible to cut the whole thing down massively.
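To see what James's one-liner does by itself, here is a small self-contained sketch. The sample text and keyword are made up for illustration. The line that begins with qz comes back unchanged. The sketch also shows a side effect: because [^\n]* is greedy and every match has to start at the beginning of a line, only the last occurrence of the keyword on each line gets split out.

```perl
use strict;
use warnings;

# Made-up sample: the middle line is "protected" by its qz prefix.
my $kw      = 'cat';
my $content = "a cat and a cat\nqz<cat>\nthe cat";

# James's substitution, applied to a copy so $content stays untouched.
# /m lets ^ match at every line start, /g repeats, /e runs the replacement as code.
(my $result = $content) =~
    s/^([^\n]*)($kw)/substr($1, 0, 2) ne 'qz' ? "$1\n$2\n" : "$1$2"/mge;

print $result;
# The first "cat" on line one is left alone: the greedy [^\n]* swallows it.
```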

Here's the code now:

_________________________ 
# get line breaks to make <br>'s at the end
$content =~ s/\n/-qbr-/g;

# find markup and add markers so it doesn't get processed by regex, 
# no keyword links to be made inside other tags
$content =~ s/(\[(img|page|link|mp3)=.*?\])/\nqz$1\n/g;

# find HTML so it doesn't get processed by regex, 
# no keyword links to be made inside valid HTML
$content =~ s/(<.*?>)/\nqz$1\n/g;

for my $href ( @Keywords ) {
    
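    # get each keyword and look for it in content.
    for my $kw ( keys %$href ) {
        # \Q...\E makes any regex characters in the keyword match literally
        if ($content =~ /\b\Q$kw\E\b/) {

            # do the very clever regex with help from and thanks to
            # [EMAIL PROTECTED]
            # (\b on both sides so only whole words are linked, as in the if above)
            $content =~ s/^([^\n]*)\b(\Q$kw\E)\b/substr($1, 0, 2) ne 'qz'
                ? "$1\nqz[link=\"$href->{$kw}\" title=\"$2\"]\n"
                : "$1$2"/mge;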
        }
    }
}

# clean up those line breaks and markers
$content =~ s/\n(qz)?//g;

# put in <br>'s
$content =~ s/-qbr-/<br>\n/g;

print $content;
_________________________
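As written, the substitution links at most one occurrence of a keyword per line (the last one, since [^\n]* is greedy), and because every \n has already been turned into -qbr-, a "line" here is a whole stretch of text between two pieces of markup. If every occurrence should become a link, one option is a different technique: an alternation that either consumes a whole qz line, which is put back unchanged, or matches the keyword itself. The sketch below swaps that in and otherwise follows the same steps; link_keywords, the keyword table and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical keyword table in the same shape as @Keywords:
# a list of hashrefs mapping keyword => page to link to.
my @Keywords = ( { cat => 'cats.html' }, { dog => 'dogs.html' } );

sub link_keywords {
    my ($content) = @_;

    # Same preparation as the original: protect line breaks,
    # then move markup and HTML tags onto their own "qz" lines.
    $content =~ s/\n/-qbr-/g;
    $content =~ s/(\[(img|page|link|mp3)=.*?\])/\nqz$1\n/g;
    $content =~ s/(<.*?>)/\nqz$1\n/g;

    for my $href (@Keywords) {
        for my $kw (keys %$href) {
            # Either consume a whole qz line (group 1, kept as-is) or
            # match the keyword as a whole word anywhere else (group 2).
            $content =~ s{(^qz[^\n]*)|\b(\Q$kw\E)\b}
                         {defined $1 ? $1 : "\nqz[link=\"$href->{$kw}\" title=\"$2\"]\n"}mge;
        }
    }

    # Remove the helper newlines and qz markers, then turn -qbr- into <br>s.
    $content =~ s/\n(qz)?//g;
    $content =~ s/-qbr-/<br>\n/g;
    return $content;
}

print link_keywords("My cat met a dog.\nAnother cat <b>cat</b>"), "\n";
```

Because the qz alternative is tried first at every line start, keywords inside protected lines, including the [link=...] lines added for earlier keywords, are never reached.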

As you can see, I've adapted your regex a little to put the full link markup around
the keyword.

The regex itself made perfect sense; it was the "" ? "" : "" bit that I'd never
seen before. That's really useful.

I assume it means

"if statement" ? "do if true" : "do if false"

Please do correct me if I'm wrong. What do you call that? I think I'm going to
be using it quite a bit ;)

Do I even need the "if false" bit in this case?
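For reference, this is Perl's conditional operator, usually called the ternary operator: CONDITION ? VALUE_IF_TRUE : VALUE_IF_FALSE. All three parts are required, so leaving out the false branch is a syntax error. In this substitution the false branch also does real work: every match replaces the text it covered, so "$1$2" is what puts a qz line back unchanged. A small sketch with made-up strings:

```perl
use strict;
use warnings;

# The conditional operator: CONDITION ? VALUE_IF_TRUE : VALUE_IF_FALSE
for my $line ('qz<b>', 'plain text') {
    my $kind = substr($line, 0, 2) ne 'qz' ? 'process' : 'skip';
    print "$line => $kind\n";
}

# In the substitution, the false branch returns the matched text unchanged.
# If it returned '' instead, the matched part of a qz line would be deleted.
(my $kept    = "qz<cat>") =~ s/^([^\n]*)(cat)/substr($1, 0, 2) ne 'qz' ? "$1\n$2\n" : "$1$2"/mge;
(my $deleted = "qz<cat>") =~ s/^([^\n]*)(cat)/substr($1, 0, 2) ne 'qz' ? "$1\n$2\n" : ''/mge;
print "$kept | $deleted\n";
```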

>I used the /e modifier for the replacement, which allows me to use Perl 
>code in there.  It's pretty simple.  If the line didn't start with a 
>qz, we do a normal replace.  

That's going in my BBEdit gold dust code snippets glossary.
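A tiny illustration of what /e changes, using made-up text: without it, the replacement side is treated as a double-quoted string; with it, the replacement is run as Perl code and its value is substituted.

```perl
use strict;
use warnings;

my $text = 'Sizes: 3 and 14';

# Without /e the replacement is just an interpolated string.
(my $plain = $text) =~ s/(\d+)/$1*2/g;

# With /e the replacement is evaluated as Perl code.
(my $evaled = $text) =~ s/(\d+)/$1*2/ge;

print "$plain\n$evaled\n";
```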

>Let me know if that will work for you.

It did, perfectly. Thank you soooooo much!

>You're right about it being inefficient, of course.  It was easier to 
>read than my Regex though, eh?  <laughs>  

Are you implying that regex isn't easy to read? ;)

>The first choice may be slow, 
>but on modern computers they may both work in the blink of an eye.  
>Save worrying about speed for when you need to and try and keep your 
>life as a programmer as easy as possible until then.

Sadly then is now. That's why I joined up to this list today ;)

This code will be run on every single page of a website in one go, so it needs
to be as efficient as physically possible. The site will only be a few hundred
pages, and not all pages will always be processed. It's a system that makes its
own links and maintains them, so every time a page's keywords change, this has
to be done to every page that contains that keyword.
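On the efficiency point: the loop above rescans each page once per keyword, so the work grows with pages times keywords. One common alternative (not what the code above does) is to combine all keywords into a single alternation and scan each page once, looking the link up in a hash. A minimal sketch, assuming keywords are plain words; the keyword table, link_all and the sample text are hypothetical, and any speed gain should be confirmed with a benchmark (for example the core Benchmark module) on real pages.

```perl
use strict;
use warnings;

# Hypothetical keyword table in the same shape as @Keywords: a list of
# hashrefs mapping keyword => page to link to.
my @Keywords = ( { 'web' => 'web.html', 'web site' => 'site.html' },
                 { 'perl' => 'perl.html' } );

# Merge everything into one lookup table (a later hash wins on duplicates).
my %link_for;
for my $href (@Keywords) {
    @link_for{ keys %$href } = values %$href;
}

# One alternation of all keywords, longest first, so that "web site"
# is tried before "web" at the same position. quotemeta makes each literal.
my $alt   = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } keys %link_for;
my $kw_re = qr/\b($alt)\b/;

# One scan of the text per page instead of one scan per keyword:
# a whole qz line (group 1) is put back as-is, any keyword (group 2) is linked.
sub link_all {
    my ($content) = @_;
    $content =~ s{(^qz[^\n]*)|$kw_re}
                 {defined $1 ? $1 : "\nqz[link=\"$link_for{$2}\" title=\"$2\"]\n"}mge;
    return $content;
}

print link_all("our web site uses perl and the web\nqz<b>perl</b>"), "\n";
```

The \b tests assume keywords start and end with word characters; a keyword such as C++ would need a different boundary check.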

I know this is not a task for a beginner, but this is actually version 3 of
the code. My old programming language started to show its dislike for regex.

>> If you have any suggestions I would be most grateful to hear them.
>
>Those are my best shots.  Hope they help.

They did, thank you so much.

Angie

