Re: regexp question

Andrea Holstein Sun, 21 Oct 2001 09:26:00 -0700

Birgit Kellner wrote:
> 
> my $header = "Joe Doe<br><br>The book I wrote yesterday";
> my $title;
> if ($header =~ /(^\S.+)(<br><br>)(\S.+$)/m) { $title = "$1: $3";}
> print "$title\n";
> 
> Is there a shorter, simpler, more efficient way to do this? I still need
> $header later on as it is.
> 
I'll try a different approach from the good answers already given.


1. There's no need to capture <br><br>.
So the first optimization is
if ($header =~ /(^\S.+)<br><br>(\S.+$)/m) { $title = "$1: $2" }

2. The .+ will first run all the way to the end of the line.
Then the regex engine backtracks character by character until <br><br> matches.
There are two possible ways to speed this up:
a) my @headerparts = split /<br><br>/, $header;
   if (@headerparts == 2 and 
       $headerparts[0] =~ /^\S.+$/ and 
       $headerparts[1] =~ /^\S.+/) 
   {
       $title = "$headerparts[0]: $headerparts[1]";
   }

   I'm not sure whether the .+ is still needed.
   It stands for everything except \n.
   If there are no newlines in the header,
   it's quicker and simpler to test:
   my @headerparts = split /<br><br>/, $header;
   if (@headerparts == 2 and
       $headerparts[0] =~ /^\S./ and
       $headerparts[1] =~ /^\S./)
   {
       $title = "$headerparts[0]: $headerparts[1]";
   }
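
Put together, the split variant might look like the sketch below. The
sub name title_from_header is only my illustration and not from the
original question; it assumes the header has no embedded newlines and
leaves $header untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from the thread).
# Returns "first part: second part", or undef when the header does not
# split into exactly two usable parts.
sub title_from_header {
    my ($header) = @_;
    my @parts = split /<br><br>/, $header;
    return unless @parts == 2;
    # Same test as above: each part starts with a non-whitespace
    # character and has at least one more character after it.
    return unless $parts[0] =~ /^\S./ and $parts[1] =~ /^\S./;
    return "$parts[0]: $parts[1]";
}

my $header = "Joe Doe<br><br>The book I wrote yesterday";
my $title  = title_from_header($header);
print defined $title ? "$title\n" : "no match\n";
```

Note that split drops trailing empty fields, so a header that ends in
<br><br> gives only one part and is rejected by the count check.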

b) I'd guess that the part before <br><br> doesn't contain any HTML tags.
In that case you can tell the regex engine not to backtrack unnecessarily:

if ($header =~ /(^\S[^<]+)<br><br>(\S.+$)/m) { $title = "$1: $2"; }
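
As a complete script, variant b) could be tried like this. It is only a
sketch under the assumption stated above (no "<" before the separator),
and keep in mind that [^<] also matches a newline, which . does not.

```perl
use strict;
use warnings;

my $header = "Joe Doe<br><br>The book I wrote yesterday";
my $title;

# [^<]+ stops at the first "<", so the engine does not run to the end
# of the line and then step backwards looking for <br><br>.
if ($header =~ /(^\S[^<]+)<br><br>(\S.+$)/m) {
    $title = "$1: $2";
}

print defined $title ? "$title\n" : "no match\n";
```

If the first part can contain a tag after all, this pattern simply
fails to match, so it is worth keeping the defined check.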

Good Luck,
Andrea
