Re: regexp question

Tim Hammerquist Mon, 02 Jul 2001 19:14:29 -0700
Chris Devers wrote:
> 
> On Mon, 2 Jul 2001, Tim Hammerquist wrote:
> 
> > Chris Devers wrote:
> > >
> > > At 10:45 AM 2001.07.02 +0100, Kristofer Wolff wrote:
> > >
> > > I don't know, but I also don't know why you're trying to match
> > > everything. You don't seem to be interested in text outside the
> > > title tags, so skip it. A somewhat better match would be:
> > >
> > >      $subject =~ s/\<title\>(.*)\<\/title\>/$2/sig;
> >
> > $2 won't have anything in it.  Try $1.  And you shouldn't have to
> > escape the [<>]s.
> 
> Whoops! That's cargo cult programming: take something you half read,
> rewrite it & release it into the wild. Serves me right. Sorry... :)

Honestly, the only reason I even catch little things like unnecessarily
escaped characters, or bother double-checking code, is that I spend so
much time on the comp.lang.perl.misc group.  It's kind of like being
raised by abusive parents.  ;)
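For the record, here's a minimal sketch of the substitution quoted above with both fixes applied. The sample string is made up for illustration. The pattern has only one pair of capturing parentheses, so the title ends up in $1 and $2 is never set. '<' and '>' aren't regex metacharacters, and using '#' as the delimiter means the '/' in '</title>' doesn't need a backslash either. I've also swapped the greedy '.*' for '.*?', because with /s the greedy version would run on to the *last* '</title>' in the file.

```perl
use strict;
use warnings;

# Hypothetical input, just for illustration.
my $subject = "<html><head><title>Joe's Page</title></head></html>";

# One capture group, so the title text lands in $1 ($2 stays undef).
# No backslashes needed: '<' and '>' are literal in a regex, and the
# '#' delimiter lets '</title>' through unescaped.
# '.*?' stops at the first '</title>' instead of the last one.
$subject =~ s#<title>(.*?)</title>#$1#si;

# s/// only replaces what it matched, so the surrounding markup remains.
print "$subject\n";    # prints <html><head>Joe's Page</head></html>
```

Note that s/// rewrites only the matched span, so this doesn't actually throw away the text outside the title tags. If the title is all you want, a plain m// that captures into $1 is the simpler tool.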

> > This worked for me:
> >
> >     if ($html =~ m#<title>([^<]*)</title>#si) {
> >         $title = $1;
> >     }
> >     else {
> >         $title = '<none>';
> >     }
> 
> It's still better to go for a parser, though. What would your script do
> with this HTML:
> 
>    <html>
>    <head>
>      <!-- old name: <title>Eat at Joe's</title>  -->
>      <title>Joe's HTML Hutt</title>
>    </head>
>    <body>
>      <h1>HTML Tutorial: Document Layout</h1>
>      <p>Joe includes the following code in all his documents:
>      <img src="source.gif" width="250" height="200"
>       alt="<html><head><title></title></head><body></body></html>" >
>    </body>
>    </html>
> 
> In that code, the title tag pair shows up validly three times, but only
> once does it actually define the document's title property, and that is
> surely the one most scripters would be interested in. You could monkey
> around with trying to figure out which title tag isn't inside comments and
> isn't part of another tag's attributes, etc., but it's far easier and far
> more robust to simply pull in a parser & let it do the work for you.

That's a good point.  I don't parse HTML very often at all, and if I
did, I'd probably get a parsing module.  However, I see a faulty regex
and I feel the need to fix it.  Just my own neurosis.  =)
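For comparison, here's a rough sketch that runs both approaches against Chris's sample page. It assumes the HTML::TokeParser module is installed. That module is part of the HTML-Parser distribution on CPAN, not core Perl. The sub names are made up for the example. The regex stops at the first '<title>' it finds, which is the one inside the comment. The parser reports comments as separate tokens, so get_tag('title') skips straight to the real title element.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # from the HTML-Parser distribution on CPAN

# Chris's sample page.
my $html = <<'END_HTML';
<html>
<head>
  <!-- old name: <title>Eat at Joe's</title>  -->
  <title>Joe's HTML Hutt</title>
</head>
<body>
  <h1>HTML Tutorial: Document Layout</h1>
  <p>Joe includes the following code in all his documents:
  <img src="source.gif" width="250" height="200"
   alt="<html><head><title></title></head><body></body></html>" >
</body>
</html>
END_HTML

# Hypothetical helper: the regex from earlier in the thread.
sub regex_title {
    my ($doc) = @_;
    return $doc =~ m#<title>([^<]*)</title>#si ? $1 : '<none>';
}

# Hypothetical helper: let the parser find the first real <title> element.
# Comments come back as their own tokens, so get_tag() never sees the
# <title> inside the <!-- ... --> block.
sub parser_title {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new(\$doc) or return '<none>';
    return '<none>' unless $p->get_tag('title');
    return $p->get_trimmed_text('/title');
}

print 'regex:  ', regex_title($html),  "\n";    # regex:  Eat at Joe's
print 'parser: ', parser_title($html), "\n";    # parser: Joe's HTML Hutt
```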

-- 
-Tim Hammerquist <[EMAIL PROTECTED]>

Sometimes these hairstyles are exaggerated beyond the laws of physics.
    -- Unknown narrator on Anime
_______________________________________________
Perl-Win32-Web mailing list
[EMAIL PROTECTED]
http://listserv.ActiveState.com/mailman/listinfo/perl-win32-web