Re: regexp question

Chris Devers Mon, 02 Jul 2001 13:08:31 -0700
On Mon, 2 Jul 2001, Tim Hammerquist wrote:

> Chris Devers wrote:
> > 
> > At 10:45 AM 2001.07.02 +0100, Kristofer Wolff wrote:
> > 
> > I don't know, but I also don't know why you're trying to match
> > everything. You don't seem to be interested in text outside the
> > title tags, so skip it. A somewhat better match would be:
> > 
> >      $subject =~ s/\<title\>(.*)\<\/title\>/$2/sig;
> 
> $2 won't have anything in it.  Try $1.  And you shouldn't have to
> escape the [<>]s.

Whoops! That's cargo cult programming: take something you half read,
rewrite it & release it into the wild. Serves me right. Sorry... :)

> This worked for me:
> 
>     if ($html =~ m#<title>([^<]*)</title>#si) {
>         $title = $1;
>     }
>     else {
>         $title = '<none>';
>     }
 
It's still better to go for a parser, though. What would your script do
with this HTML:

   <html>
   <head>
     <!-- old name: <title>Eat at Joe's</title>  -->
     <title>Joe's HTML Hutt</title>
   </head>
   <body>
     <h1>HTML Tutorial: Document Layout</h1>
     <p>Joe includes the following code in all his documents:
     <img src="source.gif" width="250" height="200"
      alt="<html><head><title></title></head><body></body></html>" >
   </body>
   </html>

In that code, the title tag pair shows up validly three times, but only
once does it actually define the document's title property, and that is
surely the one most scripters would be interested in. You could monkey
around with trying to figure out which title tag isn't inside comments and
isn't part of another tag's attributes, etc., but it's far easier and far
more robust to simply pull in a parser & let it do the work for you. 
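
To make the difference concrete, here is a rough sketch that runs both approaches over the sample document above. It assumes the HTML::TokeParser module from CPAN is installed (an assumption, nothing earlier in the thread names a specific parser), and the helper names title_by_regex and title_by_parser are made up for illustration.

```perl
use strict;
use warnings;

# The sample document from above, in a non-interpolating heredoc.
my $html = <<'END_HTML';
<html>
<head>
  <!-- old name: <title>Eat at Joe's</title>  -->
  <title>Joe's HTML Hutt</title>
</head>
<body>
  <h1>HTML Tutorial: Document Layout</h1>
  <p>Joe includes the following code in all his documents:
  <img src="source.gif" width="250" height="200"
   alt="<html><head><title></title></head><body></body></html>" >
</body>
</html>
END_HTML

# Regex approach (the pattern quoted earlier): it takes the first
# <title>...</title> pair in the raw text, comment or not.
sub title_by_regex {
    my ($doc) = @_;
    return $doc =~ m#<title>([^<]*)</title>#si ? $1 : '<none>';
}

# Parser approach (hypothetical helper): HTML::TokeParser reports only
# real start tags, so markup inside comments is skipped.
sub title_by_parser {
    my ($doc) = @_;
    require HTML::TokeParser;    # CPAN module, loaded on demand
    my $p = HTML::TokeParser->new(\$doc) or die "cannot parse document";
    return $p->get_tag('title') ? $p->get_trimmed_text : '<none>';
}

print "regex:  ", title_by_regex($html), "\n";

# Guarded so the script still runs where the module is missing.
my $parsed = eval { title_by_parser($html) };
print "parser: ",
    defined($parsed) ? $parsed : '(HTML::TokeParser not available)', "\n";
```

On this input the regex picks up "Eat at Joe's" from inside the comment, while the parser should report "Joe's HTML Hutt", since it never treats the commented-out markup as a tag.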



--
Chris Devers                     [EMAIL PROTECTED]
webmaster                                 work: 781.221.5372
Skillcheck                                cell: 617.365.0585

_______________________________________________
Perl-Win32-Web mailing list
[EMAIL PROTECTED]
http://listserv.ActiveState.com/mailman/listinfo/perl-win32-web