RE: Yet another regex question

Thomas, Mark - BLS CTR Wed, 24 Mar 2004 07:03:50 -0800

> I'm trying to grab a website's description from the meta tags 
> but I can't seem to make it work all the time.


As people have pointed out, it's best to use a parser to parse HTML, not a
regex. Here's some code (untested) as a starting point for two different
parsers.

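# Sample HTML::TokeParser code:

  use HTML::TokeParser;

  my $p = HTML::TokeParser->new($filename)
      or die "Can't open $filename: $!";
  my $desc = "";
  while (my $token = $p->get_tag("meta")) {
      # $token->[1] is the tag's attribute hash (names are lowercased)
      my $attr = $token->[1];
      if (lc($attr->{name} || "") eq "description") {
          $desc = $attr->{content};
          last;    # stop at the first description found
      }
  }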

# Sample XML::LibXML code (handy if you plan to do
# more sophisticated parsing or XML parsing too)

  use XML::LibXML;

  my $p = XML::LibXML->new();
  # Set recover on its own line so $p holds the parser object,
  # not the return value of recover()
  $p->recover(1);
  my $doc = $p->parse_html_file($filename);
  my $desc = $doc->findvalue(
      '/html/head/meta[@name="description"]/@content');



-- 
Mark Thomas                    [EMAIL PROTECTED]
Internet Systems Architect     User Technology Associates, Inc.

$_=q;KvtuyboopuifeyQQfeemyibdlfee;; y.e.s. ;y+B-x+A-w+s; ;y;y; ;;print;;
 

