>   my $stream = HTML::TokeParser->new(\$agent->{content});
> 
>   while ( my $tag = $stream->get_tag("form") ) {
>      if ($tag->[1]{name} and $tag->[1]{name} eq "f1") {
>          $stream->get_tag("table");
>          $stream->get_tag("table");
>          $stream->get_tag("table");
>      }
>   }
> 
> 
> I want to take the last HTML table and put it into a new stream to be
>   parsed down later. If I use get_text(), I will lose the HTML tags.
> 
> Is there a way to save this block for later? I want to save 
> chunks of the HTML for later parsing.

This is difficult to do in a stream-based parser. You would have to use
get_token, saving every token until you reach the end-table token. To
handle nested tables, you'd need to increment a counter every time you
reach a <table> and decrement it when you reach a </table>; when the
counter gets back to zero, you're done.
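
The counter idea can be sketched roughly as follows. This is only an
illustration, not tested against your page: capture_table is a made-up
helper name and the sample $content string is invented. It relies on
HTML::TokeParser's documented token layout, where the original source text
is the second element of a text token and the last element of every other
token type.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the raw HTML of the next complete
# <table>...</table> block in the stream, nested tables included.
sub capture_table {
    my ($stream) = @_;
    my ($depth, $html) = (0, '');
    while (my $token = $stream->get_token) {
        my $type     = $token->[0];
        my $is_table = ($type eq 'S' || $type eq 'E') && $token->[1] eq 'table';
        $depth++ if $is_table && $type eq 'S';
        next unless $depth;    # nothing to save until a <table> opens
        # Source text is at index 1 for text tokens, last element otherwise.
        $html .= $type eq 'T' ? $token->[1] : $token->[-1];
        return $html if $is_table && $type eq 'E' && --$depth == 0;
    }
    return $html;              # input ended before the table was closed
}

# Invented sample input: one plain table, then one with a nested table.
my $content = '<form name="f1"><table><tr><td>a</td></tr></table>'
            . '<table><tr><td><table><tr><td>b</td></tr></table></td></tr></table></form>';

my $stream = HTML::TokeParser->new(\$content);
$stream->get_tag("table");               # step into the first table
my $saved = capture_table($stream);      # grabs the next complete table
print "$saved\n";

# The saved chunk can be parsed again later with a fresh stream.
my $later = HTML::TokeParser->new(\$saved);
```

Note that get_tag("table") consumes the start tag, so call it one time fewer
than in your loop and let the helper pick up the table you actually want.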

A tree-based approach using HTML::TreeBuilder or XPath (via XML::LibXML's
HTML parsing capability) would be easier. Here's XML::LibXML code that will
do what you want:

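  use XML::LibXML;
  my $doc = XML::LibXML->new({recover=>1})
                       ->parse_html_string($agent->{content});
  # findnodes returns a list of nodes, so take the first match in list context.
  # Adjust the path if the tables sit inside <tr>/<td> rather than directly
  # inside one another.
  my ($tableNode) = $doc->findnodes('//form[@name="f1"]/table/table/table');
  print $tableNode->toString if $tableNode;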

Pretty easy.
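
For comparison, the HTML::TreeBuilder route might look something like the
sketch below. The sample $content is invented, and I'm assuming "the third
table inside form f1, in document order" is the one you want, which is what
your three get_tag calls would land on. Passing an empty hash as the third
argument to as_HTML asks it to write out every end tag.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Invented sample input with three tables inside the form.
my $content = '<html><body><form name="f1">'
            . '<table><tr><td>a</td></tr></table>'
            . '<table><tr><td>b</td></tr></table>'
            . '<table><tr><td>c</td></tr></table>'
            . '</form></body></html>';

my $tree = HTML::TreeBuilder->new_from_content($content);

# Find the form by name, then every table below it in document order.
my $form   = $tree->look_down(_tag => 'form', name => 'f1');
my @tables = $form ? $form->look_down(_tag => 'table') : ();

# Serialize the third table back to HTML so it can be parsed again later.
my $saved = @tables >= 3 ? $tables[2]->as_HTML(undef, undef, {}) : undef;
$tree->delete;    # free the tree; $saved is a plain string and survives

print $saved if defined $saved;
```

The output is re-serialized by HTML::TreeBuilder, so whitespace and quoting
may differ from the original page.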

-- 
Mark Thomas                    [EMAIL PROTECTED] 
Internet Systems Architect     DigitalNet, Inc. 

$_=q;KvtuyboopuifeyQQfeemyibdlfee;; y.e.s. ;y+B-x+A-w+s; ;y;y; ;;print;; 
  

_______________________________________________
Perl-Win32-Users mailing list