Paul Kraus wrote:

> Someone want to show me how this module can help parse out html?
>
> I want to grab text between <td>text</td> being able to apply regexp to
> get what I want.
>
> The problem is my text is among 10,000 td tags. With the only difference
> being what the above <th> tag has in it.
>
> So if th tag = then store text between <td> into an array.
>
> Paul

Hi Paul,

Sorry that earlier response was so dumb.  I didn't connect the content with
the subject line, I'm afraid.  That may be because it included neither data
illustration nor anything you had tried on your own.  Anyway, I hope this
makes up for my negligence a bit.

I'm not sure that HTML::TokeParser::Simple adds anything to the
functionality of HTML::TokeParser for your purposes [at least what you have
described here].  The Simple part mostly has to do with making the tag types
and attributes more transparent.  I didn't see much in the docs about the
data itself.  Neither module seems all that user friendly, but I got
something along that line working.

With a simple table using headers:
table_test.html:
<html>
<head>
<title>  HTML::TokeParser Test </title>
</head>

<body>
<table rows=4 cols=3>
<tr> <th> Key </th> <th> name </th> <th> Address </th> </tr>
<tr>
<td> 1 </td> <td> George </td> <td> farewell </td>
</tr>
<tr>
<td> 2 </td> <td> Abe </td> <td> Gettysburg </td>
</tr>
<tr>
<td> 3 </td> <td> Joseph </td> <td> E-Mail </td>
</tr>
</table>
</body>
</html>

This [after many hours of near-misses] seemed to work:

Greetings! E:\d_drive\perlStuff>perl -w -MHTML::TokeParser
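# The module itself is loaded by -MHTML::TokeParser on the command line.
my $tp = HTML::TokeParser->new('table_test.html')
   or die "Can't open table_test.html: $!";
my @fields;

# Priming read: find the first <th>, then collect header text
# until the closing </tr> of the header row.
my $open_tag = $tp->get_tag('th');
while ($open_tag and $open_tag->[0] ne '/tr') {
   my $test = $tp->get_text('/th');
   push @fields, $test if length $test;
   # Advance on every pass, so an empty header cannot stall the loop.
   $open_tag = $tp->get_tag('th', '/tr');
}

# Each remaining <tr> is a data row: one <td> per header,
# stored in a hash keyed by the header text.
my @data;
my $data_start = $tp->get_tag('tr');
while ($data_start) {
   my $values = {};
   foreach (@fields) {
      $tp->get_tag('td');
      $values->{$_} = $tp->get_text('/td');
   }
   push @data, $values;
   $data_start = $tp->get_tag('tr');
}

foreach my $row (@data) {
   print "$_: $row->{$_};   " foreach keys %$row;
   print "\n";
}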
^Z
 Address :  farewell ;    name :  George ;    Key :  1 ;
 Address :  Gettysburg ;    name :  Abe ;    Key :  2 ;
 Address :  E-Mail ;    name :  Joseph ;    Key :  3 ;


It simply would not come together until I had dealt with holiday cooking and
celebrations, though.  The main problem was that I had been trying to do too
much in the control expressions of the while loops.  These "shortcuts" kept
creating situations where the loop would pass beyond the desired data and
consume the whole file.  Doing a priming read before each loop, and then
making only a spare test of the value in the loop condition, helped a lot.

Of course, you still have to have a way to pick the particular row that you
want, a complication that you didn't mention.
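
One way that selection step might look, as a rough sketch only: once the rows
are stored as hashes keyed by header text, the values under one header can be
pulled into an array and optionally filtered with a regexp.  The helper name
column_for and its arguments are hypothetical, not part of HTML::TokeParser,
and the sample rows are typed in by hand to match the output above rather than
parsed from the file.  It assumes the header and cell text still carry the
padding from the markup, so it trims before comparing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TokeParser): return the values
# stored under one header, optionally keeping only those matching a regexp.
sub column_for {
    my ($rows, $header, $pattern) = @_;
    my @values;
    foreach my $row (@$rows) {
        foreach my $key (keys %$row) {
            # The stored keys and values keep the padding from the markup,
            # so compare and collect trimmed copies.
            (my $name = $key) =~ s/^\s+|\s+$//g;
            next unless lc($name) eq lc($header);
            (my $value = $row->{$key}) =~ s/^\s+|\s+$//g;
            push @values, $value
                if !defined($pattern) || $value =~ $pattern;
        }
    }
    return @values;
}

# Sample rows shaped like the @data built by the session above.
my @data = (
    { ' Key ' => ' 1 ', ' name ' => ' George ', ' Address ' => ' farewell '   },
    { ' Key ' => ' 2 ', ' name ' => ' Abe ',    ' Address ' => ' Gettysburg ' },
    { ' Key ' => ' 3 ', ' name ' => ' Joseph ', ' Address ' => ' E-Mail '     },
);

my @names = column_for(\@data, 'name');                # every name
my @caps  = column_for(\@data, 'address', qr/^[A-Z]/); # capitalized addresses

print "@names\n";   # George Abe Joseph
print "@caps\n";    # Gettysburg E-Mail
```

With 10,000 cells it would probably be cheaper to do the header test while
parsing and skip the unwanted cells, rather than storing every row first, but
the comparison itself would be the same.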

Joseph



