On Wednesday, Nov 26, 2003, at 12:30 US/Pacific, Paul Kraus wrote:
> Someone want to show me how this module can help parse out html?
> I want to grab text between <td>text</td>, being able to apply a regexp to get what I want.
> The problem is my text is among 10,000 td tags, with the only difference
> being what the above <th> tag has in it.
> So if th tag = then store text between <td> into an array.
My first concern here is whether you meant <th> or <tr>.
A simple table would look like:

    <table>
      <tr>
        <th>header1</th> <th>header2</th> <th>header3</th>
      </tr>
      <tr>
        <td>_Row_1_Cell_1_</td> <td>_Row_1_Cell_2_</td> <td>_Row_1_Cell_3_</td>
      </tr>
      <tr>
        <td>_Row_2_Cell_1_</td> <td>_Row_2_Cell_2_</td> <td>_Row_2_Cell_3_</td>
      </tr>
      <tr>
        <td>_Row_3_Cell_1_</td> <td>_Row_3_Cell_2_</td> <td>_Row_3_Cell_3_</td>
      </tr>
    </table>
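To see how a tokenizing parser walks such a table, here is a minimal sketch. It assumes the module under discussion is HTML::TokeParser::Simple (the token methods used later in this message match it); the describe() helper and the shortened one-column table are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: label one token by its kind, followed by its raw text.
sub describe {
    my ($token) = @_;
    return 'start ' . $token->as_is if $token->is_start_tag;
    return 'end '   . $token->as_is if $token->is_end_tag;
    return 'text '  . $token->as_is if $token->is_text;
    return 'other';
}

# A cut-down version of the table above, with no whitespace between tags.
my $html = '<table><tr><th>header1</th></tr>'
         . '<tr><td>_Row_1_Cell_1_</td></tr></table>';

# Passing a reference to a string makes the parser read from that string.
my $p = HTML::TokeParser::Simple->new( \$html );

while ( my $token = $p->get_token ) {
    print describe($token), "\n";
}
```

Each tag and each run of text comes back as its own token, in document order, so "which column am I in" has to be tracked by counting closing cell tags as they go past.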
You have almost written your algorithm:

    # $p is assumed to be an already constructed parser object
    # skip ahead to the opening <table> tag
    while ( my $token = $p->get_token ) {
        last if ( $token->is_start_tag('table') );
    }

    # there is a Table opening Tag, our hope now is that
    # we can get our Keys from the headers
    my $count  = 0;
    my $header = {};

    while ( my $token = $p->get_token ) {
        next if ( $token->is_start_tag( qr/^t[rh]$/ ) );   # don't care
        last if ( $token->is_end_tag('/tr') );             # finished with headers
        if ( $token->is_end_tag('/th') ) {                 # header cells are <th>, not <td>
            $count++;
            next;
        }
        if ( $token->is_text() ) {
            my $text = $token->as_is();
            # <some_pattern> is a placeholder: substitute your own regexp
            $header->{$count} = $text if ( $text =~ <some_pattern> );
        }
    }
    #
    # read the first row of headers, now to meander forward
    # At this point we know that IF
    #
    #     defined( $header->{$count} )
    #
    # then this is a column we have to grot data from into the storage we set up

and that would work basically the way we grotted out the header sections, this time counting </td> cells and resetting $count at the start of each <tr>, which is left as an exercise for the reader.
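For what it is worth, one way that exercise might come out is sketched below. It again assumes HTML::TokeParser::Simple; grab_columns() and its arguments are hypothetical names, and it only looks at the first table in the document. A cell containing nested markup yields several text tokens and an empty cell yields none, so treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: for the first table in $html, return a hash ref of
# { header_text => [ cell texts ... ] } for every header matching $pattern.
sub grab_columns {
    my ( $html, $pattern ) = @_;
    my $p = HTML::TokeParser::Simple->new( \$html );

    my ( %header, %data );
    my $in_table = 0;
    my $col      = 0;     # index of the current cell within its row
    my $cell     = '';    # 'th' or 'td' while inside a cell, '' otherwise

    while ( my $token = $p->get_token ) {
        if ( $token->is_start_tag('table') ) { $in_table = 1; next }
        next unless $in_table;
        last if $token->is_end_tag('table');

        if ( $token->is_start_tag('tr') ) { $col  = 0;    next }   # new row
        if ( $token->is_start_tag('th') ) { $cell = 'th'; next }
        if ( $token->is_start_tag('td') ) { $cell = 'td'; next }
        if ( $token->is_end_tag('th') or $token->is_end_tag('td') ) {
            $cell = '';
            $col++;                                                # next column
            next;
        }

        # Only text that sits inside a cell is of interest.
        next unless $token->is_text and $cell;
        my $text = $token->as_is;

        if ( $cell eq 'th' ) {
            # Remember this column if its header is one we want.
            $header{$col} = $text if $text =~ $pattern;
        }
        elsif ( defined $header{$col} ) {
            # Data cell under a wanted header: store it.
            push @{ $data{ $header{$col} } }, $text;
        }
    }
    return \%data;
}

my $html = join '',
    '<table>',
    '<tr><th>header1</th><th>header2</th><th>header3</th></tr>',
    '<tr><td>_Row_1_Cell_1_</td><td>_Row_1_Cell_2_</td><td>_Row_1_Cell_3_</td></tr>',
    '<tr><td>_Row_2_Cell_1_</td><td>_Row_2_Cell_2_</td><td>_Row_2_Cell_3_</td></tr>',
    '</table>';

my $cols = grab_columns( $html, qr/^header2$/ );
print join( ', ', @{ $cols->{header2} } ), "\n";
# prints: _Row_1_Cell_2_, _Row_2_Cell_2_
```

Keying the result on the header text rather than the column number keeps the caller independent of where the column happens to sit in the table.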
CAVEAT: simply because it looks like Perl, does not mean that I have written Perl, or that the code will actually work. It is merely a demonstration in algorithm creation.
ciao drieux