Hi Ben:

On Mon, Jan 26, 2004 at 04:47:17PM -0500, Ben Ostrowsky wrote:
> HTML > BODY > TABLE[3] > TR[7] > TD[1]
HTML::Parser is quite complex, and as Chuck said, you need to get your head around callbacks before you can start working with it. Callbacks are a really handy technique, and as Chuck also said, they are not confined to parsing HTML.

You may also want to look at HTML::Tree [1] and its associated modules. HTML::Tree lets you build an in-memory data structure of the HTML page...kind of like a Perlish Document Object Model (DOM). Once the page is in memory you can dig into the part you are interested in and extract the value. Here's an example that retrieves a fictitious page from NOAA:

    use strict;
    use warnings;
    use HTML::TreeBuilder;
    use LWP::Simple;

    my $html = get( 'http://www.noaa.gov/ben.html' )
        or die "couldn't fetch the page";
    my $tree = HTML::TreeBuilder->new_from_content( $html );
    my $body = $tree->look_down( _tag => 'body' );

    ## get the third <table>
    ## (content_list() also returns plain text strings, which are
    ## not objects, so skip anything that isn't a reference)
    my $count = 0;
    my $table;
    foreach my $element ( $body->content_list() ) {
        next unless ref $element;
        $count++ if ( $element->tag() eq 'table' );
        if ( $count == 3 ) { $table = $element; last; }
    }

    ## get the 7th <tr>
    $count = 0;
    my $row;
    foreach my $element ( $table->content_list() ) {
        next unless ref $element;
        $count++ if ( $element->tag() eq 'tr' );
        if ( $count == 7 ) { $row = $element; last; }
    }

    ## extract the first <td>
    my ( $td ) = $row->content_list();

    ## and print it!
    print $td->as_text(), "\n";

    ## free the memory used by the tree
    $tree->delete();

Of course, there would need to be some error checking in here to make sure that we really got the table, tr and td elements before calling methods on them :)

Sean Burke is the current maintainer of HTML::Tree and has written some good articles on parsing HTML, a few of which are included in the HTML::Tree distribution [2,3,4]. If you get really interested, you can buy (or perhaps check out :) his book Perl & LWP [5], which has lots of good info on parsing HTML. Strongly recommended!
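Since your path is really just a list of (tag, position) steps, the two counting loops in the example can be folded into one helper that also does the error checking. What follows is a minimal sketch, not part of HTML::Tree itself: nth_child() and text_at_path() are hypothetical names I made up, and like the example it assumes each step is a direct child of the previous one (e.g. no explicit <tbody> between <table> and <tr>). Instead of dying, it returns undef as soon as any step can't be found.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the $n-th (1-based) child element of
# $parent whose tag is $tag, or undef if there is no such child.
sub nth_child {
    my ( $parent, $tag, $n ) = @_;
    my $count = 0;
    foreach my $child ( $parent->content_list() ) {
        next unless ref $child;              # skip plain text strings
        next unless $child->tag() eq $tag;
        return $child if ++$count == $n;
    }
    return;                                  # not found
}

# Hypothetical helper: starting at <body>, follow a path of
# [ tag, position ] steps and return the text of the final element,
# or undef if any step along the way is missing.
sub text_at_path {
    my ( $html, @path ) = @_;
    my $tree = HTML::TreeBuilder->new_from_content( $html );
    my $node = $tree->look_down( _tag => 'body' );
    foreach my $step ( @path ) {
        last unless $node;
        $node = nth_child( $node, @$step );
    }
    my $text = $node ? $node->as_text() : undef;
    $tree->delete();                         # free the tree's memory
    return $text;
}
```

With $html fetched via LWP::Simple as before, your path becomes text_at_path( $html, [ 'table', 3 ], [ 'tr', 7 ], [ 'td', 1 ] ), and you only need to check whether the result is defined.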
//Ed

[1] http://search.cpan.org/perldoc?HTML::TreeBuilder
[2] http://search.cpan.org/perldoc?HTML::Tree::AboutObjects
[3] http://search.cpan.org/perldoc?HTML::Tree::AboutTrees
[4] http://search.cpan.org/perldoc?HTML::Tree::Scanning
[5] http://www.oreilly.com/catalog/perllwp/

-- 
Ed Summers
aim: inkdroid
web: http://www.inkdroid.org

The best writing is rewriting. [E. B. White]