On Mon, Jan 26, 2004 at 04:47:17PM -0500, Ben Ostrowsky wrote:
> I'm trying to build a script that will go to a NOAA web page, find the
> current temperature, and return just that information.
>
> If the first TABLE in the document is TABLE[0], then the data I want is, I
> think, at:
>
> HTML > BODY > TABLE[3] > TR[7] > TD[1]
>
> But how can I use Perl to get the contents of that TD? I don't understand
> the HTML::Parser manpage.

HTML::Parser is an event-oriented stream parser. It's similar to SAX programming in XML, if you've done any of that. For your purposes, a start-tag handler, an end-tag handler, and a character-data handler will probably do.

You define the start and end handlers to track the state of your stream parsing, incrementing counts for <table>, <tr>, and <td> elements as they are processed from the stream (and resetting the row and cell counts whenever a new table or row begins), so that you know when you've reached table 3, row 7, cell 1, counting from zero as you do.

You define the character handler to check the state of parsing; once the wanted state is reached, it starts accumulating character data. When the end-tag handler detects the end of table[3]/tr[7]/td[1], it changes the state again so that the character handler stops accumulating. It also does something with the accumulated data, like printing it on STDOUT, loading it into a database, or storing it in a member variable of your subclassed parser object.

HTML::Parser looks a little more intimidating than it should because of the flexible way it lets you define your event handlers: you can specify a list containing a reference to a handler function and other arguments, typically an argspec (an argument specification, which basically defines the signature of your handler subroutine).

I haven't written any HTML::Parser code since my previous position, so I don't have any examples handy, but I hope my description helps.

Best wishes,
Chuck
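
P.S. A minimal sketch of the approach described above, assuming HTML::Parser's version 3 handler API (start_h, end_h, text_h with argspec strings). The helper name extract_cell and the sample HTML are made up for illustration and have not been tried against the real NOAA page. It also assumes that tables are not nested and that every cell is closed with an explicit </td>; real pages may need more careful state tracking.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the text of one cell, addressed by
# zero-based table, row, and cell indexes (e.g. 3, 7, 1).
# Assumes no nested tables and explicit </td> end tags.
sub extract_cell {
    my ($html, $want_table, $want_row, $want_cell) = @_;

    my ($table, $row, $cell) = (-1, -1, -1);   # nothing seen yet
    my $in_target = 0;                          # true while inside the wanted <td>
    my $text      = '';                         # accumulated character data

    # Start-tag handler: count elements, resetting the inner counters
    # whenever a new table or row begins.
    my $start = sub {
        my ($tag) = @_;
        if ($tag eq 'table') {
            $table++;
            $row  = -1;
            $cell = -1;
        }
        elsif ($tag eq 'tr') {
            $row++;
            $cell = -1;
        }
        elsif ($tag eq 'td') {
            $cell++;
            $in_target = 1
                if $table == $want_table
                && $row   == $want_row
                && $cell  == $want_cell;
        }
    };

    # End-tag handler: leaving any <td> switches accumulation off.
    my $end = sub {
        my ($tag) = @_;
        $in_target = 0 if $tag eq 'td';
    };

    # Character-data handler: accumulate only while in the wanted cell.
    my $chars = sub {
        my ($dtext) = @_;
        $text .= $dtext if $in_target;
    };

    # Each handler is given as [code reference, argspec]. "tagname" passes
    # the lowercased element name; "dtext" passes entity-decoded text.
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [$start, 'tagname'],
        end_h       => [$end,   'tagname'],
        text_h      => [$chars, 'dtext'],
    );
    $parser->parse($html);
    $parser->eof;

    $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return $text;
}

# Made-up sample document: the wanted value is in table 1, row 0, cell 1.
my $html = '<table><tr><td>a</td></tr></table>'
         . '<table><tr><td>Temp</td><td> 42 F </td></tr></table>';
print extract_cell($html, 1, 0, 1), "\n";
```

For the real page you would fetch the HTML first (for example with LWP::Simple's get) and call extract_cell($html, 3, 7, 1). Closures are used here instead of a subclassed parser object only to keep the sketch short; either way of holding the state should work.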