Re: HTML::Parser walkthrough?

Chuck Bearden Mon, 26 Jan 2004 15:42:30 -0800

On Mon, Jan 26, 2004 at 04:47:17PM -0500, Ben Ostrowsky wrote:
> I'm trying to build a script that will go to a NOAA web page, find the 
> current temperature, and return just that information.
> 
> If the first TABLE in the document is TABLE[0], then the data I want is, I 
> think, at:
> 
> HTML > BODY > TABLE[3] > TR[7] > TD[1]
> 
> But how can I use Perl to get the contents of that TD?  I don't understand 
> the HTML::Parser manpage.


HTML::Parser is an event-oriented stream parser.  It's similar to SAX
programming in XML, if you've done any of that.  For your purposes, a
start-tag handler, an end-tag handler, and a character data handler
(the "start", "end", and "text" events in HTML::Parser) will probably
do.  You define the start and end handlers to track the 
state of your stream parsing, incrementing counts for <table>, <tr>, 
and <td> elements as they are processed from the stream, so that you 
know when you've reached table 3, row 7, cell 1.  You define the 
character handler to check the state of parsing, and when the wanted 
state is reached, it starts accumulating character data.  When the
end-tag handler detects the end of table[3]/tr[7]/td[1], it changes the
state again so that the character data handler doesn't accumulate
character data any longer.  It also does something with the 
accumulated data, like printing it on STDOUT or loading it into a
database or storing it in a member variable of your subclassed parser
object.
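
Here is a minimal sketch of that state-tracking approach, not tested
against the real NOAA page.  The helper name cell_text and the sample
HTML are made up for illustration.  It assumes tables are not nested,
that every <td> has an explicit </td>, and that tables, rows, and cells
are counted from zero in document order, as in your TABLE[3] > TR[7] >
TD[1] notation.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the text of cell $want_td in row $want_tr
# of table $want_table, all counted from zero.
# Assumes tables are not nested and every <td> is explicitly closed.
sub cell_text {
    my ($html, $want_table, $want_tr, $want_td) = @_;
    my ($table, $tr, $td) = (-1, -1, -1);   # nothing seen yet
    my $capturing = 0;                      # true while inside the wanted cell
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Start tags: advance the counters and switch capturing on
        # when the wanted table/row/cell is reached.
        start_h => [ sub {
            my ($tag) = @_;
            if    ($tag eq 'table') { $table++; $tr = $td = -1 }
            elsif ($tag eq 'tr')    { $tr++; $td = -1 }
            elsif ($tag eq 'td')    {
                $td++;
                $capturing = 1
                    if $table == $want_table
                    && $tr == $want_tr
                    && $td == $want_td;
            }
        }, 'tagname' ],
        # Character data: accumulate only while capturing.
        text_h => [ sub { $text .= $_[0] if $capturing }, 'dtext' ],
        # End tags: the closing </td> switches capturing off again.
        end_h => [ sub { $capturing = 0 if $_[0] eq 'td' }, 'tagname' ],
    );
    $parser->parse($html);
    $parser->eof;

    $text =~ s/^\s+|\s+$//g;                # trim surrounding whitespace
    return $text;
}

# Made-up sample document: the wanted cell is table 1, row 1, cell 1.
my $html = <<'HTML';
<table><tr><td>a</td></tr></table>
<table>
<tr><td>b</td><td>c</td></tr>
<tr><td>d</td><td>72 F</td></tr>
</table>
HTML

print cell_text($html, 1, 1, 1), "\n";
```

For your page you would fetch the HTML first (with LWP::Simple, say)
and call cell_text($html, 3, 7, 1).  If the page nests tables, the
counters would need a stack rather than three plain scalars.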

HTML::Parser looks a little more intimidating than it should because of
the flexible way it lets you define your event handlers: for each event
you specify a list containing a reference to a handler function and,
typically, an argspec (argument specification).  The argspec is a
string naming the values the parser should pass to your handler, so it
basically defines a signature for your handler subroutine.
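
To make the argspec idea concrete, here is a small sketch.  The
argspec "tagname,attr" asks the parser to call the start handler with
the lower-cased tag name and a hash reference of the tag's attributes,
in that order.  The sample markup and the @seen array are made up for
illustration.

```perl
use strict;
use warnings;
use HTML::Parser;

my @seen;   # one entry per start tag, e.g. "table.obs"

my $p = HTML::Parser->new(api_version => 3);

# The argspec "tagname,attr" determines what arrives in @_:
# first the tag name, then a hash reference of its attributes.
$p->handler(start => sub {
    my ($tagname, $attr) = @_;
    my $class = exists $attr->{class} ? ".$attr->{class}" : "";
    push @seen, $tagname . $class;
}, "tagname,attr");

$p->parse('<table class="obs"><tr><td>72 F</td></tr></table>');
$p->eof;

print join(" ", @seen), "\n";   # prints "table.obs tr td"
```

Changing the argspec changes what the handler receives; "self,tagname",
for instance, would pass the parser object first, which is convenient
if you keep state in the parser object instead of in closures.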

I haven't written any HTML::Parser code since my previous position, so I
don't have any examples handy, but I hope my description helps.

Best wishes,
Chuck
