RE: Parsing HTML Tables

Charles K. Clarkson Sat, 30 Apr 2005 18:45:32 -0700

Sri Pen <mailto:[EMAIL PROTECTED]> wrote:


: and second @tableDefRows match the data between
: <TD*>..4321191..</TD>. It matches all the data between
: <TABLE><TBODY><TR*>...</TR></TBODY></TABLE>. Something is wrong
: here.

    You are getting this because quantifiers in Perl regular
expressions are greedy by default, so a pattern like .* matches as
much as it can, and because regular expressions alone are a poor
substitute for a good HTML parser.
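
    To illustrate the greedy behaviour, here is a minimal sketch. The
sample row and the variable names are made up for the example, not
taken from the original page. The non-greedy form gets by on a tidy
fragment like this one, but it is still fragile on real-world HTML.

```perl
use strict;
use warnings;

# Hypothetical sample row (an assumption, not the poster's real data).
my $html = '<TR><TD>YHIRA</TD><TD>4321191</TD></TR>';

# Greedy: .* runs on to the LAST </TD>, so a single match
# swallows both cells and the tags between them.
my @greedy = $html =~ m{<TD[^>]*>(.*)</TD>}gi;

# Non-greedy: .*? stops at the FIRST </TD>, giving one match per cell.
my @lazy = $html =~ m{<TD[^>]*>(.*?)</TD>}gi;

print scalar(@greedy), " greedy match: $greedy[0]\n";
print scalar(@lazy),   " non-greedy matches: @lazy\n";
```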


: Do I need to somehow start from <TABLE> and match all the
: way to the first </td> and use some backtracking or something?

    Perhaps. It is probably best to just scrap the regular
expressions and use a module made for parsing HTML. Here's some
code using a general HTML parser, but there are many table parsers
as well.

use HTML::TokeParser;

# $html_string holds the HTML of the page
my $parser = HTML::TokeParser->new( \$html_string );

my @rows;
while ( $parser->get_tag( 'tr' ) ) {

    # check error number: the first cell must be empty
    next unless $parser->get_trimmed_text( '/td' ) eq '';

    # move to the next cell
    $parser->get_tag( 'td' );

    # check userid: the second cell must be 'YHIRA'
    next unless $parser->get_trimmed_text( '/td' ) eq 'YHIRA';

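    # collect the matching cells, stopping at the end of this row so
    # the loop does not run on into the rows that follow
    my @cells;
    while ( my $tag = $parser->get_tag( 'td', '/tr' ) ) {
        last if $tag->[0] eq '/tr';
        my $text = $parser->get_trimmed_text( '/td' );
        push @cells, $text if $text and $text =~ /4321191/;
    }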

    push @rows, [ @cells ] if @cells;
}
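
    For anyone who wants to try the approach, here is a self-contained
sketch of the same loop run against a small made-up table. The sample
HTML and the assumption that every row is closed with </tr> are mine,
not from the original page. The inner loop asks get_tag() for either
the next <td> or the closing </tr>, so it stays within the current row.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample data: only rows 1 and 4 should be kept.
my $html_string = <<'HTML';
<table><tbody>
<tr><td></td><td>YHIRA</td><td>4321191</td><td>other</td></tr>
<tr><td>17</td><td>YHIRA</td><td>4321191</td></tr>
<tr><td></td><td>JDOE</td><td>4321191</td></tr>
<tr><td></td><td>YHIRA</td><td>A4321191B</td></tr>
</tbody></table>
HTML

my $parser = HTML::TokeParser->new( \$html_string )
    or die "Could not create parser";

my @rows;
while ( $parser->get_tag( 'tr' ) ) {

    # first cell (error number) must be empty
    next unless $parser->get_trimmed_text( '/td' ) eq '';

    # second cell (userid) must be 'YHIRA'
    $parser->get_tag( 'td' );
    next unless $parser->get_trimmed_text( '/td' ) eq 'YHIRA';

    # remaining cells of this row only: stop when </tr> is reached
    my @cells;
    while ( my $tag = $parser->get_tag( 'td', '/tr' ) ) {
        last if $tag->[0] eq '/tr';
        my $text = $parser->get_trimmed_text( '/td' );
        push @cells, $text if $text =~ /4321191/;
    }

    push @rows, [ @cells ] if @cells;
}

# one line of output per matching row
print join( ', ', @$_ ), "\n" for @rows;
```

    Each element of @rows is an array reference holding the matching
cells of one row, so the sample above should leave two rows in @rows.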


HTH,

Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328

