Re: help with HTML::TableExtract

Chas. Owens Thu, 01 Jan 2009 13:09:33 -0800

On Thu, Jan 1, 2009 at 10:16,  <but...@voila.fr> wrote:
snip
>  use HTML::TableExtract;
snip


This loads the HTML::TableExtract module so that your program can use its code.

snip
>  $te = HTML::TableExtract->new( headers => [qw(Date Price Cost)] );
snip

This creates (instantiates) a new object of the class
HTML::TableExtract.  The headers argument tells the constructor which
column headers to look for.

snip
>  $te->parse($html_string);
snip

This line tells the object to look in the HTML contained in
$html_string for the table specified by the headers given to the
constructor.  If this method call returns a true value then $te will
contain the data from the tables that matched.  If it returns a false
value then no table matched.
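
One way to confirm that something matched, independently of what parse
returns, is to count what the tables method hands back.  This is only a
minimal sketch: the matched_tables helper and the sample HTML are made
up for illustration and are not part of the module or of your program.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: return the table objects whose header row
# matches every name in @headers.
sub matched_tables {
    my ($html, @headers) = @_;
    my $te = HTML::TableExtract->new(headers => \@headers);
    $te->parse($html);
    $te->eof;    # tell the underlying HTML::Parser the input is complete
    return $te->tables;
}

# Made-up sample input, just for this sketch.
my $html = '<table>'
         . '<tr><th>Date</th><th>Price</th><th>Cost</th></tr>'
         . '<tr><td>2009-01-01</td><td>10</td><td>8</td></tr>'
         . '</table>';

my @tables = matched_tables($html, qw(Date Price Cost));
print scalar(@tables), " table(s) matched\n";
```

An empty list from the helper means no table had all of the requested
headers.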

You can get a handle to each table found by calling the tables method
on $te.  You should be able to call the column method on each table to
print the desired column, but there appears to be a bug in at least
the latest version of the code (2.10, dating from 2006).  The bug
occurs around line 900, where the column function tries to use a row
object as an index.  You can change that function to look like this:

  sub column {
    my $self = shift;
    my $c = shift;
    my @column;
    my $r = 0;
    foreach my $row ($self->rows) {
      push(@column, $self->cell($r++, $c));
    }
    wantarray ? @column : \@column;
  }
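
To see the idea behind that change in isolation, here is a sketch that
uses a plain array of arrays in place of the module's table object.
The column_of helper and the @grid data are hypothetical stand-ins, not
part of HTML::TableExtract; the point is only that each cell is looked
up by a numeric row index rather than by the row itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed table: one array reference per row.
my @grid = (
    [ 1, 'a', 'z' ],
    [ 2, 'b', 'y' ],
    [ 3, 'c', 'x' ],
);

# Collect column $c by walking the rows with a numeric row index,
# which is the same idea as the patched column method above.
sub column_of {
    my ($rows, $c) = @_;
    my @column;
    for my $r (0 .. $#$rows) {
        push @column, $rows->[$r][$c];
    }
    return wantarray ? @column : \@column;
}

print join(",", column_of(\@grid, 0)), "\n";   # prints 1,2,3
```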

Here is a program similar to what you described.

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TableExtract;

my $te = HTML::TableExtract->new(headers => [qw/foo bar baz/])
        or die "could not create table extract object\n";

#read in all of the lines from the DATA section and join them
#into one scalar to pass to the parse method
$te->parse(join "", <DATA>)
        or die "could not find table\n";

my $i = 1;
for my $table ($te->tables) {

        print "table $i column 0:\n";
        $i++;
        for my $cell ($table->column(0)) {
                print "\t$cell\n";
        }
}

__DATA__
<table>
        <tr><th>foo</th><th>bar</th><th>baz</th></tr>
        <tr><td>1</td><td>a</td><td>z</td></tr>
        <tr><td>2</td><td>b</td><td>y</td></tr>
        <tr><td>3</td><td>c</td><td>x</td></tr>
</table>
<table>
        <tr><th>foo</th><th>bar</th><th>baz</th></tr>
        <tr><td>1</td><td>a</td><td>z</td></tr>
        <tr><td>2</td><td>b</td><td>y</td></tr>
        <tr><td>3</td><td>c</td><td>x</td></tr>
</table>

And here is one that works without fixing the broken module:

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TableExtract;

my $te = HTML::TableExtract->new(headers => [qw/foo bar baz/])
        or die "could not create table extract object\n";

#read in all of the lines from the DATA section and join them
#into one scalar to pass to the parse method
$te->parse(join "", <DATA>)
        or die "could not find table\n";

my $i = 1;
for my $table ($te->tables) {

        print "table $i column 0:\n";
        $i++;
        for my $col ($table->columns) {
                for my $cell (@$col) {
                        print "\t$cell\n";
                }
                last;
        }
}

__DATA__
<table>
        <tr><th>foo</th><th>bar</th><th>baz</th></tr>
        <tr><td>1</td><td>a</td><td>z</td></tr>
        <tr><td>2</td><td>b</td><td>y</td></tr>
        <tr><td>3</td><td>c</td><td>x</td></tr>
</table>
<table>
        <tr><th>foo</th><th>bar</th><th>baz</th></tr>
        <tr><td>1</td><td>a</td><td>z</td></tr>
        <tr><td>2</td><td>b</td><td>y</td></tr>
        <tr><td>3</td><td>c</td><td>x</td></tr>
</table>
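
Another possibility that avoids the column method altogether is to take
one element from each row returned by the rows method.  This is a
sketch under the assumption that rows behaves as documented in your
installed version; the column_from_rows helper is a name I made up for
this example.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: pull one column out of a table object by
# taking element $c of every row returned by the rows method.
sub column_from_rows {
    my ($table, $c) = @_;
    return map { $_->[$c] } $table->rows;
}

# Same shape of data as the programs above, built inline.
my $html = '<table>'
         . '<tr><th>foo</th><th>bar</th><th>baz</th></tr>'
         . '<tr><td>1</td><td>a</td><td>z</td></tr>'
         . '<tr><td>2</td><td>b</td><td>y</td></tr>'
         . '<tr><td>3</td><td>c</td><td>x</td></tr>'
         . '</table>';

my $te = HTML::TableExtract->new(headers => [qw/foo bar baz/]);
$te->parse($html);
$te->eof;    # tell the underlying HTML::Parser the input is complete

for my $table ($te->tables) {
    print "\t$_\n" for column_from_rows($table, 0);
}
```

Unlike the columns loop with last, this lets you pick any column by
its index.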

I will see what I can do about getting the module fixed in CPAN.

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

