Douglass Franklin wrote:
>
> I'm trying to transform this html table to a colon-delimited flat-file
> database.

Why colon separated?  What if one of the fields has a colon?

> This is what I have so far:
>
> HTML:
> <tr><td class='bodyblack' width='50%'><a
> href='http://jsearch.usajobs.opm.gov/summary.asp?OPMControl=IC9516'
> class='jobrlist'><font size='2'>ACCOUNTANT
> </font></a></td><td class='bodyblack' width='40%'>$24,701.00
> - $51,971.00
> </td><td class='bodyblack'>INDEFINITE</td></tr>
> <tr><td class='bodyblack'>CONTINENTAL U.S., US</td>
> </tr><td class='bodyblack' colspan='3'>  </td></tr>
>
> Database Record (wanted):
> Accountant:$24,701.00 - $51,971.00:INDEFINITE:CONTINENTAL U.S., US
>
> Regex I have:
> $jobrecord =~ ^(<tr>)(<td class='bodyblack' width='50%'>)(.+)(
> </td></tr>)$
>
> However, this doesn't seem to be working. Please help.

This will work and was tested on the attached page from
http://jsearch.usajobs.opm.gov/

#!/usr/bin/perl
use warnings;
use strict;
use HTML::TokeParser;

my $p = HTML::TokeParser->new( 'page1.html' )
    or die "Cannot open page1.html: $!";

my @data;

TABLE:
while ( my $token = $p->get_token() ) {
    my @table;
    # Look for the start-tag sequence <center><table><tr><td><strong>
    if ( $token->[ 0 ] eq 'S' and $token->[ 1 ] eq 'center' ) {
        $token = $p->get_token();
        if ( $token->[ 0 ] eq 'S' and $token->[ 1 ] eq 'table' ) {
            $token = $p->get_token();
            if ( $token->[ 0 ] eq 'S' and $token->[ 1 ] eq 'tr' ) {
                $token = $p->get_token();
                if ( $token->[ 0 ] eq 'S' and $token->[ 1 ] eq 'td' ) {
                    $token = $p->get_token();
                    if ( $token->[ 0 ] eq 'S' and $token->[ 1 ] eq 'strong' ) {
                        # Collect every text token up to the next <center>
                        while ( $token = $p->get_token() ) {
                            push @table, $token->[ 1 ] if $token->[ 0 ] eq 'T';
                            if ( $token->[ 0 ] eq 'S' and $token->[ 1 ] eq 'center' ) {
                                # Push the <center> back so the outer loop sees it
                                $p->unget_token( $token );
                                # Text tokens are not entity-decoded, so turn
                                # &nbsp; into a space, then trim each field
                                s/&nbsp;/ /g, s/^\s+//, s/\s+$// for @table;
                                push @data, join ':', @table;
                                next TABLE;
                            }
                        }
                    }
                }
            }
        }
    }
}

print "$_\n" for @data;

__END__


HTH

John
-- 
use Perl;
program
fulfillment