Thank you, $Bill. I'll have to digest your code to see what I can integrate into my solution. The problem with using a hash to hold the column start positions is that the columns shift depending on how wide the stats in them get. For example, once the PF total reaches the hundreds, that column widens and pushes every column after it to the right. Still, there is some promise here if I give up on making the parser completely generalized. By "generalized" I mean that I supply the parser with no strings to match, so I can use the same parser for any text laid out in this kind of format.
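A minimal sketch of that generalized idea, under assumptions that are not from this thread: treat a character position as a column gutter when it is blank in every data row of a section, and take each run of non-gutter positions as one column. Because the spans are recomputed from the data each time, a PF column that grows into the hundreds just yields a wider span. It assumes titles, headers and divider lines have already been dropped, and it will wrongly split a column if every row happens to have a space at the same position inside it. `infer_columns` and `split_row` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: infer [start, length] column spans (0-based) from a
# block of fixed-width data rows. A position is a gutter when it is blank
# in every row, so a column that widens in the data yields a wider span.
sub infer_columns {
    my @rows  = @_;
    my $width = 0;
    for (@rows) { $width = length($_) if length($_) > $width }

    my @used = (0) x $width;    # 1 where any row has a non-space character
    for my $row (@rows) {
        for my $i (0 .. length($row) - 1) {
            $used[$i] = 1 if substr($row, $i, 1) ne ' ';
        }
    }

    # Each maximal run of used positions becomes one column.
    my (@cols, $start);
    for my $i (0 .. $width) {
        my $on = $i < $width && $used[$i];
        if ($on && !defined $start) {
            $start = $i;
        }
        elsif (!$on && defined $start) {
            push @cols, [$start, $i - $start];
            undef $start;
        }
    }
    return @cols;
}

# Hypothetical helper: cut one row into trimmed fields using those spans.
sub split_row {
    my ($row, @cols) = @_;
    my @fields;
    for my $c (@cols) {
        my ($start, $len) = @$c;
        my $f = $start < length($row) ? substr($row, $start, $len) : '';
        $f =~ s/^\s+|\s+$//g;    # trim padding
        $f =~ s/\.+$//;          # strip leader dots, e.g. "Total......"
        push @fields, $f;
    }
    return @fields;
}

# Two rows laid out like the report: dotted name, right-aligned numbers.
my @rows = map {
    sprintf '%s%6s%6s%6s%5s',
        $_->[0] . '.' x (20 - length $_->[0]), @{$_}[1 .. 4]
} ['Lowe, Kenneth', 0, 15, 15, 15], ['Total', 72, 143, 215, 115];

my @cols = infer_columns(@rows);
print join('|', split_row($_, @cols)), "\n" for @rows;
```

Run as-is, this prints `Lowe, Kenneth|0|15|15|15` and `Total|72|143|215|115`. No column positions or match strings are supplied; the only layout knowledge baked in is that leader dots are trimmed from the end of a field.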
Thanks again for the time you spent on this. Most of the code after the section-start and column-start definitions does a much better job of pulling out the data than what I had come up with.

JY

----- Original Message -----
From: "$Bill Luebkert" <[EMAIL PROTECTED]>
To: "Joe Youngquist" <[EMAIL PROTECTED]>
Sent: Wednesday, December 10, 2003 7:35 AM
Subject: Re: parsing text

> Joe Youngquist wrote:
> > Hello list,
> >
> > I've been trying to figure out a generalized method of parsing
> > space-formatted text to output into HTML tables. The data is very
> > likely written out using Perl reports and pictures. Has anyone come
> > up with a general method?
>
> Here's a slightly different approach: break the text into 3 sections,
> each section into lines, and each line into fields (I didn't get to
> the HTML part yet):
>
> use strict;
> use warnings;
> use Data::Dumper;
>
> my $debug = 0;    # set to 1 to trace the parse
>
> my $text = <<EOD;
> |-----------------OVERALL STATISTICS------------------|
> TOTALS               O-REB D-REB TOTAL   PF  FO    A   TO  A/TO Hi Pts
> ---------------------------------------------------------------------------
> Lowe, Kenneth.......     0    15    15   15   0   14   11   1.3     26
> Teague, David.......     6    16    22    9   0    9    4   2.2     19
> Booker, Chris.......    13    21    34    8   0   10   10   1.0     20
> Buckley, Melvin.....     5    17    22   11   0   10    8   1.2     20
> McKnight, Brandon...     1    11    12   15   1   18   15   1.2     13
> Buscher, Brett......     1     9    10   15   0    9    9   1.0     10
> Kartelo, Ivan.......    22    19    41   14   0    2    7   0.3     12
> Kiefer, Matt........     9    12    21   14   0    4    9   0.4      7
> Parkinson, Austin...     3     5     8    4   0   20    7   2.9      8
> Nwankwo, Ije........     2     2     4    2   0    2    2   1.0      2
> Carroll, Matt.......     1     3     4    6   0    0    2   0.0      2
> Ford, Andrew........     0     1     1    2   0    0    1   0.0      0
> Garrity, Kevin......     0     1     1    0   0    0    0   0.0      0
> Hartley, Chris......     1     0     1    0   0    0    1   0.0      0
> Total...............    72   143   215  115   1   98   86   1.1     78
> Opponents...........    72   130   202  131   -   62  103   0.6     68
>
>    TEAM STATISTICS                         PUR          OPP
>    --------------------------------------------------------
>    SCORING.......................          431          352
>      Points per game.............         71.8         58.7
>      Scoring margin..............        +13.2            -
>    FIELD GOALS-ATT...............      142-328      134-336
>      Field goal pct..............         .433         .399
>    3 POINT FG-ATT................       36-102        25-99
>      3-point FG pct..............         .353         .253
>      3-pt FG made per game.......          6.0          4.2
>    FREE THROWS-ATT...............      111-147        59-99
>      Free throw pct..............         .755         .596
>    REBOUNDS......................          215          202
>      Rebounds per game...........         35.8         33.7
>      Rebounding margin...........         +2.2            -
>    ASSISTS.......................           98           62
>      Assists per game............         16.3         10.3
>    TURNOVERS.....................           86          103
>      Turnovers per game..........         14.3         17.2
>      Turnover margin.............         +2.8            -
>      Assist/turnover ratio.......          1.1          0.6
>    STEALS........................           44           31
>      Steals per game.............          7.3          5.2
>    BLOCKS........................           23           23
>      Blocks per game.............          3.8          3.8
>    WINNING STREAK................            6            -
>      Home win streak.............            3            -
>    ATTENDANCE....................        33118        23435
>      Home games-Avg/Game.........      3-11039          0-0
>      Neutral site-Avg/Game.......            -       3-7812
>
>    BY PERIOD     1st  2nd    Total
>    ------------ ---- ----     ----
>    Team........  203  228  -   431
>    Opponents...  164  188  -   352
> EOD
>
> # you can expand on this table to include a prefix and suffix for a field
> # and whether you want to use it or not etc.
>
> # section => { column-heading => [start-col, length], ... }
>
> my %sections = (    # offset of column 1 is 1 rather than 0 here
>   1 => {Name => [1, 20], ORBs => [24, 3], REBs => [30, 3],
>     RBs => [36, 3], PFs => [41, 3], FOs => [45, 3], ASTs => [50, 3],
>     TOs => [55, 3], 'A-TO' => [59, 5], Pts => [68, 3], },
>   2 => {Title => [4, 30], Home => [37, 10], Opp => [50, 10], },
>   3 => {Team => [4, 12], '1st' => [18, 3], '2nd' => [23, 3],
>     Total => [30, 5], },
> );
>
> my %sect_start = (    # used to find start of section
>   0 => qr'^TOTALS\s+',
>   1 => qr'^\s*TEAM\s+',
>   2 => qr'^\s*BY\s+',
> );
>
> print Data::Dumper->Dump([\%sections, \%sect_start],
>   [qw(%sections %sect_start)]) if $debug;
>
> my $MIN_LINE_LEN = 80;
> my @lines = split /\n/, $text;
> print "Number of Lines: ", scalar @lines, "\n" if $debug;
>
> my $sect = 0;
> my %flds = ();    # section # => [ [fields of one line], ... ]
> for (my $ii = 0; $ii < @lines; $ii++) {
>
>     $_ = $lines[$ii];
>     next if /^\s*$/;    # skip blank lines
>     next if /---/;      # skip dividers
>     # pad to fixed length so substr never runs off the end of a line
>     $_ .= ' ' x ($MIN_LINE_LEN - length $_) if length $_ < $MIN_LINE_LEN;
>
>     if ($sect < keys %sect_start and /$sect_start{$sect}/) {
>         $sect++;
>         $flds{$sect} = [];
>         print "section going to $sect\n" if $debug;
>         next;
>     }
>
>     my $haref = $sections{$sect};    # hash of arrays
>     next unless $haref;              # not inside a known section yet
>     my @row = ();
>     # hash keys come back in no particular order, so walk the columns
>     # left to right by start position
>     foreach my $col (sort { $haref->{$a}[0] <=> $haref->{$b}[0] } keys %{$haref}) {
>
>         my ($fc, $num) = @{$haref->{$col}};
>         print "Sect $sect: $col => from $fc for $num\n" if $debug;
>         my $tmp = substr $_, $fc-1, $num;
>         $tmp =~ s/^\s+|\s+$//g;    # trim the padding
>         $tmp =~ s/\.+$//;          # strip leader dots
>         print "$sect: $ii: $col: $tmp\n" if $debug;
>         push @row, $tmp;
>     }
>     push @{$flds{$sect}}, \@row;    # one arrayref per data line
> }
> print Data::Dumper->Dump([\%flds], [qw(%flds)]);
>
> __END__
>
> --
>   $Bill Luebkert                     Mailto:[EMAIL PROTECTED]
>   DBE Collectibles                   Mailto:[EMAIL PROTECTED]
>   Castle of Medieval Myth & Magic    http://www.todbe.com/
>   http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
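$Bill's code stops before the HTML step that the original question asked about. A minimal sketch of that last step, assuming each parsed section arrives as a list of header names plus an array of row arrayrefs whose fields are in left-to-right column order. `html_escape`, `cell` and `rows_to_html` are hypothetical names, and the escaping is hand-rolled so the sketch depends on no modules.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attributes.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: wrap escaped text in a <th> or <td> tag.
sub cell {
    my ($tag, $text) = @_;
    return "<$tag>" . html_escape($text) . "</$tag>";
}

# Hypothetical helper: render one parsed section as an HTML table, given
# header names and an array of row arrayrefs in left-to-right column order.
sub rows_to_html {
    my ($headers, $rows) = @_;
    my $html = "<table>\n";
    $html .= '  <tr>' . join('', map { cell('th', $_) } @$headers) . "</tr>\n";
    for my $row (@$rows) {
        $html .= '  <tr>' . join('', map { cell('td', $_) } @$row) . "</tr>\n";
    }
    return $html . "</table>\n";
}

# The BY PERIOD section from the quoted report, already split into fields.
my @headers = ('Team', '1st', '2nd', 'Total');
my @rows    = (['Team', 203, 228, 431], ['Opponents', 164, 188, 352]);
print rows_to_html(\@headers, \@rows);
```

Every cell goes through `html_escape`, so a stray `<` or `&` in a player name or title cannot break the generated markup.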