Re: parsing text

Joe Youngquist Wed, 10 Dec 2003 07:00:14 -0800

Thank you $Bill,

I'll have to digest your code to see if there is something I can integrate
into the solution.
The problem with using a hash to hold the column start positions is that
the columns shift depending on how wide the stats in them get.
For example, once a total in the PF column reaches the hundreds, that
column widens and pushes the other columns to the right.
But there is some promise here if I compromise on the parsing being
completely generalized.  By "generalized" I mean that I supply the
parser with no strings to try to match.  That way I can use the same
parser for any text laid out in this kind of format.
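
A minimal sketch of what "no strings to match" could look like, assuming
the data rows of one section are already in hand: treat any character
position that is blank in every row as a gap, and take each run of used
positions as a column. The helper names find_columns and split_row are
made up for illustration and are not part of $Bill's script.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, length] pairs (0-based) for every run
# of character positions that is non-blank in at least one of the rows.
sub find_columns {
    my @rows  = @_;
    my $width = 0;
    for my $row (@rows) {
        $width = length($row) if length($row) > $width;
    }

    # Mark each position that holds a non-space character in some row.
    my @used = (0) x $width;
    for my $row (@rows) {
        while ($row =~ /\S/g) {
            $used[pos($row) - 1] = 1;
        }
    }

    # Collapse consecutive used positions into column ranges.
    my (@cols, $start);
    for my $i (0 .. $width) {
        my $on = $i < $width && $used[$i];
        if ($on && !defined $start) {
            $start = $i;
        }
        elsif (!$on && defined $start) {
            push @cols, [$start, $i - $start];
            undef $start;
        }
    }
    return @cols;
}

# Hypothetical helper: cut one row into fields using the detected columns,
# trimming blanks and the trailing dot leaders.
sub split_row {
    my ($row, @cols) = @_;
    return map {
        my ($s, $l) = @$_;
        my $f = $s < length($row) ? substr($row, $s, $l) : '';
        $f =~ s/^\s+|\s+$//g;
        $f =~ s/\.+$//;
        $f;
    } @cols;
}

# Three data rows copied from the OVERALL STATISTICS section.
my @sample = (
    'Lowe, Kenneth.......     0    15    15   15   0   14   11   1.3     26',
    'Booker, Chris.......    13    21    34    8   0   10   10   1.0     20',
    'Total...............    72   143   215  115   1   98   86   1.1     78',
);
my @cols = find_columns(@sample);
print join('|', split_row($_, @cols)), "\n" for @sample;
```

Because the boundaries come from the rows themselves, a column that
widens simply yields a wider range. The catch is that two columns with
no all-blank position between them get merged, so this only works when
every column is separated by at least one space in every row.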


Thanks again for the time you spent on this.  Most of the code after the
section starts and the column starts are defined does a much better job
of pulling out the data than what I had thought of.

JY

----- Original Message -----
From: "$Bill Luebkert" <[EMAIL PROTECTED]>
To: "Joe Youngquist" <[EMAIL PROTECTED]>
Sent: Wednesday, December 10, 2003 7:35 AM
Subject: Re: parsing text


> Joe Youngquist wrote:
> > Hello list,
> >
> > I've been trying to figure out a generalized method of parsing space
> > formatted text to output into html tables.  The data is very likely
> > written out using Perl Reports and Pictures.  Has anyone come up with
> > a general method?
>
> Here's a slightly different approach: break the text into 3 sections,
> each section into lines, and each line into fields (didn't get to the
> HTML part yet) :
>
> use strict;
> use warnings;
> use Data::Dumper;
>
> my $debug = 0; # set to 1 for trace output
>
> my $text = <<EOD;
>                     |-----------------OVERALL STATISTICS------------------|
> TOTALS               O-REB D-REB TOTAL   PF  FO    A   TO  A/TO Hi Pts
> ---------------------------------------------------------------------------
> Lowe, Kenneth.......     0    15    15   15   0   14   11   1.3     26
> Teague, David.......     6    16    22    9   0    9    4   2.2     19
> Booker, Chris.......    13    21    34    8   0   10   10   1.0     20
> Buckley, Melvin.....     5    17    22   11   0   10    8   1.2     20
> McKnight, Brandon...     1    11    12   15   1   18   15   1.2     13
> Buscher, Brett......     1     9    10   15   0    9    9   1.0     10
> Kartelo, Ivan.......    22    19    41   14   0    2    7   0.3     12
> Kiefer, Matt........     9    12    21   14   0    4    9   0.4      7
> Parkinson, Austin...     3     5     8    4   0   20    7   2.9      8
> Nwankwo, Ije........     2     2     4    2   0    2    2   1.0      2
> Carroll, Matt.......     1     3     4    6   0    0    2   0.0      2
> Ford, Andrew........     0     1     1    2   0    0    1   0.0      0
> Garrity, Kevin......     0     1     1    0   0    0    0   0.0      0
> Hartley, Chris......     1     0     1    0   0    0    1   0.0      0
> Total...............    72   143   215  115   1   98   86   1.1     78
> Opponents...........    72   130   202  131   -   62  103   0.6     68
>
>    TEAM STATISTICS                         PUR          OPP
>    --------------------------------------------------------
>    SCORING.......................          431          352
>      Points per game.............         71.8         58.7
>      Scoring margin..............        +13.2            -
>    FIELD GOALS-ATT...............      142-328      134-336
>      Field goal pct..............         .433         .399
>    3 POINT FG-ATT................       36-102        25-99
>      3-point FG pct..............         .353         .253
>      3-pt FG made per game.......          6.0          4.2
>    FREE THROWS-ATT...............      111-147        59-99
>      Free throw pct..............         .755         .596
>    REBOUNDS......................          215          202
>      Rebounds per game...........         35.8         33.7
>      Rebounding margin...........         +2.2            -
>    ASSISTS.......................           98           62
>      Assists per game............         16.3         10.3
>    TURNOVERS.....................           86          103
>      Turnovers per game..........         14.3         17.2
>      Turnover margin.............         +2.8            -
>      Assist/turnover ratio.......          1.1          0.6
>    STEALS........................           44           31
>      Steals per game.............          7.3          5.2
>    BLOCKS........................           23           23
>      Blocks per game.............          3.8          3.8
>    WINNING STREAK................            6            -
>      Home win streak.............            3            -
>    ATTENDANCE....................        33118        23435
>      Home games-Avg/Game.........      3-11039          0-0
>      Neutral site-Avg/Game.......            -       3-7812
>
>    BY PERIOD     1st  2nd    Total
>    ------------ ---- ----     ----
>    Team........  203  228  -   431
>    Opponents...  164  188  -   352
> EOD
>
> # you can expand on this table to include a prefix and suffix for a field
> # and whether you want to use it or not etc.
>
> # section => { column-heading => [start-col, length], ... }
>
> my %sections = ( # offset of column 1 is 1 rather than 0 here
>   1 => {Name => [1, 20], ORBs => [24, 3], REBs => [30, 3],
>     RBs => [36, 3], PFs => [41, 3], FOs => [45, 3], ASTs => [50, 3],
>     TOs => [55, 3], 'A-TO' => [59, 5], Pts => [68, 3], },
>   2 => {Title => [4, 30], Home => [37, 10], Opp => [50, 10], },
>   3 => {Team => [4, 12], '1st' => [18, 3], '2nd' => [23, 3],
>     Total => [30, 5], },
> );
>
> my %sect_start = ( # used to find start of section
>   0 => qr'^TOTALS\s+',
>   1 => qr'^\s*TEAM\s+',
>   2 => qr'^\s*BY\s+',
> );
>
> print Data::Dumper->Dump([\%sections, \%sect_start],
>   [qw(%sections %sect_start)]) if $debug;
>
> my $MIN_LINE_LEN = 80;
> my @lines = split /\n/, $text;
> print "Number of Lines: ", scalar @lines, "\n" if $debug;
>
> my $sect = 0;
> my %flds = (); # section # => [lines] [flds]
> for (my $ii = 0; $ii < @lines; $ii++) {
>
> $_ = $lines[$ii];
> next if /^\s*$/; # skip blank lines
> next if /---/; # skip dividers
> $_ .= ' ' x ($MIN_LINE_LEN - length $_); # pad to fixed length
>
> my @flds = ();
> if ($sect < keys %sect_start and /$sect_start{$sect}/) {
> $sect++;
> $flds{$sect} = [];
> print "section going to $sect\n\n" if $debug;
> next;
> }
>
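> push @{$flds{$sect}}, []; # start a new row for this line
> my $haref = $sections{$sect}; # column name => [start-col, length]
> my $jj = 0;
> # walk the columns left to right (hash order is otherwise random)
> foreach my $col (sort { $haref->{$a}[0] <=> $haref->{$b}[0] } keys %{$haref}) {
>
> my ($fc, $num) = @{$haref->{$col}};
> print "Sect $sect: $col => from $fc for $num\n" if $debug;
> my $tmp = substr $_, $fc-1, $num;
> $tmp =~ s/\.+$//; # strip the dot leaders
> print "$sect: $ii: $jj: $tmp\n" if $debug;
> push @{$flds{$sect}[-1]}, $tmp; # append to the row just added, not row $ii
> $jj++;
> }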
> }
> print Data::Dumper->Dump([\%flds], [qw(%flds)]) if $debug;
>
> __END__
>
>
> --
>   ,-/-  __      _  _         $Bill Luebkert    Mailto:[EMAIL PROTECTED]
>  (_/   /  )    // //       DBE Collectibles    Mailto:[EMAIL PROTECTED]
>   / ) /--<  o // //      Castle of Medieval Myth & Magic http://www.todbe.com/
> -/-' /___/_<_</_</_    http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
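
The quoted script stops before the HTML step. A minimal sketch of that
step, assuming the fields of each section have already been pulled out
into rows (an array ref of array refs, one inner array per line). The
helper names html_escape and rows_to_html_table are made up for
illustration, and the sample rows come from the BY PERIOD section.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: $head is an array ref of column titles and
# $rows is an array ref of array refs of already-extracted fields.
sub rows_to_html_table {
    my ($head, $rows) = @_;
    my $html = "<table>\n";
    $html .= "  <tr>"
           . join('', map { "<th>" . html_escape($_) . "</th>" } @$head)
           . "</tr>\n";
    for my $row (@$rows) {
        $html .= "  <tr>"
               . join('', map { "<td>" . html_escape($_) . "</td>" } @$row)
               . "</tr>\n";
    }
    $html .= "</table>\n";
    return $html;
}

print rows_to_html_table(
    [ 'Team', '1st', '2nd', 'Total' ],
    [ [ 'Team', 203, 228, 431 ], [ 'Opponents', 164, 188, 352 ] ],
);
```

Keeping the escaping in its own sub means a field such as "Myth & Magic"
cannot break the markup, whatever section it came from.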

_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
