RE: parsing text

2003-12-10 Thread Lee Goddard

Nice idea: I'm surprised it's not been done before
(I didn't look on CPAN ...)

Just a thought, fwiw: if you are sure there will be
no spaces in your leaders - the bit between the
row name and the data (...) - and if you can be sure
that each column consists of data without white space
then you could surely use a regular expression to 
get at the data?

Your $text string does have a row (number 6) with
a space in the leader: but maybe you get around
that by requiring a column to have white space on
either side...?

Just a thought.
lee
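A minimal sketch of the regex approach Lee describes, written in Perl because that is what the thread uses. It assumes, as Lee does, that no data value contains whitespace. The rule that the leader ends where only stat-like tokens (digits, ".", "+", "-") remain is an assumption chosen to cope with names like "Lowe, Kenneth..." that contain a space, and split_row is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split one report row into its leader and its fields.
# Assumption (Lee's): no data value contains whitespace. The leader is the
# shortest prefix followed by optional leader dots, whitespace, and then
# only stat-like characters (digits . + -) up to the end of the line.
sub split_row {
    my ($line) = @_;
    if ($line =~ /^\s*(.*?)\.*\s+([-+.\d][-+.\d\s]*)$/) {
        my ($leader, $rest) = ($1, $2);
        return ($leader, split ' ', $rest);   # awk-style split on whitespace
    }
    return ($line);    # headers and rule lines come back as one field
}

my ($name, @stats) =
    split_row('Lowe, Kenneth...         0    15    15   15   0   14   11   1.3     26');
print "$name: @stats\n";
```

Header and rule lines come back as a single field, so a caller can tell them apart from data rows. Unlike a column scan, this keeps a placeholder such as the "-" in the Team row of the BY PERIOD table as a field of its own.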

-Original Message-
From: Joe Youngquist [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 09, 2003 9:58 PM
To: [EMAIL PROTECTED]
Subject: parsing text

Re: parsing text

2003-12-10 Thread Joe Youngquist
Thank you $Bill,

I'll have to digest your code to see if there is something I can integrate
into the solution.
The problem with using a hash to hold the column start positions is that
the columns shift depending on how wide the stats in them get.
For example, once the PF total reaches the hundreds, the PF column
expands, pushing the other columns to the right.
But there is some promise here, if I compromise on being completely
generalized in the parsing.  By generalized I mean that I supply the
parser with no strings to try and match.  This way I can use the same parser
for any kind of text that is in this kind of format.

Thanks again for the time you spent on this. I think most of the code
after the section starts and column starts are defined does a much better
job of pulling out the data than what I had thought of.

JY
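Joe's point is that a fixed table of start positions breaks as soon as a column widens. One way around that, sketched here under the assumption that every stat column is right-aligned (as a Perl format picture with @>>> fields would print it), is to recompute the columns for each table from where its numeric tokens end. This is a different technique from the gutter scan in Joe's original post, and right_edges plus the two sample tables are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the right edges of the stat columns in ONE table.
# Assumption: stat columns are right-aligned and their values look numeric
# (digits . + -), so every value in a column ends at the same position.
sub right_edges {
    my @rows = @_;
    my %count;                                 # edge position => rows having it
    for my $row (@rows) {
        my %edges;
        while ($row =~ /(?<!\S)[-+.\d]+(?!\S)/g) {
            $edges{ pos($row) } = 1;           # pos() is just past the token
        }
        $count{$_}++ for keys %edges;
    }
    # Keep the edges that every row shares, left to right.
    return sort { $a <=> $b } grep { $count{$_} == @rows } keys %count;
}

# Two hypothetical versions of a table: in the second, the last column
# has grown to three digits, so its right edge moves one place over.
my @before = ('Lowe    15  9', 'Ford     2 14');
my @after  = ('Lowe    15   9', 'Ford     2 114');
print join(',', right_edges(@before)), "\n";   # edges 10 and 13
print join(',', right_edges(@after)),  "\n";   # edges 10 and 14
```

Because nothing is stored between tables, a widened PF column in one report simply produces a different set of edges for that report.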

- Original Message -
From: $Bill Luebkert [EMAIL PROTECTED]
To: Joe Youngquist [EMAIL PROTECTED]
Sent: Wednesday, December 10, 2003 7:35 AM
Subject: Re: parsing text


 Joe Youngquist wrote:
  Hello list,
 
  I've been trying to figure out a generalized method of parsing space-
  formatted text to output into HTML tables.  The data is very likely
  written out using Perl Reports and Pictures. Has anyone come up with
  a general method?

 Here's a slightly different approach: break the text into 3 sections,
 each section into lines, and each line into fields (I didn't get to the
 HTML part yet):

my $text = <<EOD;
|------------------------ OVERALL STATISTICS ------------------------|
TOTALS               O-REB D-REB TOTAL   PF  FO    A   TO  A/TO Hi Pts
----------------------------------------------------------------------
Lowe, Kenneth...         0    15    15   15   0   14   11   1.3     26
Teague, David...         6    16    22    9   0    9    4   2.2     19
Booker, Chris...        13    21    34    8   0   10   10   1.0     20
Buckley, Melvin.         5    17    22   11   0   10    8   1.2     20
McKnight, Brandon...     1    11    12   15   1   18   15   1.2     13
Buscher, Brett..         1     9    10   15   0    9    9   1.0     10
Kartelo, Ivan...        22    19    41   14   0    2    7   0.3     12
Kiefer, Matt             9    12    21   14   0    4    9   0.4      7
Parkinson, Austin...     3     5     8    4   0   20    7   2.9      8
Nwankwo, Ije             2     2     4    2   0    2    2   1.0      2
Carroll, Matt...         1     3     4    6   0    0    2   0.0      2
Ford, Andrew             0     1     1    2   0    0    1   0.0      0
Garrity, Kevin..         0     1     1    0   0    0    0   0.0      0
Hartley, Chris..         1     0     1    0   0    0    1   0.0      0
Total...                72   143   215  115   1   98   86   1.1     78
Opponents...            72   130   202  131   -   62  103   0.6     68

TEAM STATISTICS                   PUR      OPP
----------------------------------------------
SCORING...                        431      352
  Points per game.               71.8     58.7
  Scoring margin..              +13.2        -
FIELD GOALS-ATT...            142-328  134-336
  Field goal pct..               .433     .399
3 POINT FG-ATT                 36-102    25-99
  3-point FG pct..               .353     .253
  3-pt FG made per game...        6.0      4.2
FREE THROWS-ATT...            111-147    59-99
  Free throw pct..               .755     .596
REBOUNDS..                        215      202
  Rebounds per game...           35.8     33.7
  Rebounding margin...           +2.2        -
ASSISTS...                         98       62
  Assists per game               16.3     10.3
TURNOVERS.                         86      103
  Turnovers per game..           14.3     17.2
  Turnover margin.               +2.8        -
  Assist/turnover ratio...        1.1      0.6
STEALS                             44       31
  Steals per game.                7.3      5.2
BLOCKS                             23       23
  Blocks per game.                3.8      3.8
WINNING STREAK                      6        -
  Home win streak.                  3        -
ATTENDANCE                      33118    23435
  Home games-Avg/Game.        3-11039      0-0
  Neutral site-Avg/Game...          -   3-7812

BY PERIOD      1st  2nd    Total
--------------------------------
Team           203  228  -   431
Opponents...   164  188  -   352
EOD

 # you can expand on this table to include a prefix and suffix for a field
 # and whether you want to use
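$Bill's message is cut off above, so his actual field handling is not shown. As a rough sketch of only the first two levels of his breakdown (report, then sections, then lines), assuming that sections are separated by blank lines and that rule lines made only of dashes, pipes and spaces carry no data; sections is a hypothetical helper, not $Bill's code.

```perl
use strict;
use warnings;

# Hypothetical helper: report -> sections -> lines.
# Assumptions: sections are separated by blank (or whitespace-only)
# lines, and rule lines made only of dashes, pipes and spaces are dropped.
sub sections {
    my ($text) = @_;
    my @sections;
    for my $chunk (split /\n[ \t]*\n/, $text) {
        my @lines = grep { /\S/ && !/^[\s|-]+$/ } split /\n/, $chunk;
        push @sections, \@lines if @lines;
    }
    return @sections;
}

my $sample = "TOTALS    PF\n------------\nLowe      15\n\nTEAM     PUR\n------------\nSCORING  431\n";
for my $section (sections($sample)) {
    print scalar(@$section), " lines, first: $section->[0]\n";
}
```

Splitting each line into fields is left to a separate column-detection step, since names such as "Lowe, Kenneth..." contain spaces of their own.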

Re: parsing text

2003-12-10 Thread Joe Youngquist
Hello Lee,

My first attempt was to use a regular expression, but there are no
guarantees on the header format...
The real bugger is that sometimes the column headers will not have any
spaces between them. Though this is rare, it is something I'll need to keep
an eye on and fix manually - I'm not a good enough programmer to have my
script make a judgment call on that there column, chief. :)
My hope right now is just to make something that works with my data 99% of
the time, and that will work as close to 100% of the time as possible as
long as the column headers have a space between them. Once I do, this will
be the first time I've had the joy of contributing to the Perl community.

JY
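Joe's worry about headers that run together can at least be detected automatically, so the tables that need a manual look are flagged rather than silently mis-parsed. The sketch assumes a one-line header and that only spaces separate columns in the data rows; it finds the gutters (columns blank in every data row) and reports any gutter where the header has no space at all. glued_gutters and the sample rows are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: find gutters (runs of columns blank in every data
# row) that the header crosses without a single space, i.e. header words
# glued together across a column boundary.
# Assumption: only spaces separate columns in the data rows.
sub glued_gutters {
    my ($header, @data) = @_;
    my $width = 0;
    for my $row (@data) {
        $width = length($row) if length($row) > $width;
    }

    my @blank = (1) x $width;                 # 1 = blank in every data row
    for my $row (@data) {
        for my $c (0 .. length($row) - 1) {
            $blank[$c] = 0 if substr($row, $c, 1) ne ' ';
        }
    }

    my $padded = $header . ' ' x $width;      # so substr never runs off the end
    my @glued;
    my $c = 0;
    $c++ while $c < $width && $blank[$c];     # skip any leading indentation
    while ($c < $width) {
        if (!$blank[$c]) { $c++; next }
        my $start = $c;
        $c++ while $c < $width && $blank[$c];
        last if $c >= $width;                 # trailing blanks are not a gutter
        my $gap = substr($padded, $start, $c - $start);
        push @glued, [$start, $c - 1] if index($gap, ' ') < 0;
    }
    return @glued;
}

my @rows = ('Ann       12  3', 'Bob        9 11');
for my $g (glued_gutters('PLAYER   PTSREB', @rows)) {
    print "header has no space in columns $g->[0]-$g->[1]; check by hand\n";
}
```

A header with an extra internal space, such as "Hi Pts", is not flagged; only a header that bridges a data gutter is.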
- Original Message -
From: Lee Goddard [EMAIL PROTECTED]
To: Joe Youngquist [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Wednesday, December 10, 2003 5:30 AM
Subject: RE: parsing text

Re: parsing text

2003-12-10 Thread SCOTT_SISSON
[EMAIL PROTECTED]
Subject: Re: parsing text
12/10/2003 08:33 AM

parsing text

2003-12-09 Thread Joe Youngquist



Hello list,

I've been trying to figure out a generalized method of parsing space-formatted
text to output into HTML tables. The data is very likely written out using
Perl Reports and Pictures. Has anyone come up with a general method?

Here are a few examples of the text that I'd format into HTML tables:

NOTE: Best to use the Courier New font to keep the formatting.


|------------------------ OVERALL STATISTICS ------------------------|
TOTALS               O-REB D-REB TOTAL   PF  FO    A   TO  A/TO Hi Pts
----------------------------------------------------------------------
Lowe, Kenneth...         0    15    15   15   0   14   11   1.3     26
Teague, David...         6    16    22    9   0    9    4   2.2     19
Booker, Chris...        13    21    34    8   0   10   10   1.0     20
Buckley, Melvin.         5    17    22   11   0   10    8   1.2     20
McKnight, Brandon...     1    11    12   15   1   18   15   1.2     13
Buscher, Brett..         1     9    10   15   0    9    9   1.0     10
Kartelo, Ivan...        22    19    41   14   0    2    7   0.3     12
Kiefer, Matt             9    12    21   14   0    4    9   0.4      7
Parkinson, Austin...     3     5     8    4   0   20    7   2.9      8
Nwankwo, Ije             2     2     4    2   0    2    2   1.0      2
Carroll, Matt...         1     3     4    6   0    0    2   0.0      2
Ford, Andrew             0     1     1    2   0    0    1   0.0      0
Garrity, Kevin..         0     1     1    0   0    0    0   0.0      0
Hartley, Chris..         1     0     1    0   0    0    1   0.0      0
Total...                72   143   215  115   1   98   86   1.1     78
Opponents...            72   130   202  131   -   62  103   0.6     68

TEAM STATISTICS                   PUR      OPP
----------------------------------------------
SCORING...                        431      352
  Points per game.               71.8     58.7
  Scoring margin..              +13.2        -
FIELD GOALS-ATT...            142-328  134-336
  Field goal pct..               .433     .399
3 POINT FG-ATT                 36-102    25-99
  3-point FG pct..               .353     .253
  3-pt FG made per game...        6.0      4.2
FREE THROWS-ATT...            111-147    59-99
  Free throw pct..               .755     .596
REBOUNDS..                        215      202
  Rebounds per game...           35.8     33.7
  Rebounding margin...           +2.2        -
ASSISTS...                         98       62
  Assists per game               16.3     10.3
TURNOVERS.                         86      103
  Turnovers per game..           14.3     17.2
  Turnover margin.               +2.8        -
  Assist/turnover ratio...        1.1      0.6
STEALS                             44       31
  Steals per game.                7.3      5.2
BLOCKS                             23       23
  Blocks per game.                3.8      3.8
WINNING STREAK                      6        -
  Home win streak.                  3        -
ATTENDANCE                      33118    23435
  Home games-Avg/Game.        3-11039      0-0
  Neutral site-Avg/Game...          -   3-7812

BY PERIOD      1st  2nd    Total
--------------------------------
Team           203  228  -   431
Opponents...   164  188  -   352


The goal I'm trying to reach is to build a method that, no matter what table
of data is sent to it, will find where the columns are. It's easy to "see"
where the columns are, but my attempts to tell a program how to "see" the
columns have been embarrassing, to say the least.

The road I was walking down was to take each line of a table and look for
spaces (skipping dashes and pipes). When one is found, look "down" the rest
of the table in that same column. If all the way "down" the table there are
spaces (or a dash or pipe), then there is likely a column boundary at that
position. Once the entire table has been scanned, wherever the text changes
to spaces and back to text, one "cell" of data ends and a new "cell" starts.
Here is my logic applied to the last example table:


BY PERIOD      1st  2nd    Total
--------------------------------
Team           203  228  -   431
Opponents      164  188  -   352

Line one:
  0-9:   text (col 3 [the space between "by" and "period"] would be counted
         as text because "down" the table there are no other spaces)
  10-14: spaces
  15-17: text
  18-19: spaces
  20-22: text
  23-26: spaces
  27-31: text

Line two:
  0-31:  spaces (by the logic that dashes are counted like a space)

Line three:
  0-4:   text
  5-14:  spaces
  15-17: text
  18-19: spaces
  20-22: text
  23-28: spaces
  29-31: text

Line four:
  0-9:   text
  10-14: spaces
  15-17: text
  18-19: spaces
  20-22: text
  23-28: spaces
  29-31: text

From this I can tell the program, for each line in the table: from 0 to 9
grab the text, from 15 to 17 grab the text, from 20 to 22 grab the text, and
from 27 to 31 grab the text.
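A minimal Perl sketch of the column scan described above, under Joe's own rule that spaces, dashes and pipes count as blank and that a column is a boundary only if it is blank (or past the end of the line) in every line of the table. column_spans and cells are hypothetical names. One consequence of the dash rule worth knowing: a hyphen inside a header such as "O-REB" also counts as blank, so such a header can be split in two unless some data row keeps that column occupied.

```perl
use strict;
use warnings;

# Joe's rule: a character is "blank" if it is a space, a dash or a pipe, and
# a column is a boundary only if it is blank (or past the end of the line)
# in EVERY line of the table. Runs of other columns become [start, end] spans.
sub column_spans {
    my @lines = @_;
    my $width = 0;
    for my $line (@lines) {
        $width = length($line) if length($line) > $width;
    }
    my @text = (0) x $width;                  # 1 = some line has text here
    for my $line (@lines) {
        for my $c (0 .. length($line) - 1) {
            $text[$c] = 1 if substr($line, $c, 1) !~ /[ |-]/;
        }
    }
    my @spans;
    for my $c (0 .. $width - 1) {
        next unless $text[$c];
        if (@spans && $spans[-1][1] == $c - 1) {
            $spans[-1][1] = $c;               # extend the current span
        } else {
            push @spans, [$c, $c];            # start a new span
        }
    }
    return @spans;
}

# "From 0 to 9 grab the text...": cut one line by the spans and trim it.
sub cells {
    my ($line, @spans) = @_;
    my @cells;
    for my $span (@spans) {
        my ($start, $end) = @$span;
        my $cell = $start < length($line)
                 ? substr($line, $start, $end - $start + 1) : '';
        $cell =~ s/^\s+|\s+$//g;
        push @cells, $cell;
    }
    return @cells;
}

# The BY PERIOD example, rebuilt with sprintf so the column positions are exact.
my @table = (
    sprintf('%-15s%3s  %3s    %5s', 'BY PERIOD', '1st', '2nd', 'Total'),
    '-' x 32,
    sprintf('%-15s%3s  %3s  %1s   %3s', 'Team',      203, 228, '-', 431),
    sprintf('%-15s%3s  %3s  %1s   %3s', 'Opponents', 164, 188, '-', 352),
);
my @spans = column_spans(@table);
print join(' ', map { "$_->[0]-$_->[1]" } @spans), "\n";
print join(' | ', cells($_, @spans)), "\n" for @table;
```

For the BY PERIOD example this yields the spans 0-8, 15-17, 20-22 and 27-31 (the hand count above ends the first span at 9, which is only a trailing space and does not change the extracted text), and the Team line comes out as Team, 203, 228, 431: the "-" sits in a gutter and is dropped, as in the expected output.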

I would end up with (after ignoring line two and stripping leading and
trailing spaces):

<table>
<tr><td>BY PERIOD</td> <td>1st</td> <td>2nd</td> <td>Total</td></tr>
<tr><td>Team</td> <td>203</td> <td>228</td> <td>431</td></tr>
<tr><td>Opponents</td> <td>164</td> <td>188</td> <td>352</td></tr>
</table>
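Once each line has been cut into cells, producing markup like the table above is mechanical. A sketch, assuming rows arrive as array references of cell strings (for example from a span-based cutter); it skips rows containing only dashes, pipes or spaces (the "line two" case) and escapes &, < and > as a precaution. rows_to_html and html_escape are hypothetical helpers.

```perl
use strict;
use warnings;

# Escape the characters that matter inside HTML text content.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Turn rows of already-extracted cells into an HTML table string,
# skipping rule lines whose cells hold nothing but dashes, pipes or spaces.
sub rows_to_html {
    my @rows = @_;
    my $html = "<table>\n";
    for my $row (@rows) {
        next unless grep { /[^\s|-]/ } @$row;
        $html .= '<tr>'
               . join('', map { '<td>' . html_escape($_) . '</td>' } @$row)
               . "</tr>\n";
    }
    return $html . "</table>\n";
}

print rows_to_html(
    ['BY PERIOD', '1st', '2nd', 'Total'],
    ['---------', '---', '---', '-----'],
    ['Team',      '203', '228', '431'],
    ['Opponents', '164', '188', '352'],
);
```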


I dunno, just tossing this out to the list in the hope of a fresh
perspective on the problem. Below is some code I'm using to try to tell the
program how to spot spaces down the table.

Thanks in advance for your time in reading all this.

Joe Y.

---Code:---


my $text = "
|------------------------ OVERALL STATISTICS ------------------------|
TOTALS               O-REB D-REB TOTAL   PF  FO    A   TO  A/TO Hi Pts