Re: Make this into a script to parse?

2004-02-05 Thread R. Joseph Newton
John McKown wrote:


 my ($item_num,$a,$b) = $i =~ /^(.*?)\|((?:.*?\|){11})(.*)/;
 print LINE "$inv|$item_num|$a$item_num|$b\n";

 I think that I have that right. Well, assuming that the original is
 correct.

No John,

If you are using $a and $b as variables in any context other than the sort
built-in function, then you do not have it right.  Choose meaningful variable
names.
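
A minimal sketch of that distinction, with a made-up sub name and made-up sample records: $a and $b are the package variables that sort sets for its comparison block, so they appear only there, and everything else gets a descriptive lexical name. (Declaring "my $a" in the same scope would also stop a sort block from seeing sort's own $a.)

```perl
use strict;
use warnings;

# by_second_field() is a hypothetical helper, not code from the thread.
# It sorts pipe-delimited records by their second field.
sub by_second_field {
    my @records = @_;
    # $a and $b are used only inside the sort block, where sort sets them.
    return sort { (split /\|/, $a)[1] cmp (split /\|/, $b)[1] } @records;
}

# Descriptive names everywhere else.
my @inventory = ("1|pear|3", "2|apple|7", "3|melon|1");
print "$_\n" for by_second_field(@inventory);
```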

Joseph


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response




Re: Make this into a script to parse?

2004-02-04 Thread wolf blaum
For Quality purposes, Lone Wolf's mail on Thursday 05 February 2004 00:52 
may have been monitored or recorded as:
 I'm back to dealing with the main issue of a badly formatted file being
 brought down from an archaic system and needing to be cleaned up before
 being passed to another user or a database table.  I have the code

I assume that by saying you are back you are referring to your thread from 
12/17: "get rid of whitespace around pipes??".

 below, which pulls the whole file in and parse it line by line.  That
 problem is still that when the stuff is done parsing the file, the file
 still has a ton of white spaces left in it.

Did you try something like
my @fields = split /\s*\|\s*/, $line; 
as suggested by James, Jeff and Randy?
Why didn't it work? The problem still looks pretty much the same, doesn't it?
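
A small sketch of that split in use, assuming the input looks like the padded, pipe-delimited lines described in this thread. The helper name split_record and the sample line are made up. The split only removes whitespace next to the pipes, so the two ends of the line are trimmed separately first.

```perl
use strict;
use warnings;

# split_record() is a hypothetical helper name, not from the thread.
# It returns the fields of one pipe-delimited record with padding removed.
sub split_record {
    my ($line) = @_;
    $line =~ s/^\s+//;    # the split does not touch leading whitespace
    $line =~ s/\s+$//;    # nor trailing whitespace (this also drops the newline)
    return split /\s*\|\s*/, $line, -1;    # -1 keeps trailing empty fields
}

my @fields = split_record(" AA-1202 |FOIL SHEETS   12/200| 70.96 |A \n");
print join('|', @fields), "\n";
```

Whitespace inside a field (the run of spaces before 12/200) is left alone here; squashing it is a separate step.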

 What I would like to do is when I first open the file (another piece of
 this massive script) is tell it to just run a sub program on each piece
 that does the same thing as the stuff below, unfortunately I am not sure
 of the way to do this.

Frankly, after a while of looking at your code I'm still not sure what you want 
to do. That might be due to my ignorance, but you would really help me (and I 
guess others too) understand if you could post some sample data as it looks 
before it goes into your program, and a line showing how you expect it to look 
after it has been processed by your code. I guess that would make it easier to 
figure out what goes where, and how.

Wolf


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response




Re: Make this into a script to parse?

2004-02-04 Thread Jeff 'japhy' Pinyan
On Feb 4, Lone Wolf said:

I'm back to dealing with the main issue of a badly formatted file being
brought down from an archaic system and needing to be cleaned up before
being passed to another user or a database table.  I have the code
below, which pulls the whole file in and parse it line by line.  That
problem is still that when the stuff is done parsing the file, the file
still has a ton of white spaces left in it.

open (OLDFILE, "< $file");
open (NEWFILE, "> $newfile");
while ($line = <OLDFILE>)  {
   $line =~ s/^ //mg;
   $line =~ s/ $//mg;
   $line =~ s/\t/|/mg;
   $line =~ s/\s+/ /mg;
   $line =~ s/^\s*//mg;
   $line =~ s/\s*$//mg;
   $line =~ s/\s*$//mg;

These regexes (above and below) have NO need for the /m modifier, and only
a few of them have any need for the /g modifier.

  $line =~ s/^\s+//;  # remove leading spaces
  $line =~ s/\s+$//;  # remove trailing spaces
  $line =~ tr/\t/|/;  # change all \t's to |'s
  $line =~ tr/ //s;   # squash multiple spaces into one space

Those four lines (two regexes, two transliterations) do what the seven
lines above them do.

   $line =~ s/(?<=\d)"/in. /mg;
   $line =~ s/(?<=\d)'/ft. /mg;

Still don't need the /m modifier.

   $line =~ s/^\s+//mg;
   $line =~ s/\s+$//mg;

The first one is totally useless, and the second is only needed because
it's possible $line now ends in "in. ", which means the trailing space
should be removed.  The solution, then, is to do the two \d regexes FIRST,
and THEN do the other regexes.
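
The ordering described above could be sketched as one sub. This assumes the unit regexes originally matched a double quote and a single quote after a digit (the archive seems to have stripped those characters), and clean_line is a made-up name:

```perl
use strict;
use warnings;

# clean_line() is a hypothetical name; the unit patterns are a reconstruction.
sub clean_line {
    my ($line) = @_;
    $line =~ s/(?<=\d)"/in. /g;   # 12"   becomes 12in.  (adds a trailing space)
    $line =~ s/(?<=\d)'/ft. /g;   # 1000' becomes 1000ft.
    $line =~ tr/\t/|/;            # tabs become pipes
    $line =~ tr/ //s;             # squash runs of spaces into one space
    $line =~ s/^\s+//;            # trim last, so the space added by the
    $line =~ s/\s+$//;            # unit substitutions is removed as well
    return $line;
}

print clean_line(qq{  12"X1000'\tROYALE   FOIL  \n}), "\n";
```

Because the trimming runs after the unit substitutions, there is no need to trim twice.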

#  $line =~ s/\s*\|\s*//mg;
###$line =~ s/ |/|/mg;
###$line =~ s/| /|/mg;

Are those not needed, or commented out because they're not working
properly?

print NEWFILE "$line\n";
}
close OLDFILE;
close NEWFILE;

  print "$newfile has now been created\n";
}

sub MySQL_id_data {
  $database_file = "info/salesa1";
  open(INF,$database_file) or dienice("Can't open $database_file: $!\n");
  @grok = <INF>;
  close(INF);

There's no reason to slurp a file into an array.  Just loop over the lines
of the file like you have with the while loop above.
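
As a sketch of the line-by-line approach: the sub below reads one record at a time and writes it straight out with its line number in front. Lexical filehandles and three-argument open are a modern substitution for the bareword handles used in this thread, the name number_lines is made up, and in-memory filehandles stand in for real files so the example runs on its own.

```perl
use strict;
use warnings;

# number_lines() is a hypothetical name. It copies records from $in to
# $out one at a time, prefixing each with its input line number ($.).
sub number_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} "$.|$line";
    }
    return;
}

# In-memory handles keep the sketch self-contained; real code would open files.
my $data = "AA-1202|FOIL SHEETS\nAA-1205|FOIL POPUP SHEETS\n";
open my $in,  '<', \$data      or die "can't read in-memory data: $!";
open my $out, '>', \my $result or die "can't write in-memory buffer: $!";
number_lines($in, $out);
close $in;
close $out;
print $result;
```

Only one line is held in memory at any time, however large the input is.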

  $file1 = "info/salesa1-data";
  open (FILE, ">$file1") || die "Can't write to $file1 : error $!\n";
  $inv = 1;

  foreach $i (@grok) {
   chomp($i);
   ($item_num,$item_desc,$b1,$b2,$b3,$b4,$cc,$vn,$qoh,$qc,$qor,$bc,$sc,$yp)
     = split(/\|/,$i);
   print FILE
     "$inv|$item_num|$item_desc|$b1|$b2|$b3|$b4|$cc|$vn|$qoh|$qc|$qor|$bc|$item_num|$sc|$yp\n";
   $inv++;
 }

Oh good God.  Do you know what that for loop is DOING?

  for each element in @grok:
    remove the newline
    split it on pipes into some variables
    print $inv, those variables with pipes in between, and add a newline

That is terribly insane.

 close FILE;
}

Here's my rewrite:

  sub MySQL_id_data {
    my $db_file = "info/salesa1";
    my $info_file = "$db_file-data";

    open DB, "< $db_file" or dienice("can't open $db_file: $!");
    open INFO, "> $info_file" or dienice("can't write $info_file: $!");
    print INFO "$.|$_" while <DB>;
    close INFO;
    close DB;
  }

-- 
Jeff japhy Pinyan  [EMAIL PROTECTED]  http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]






RE: Make this into a script to parse?

2004-02-04 Thread Lone Wolf
I tried the my @fields and I did not get it to work, probably because my
coding skills have not improved enough lately to be worthy of perl.
Thank goodness I never said I had perfect code, because I would
definitely be lying.

I attached 2 files, one the beginning data, the other the .sql file that
I load into MySQL database.  The files are about 3000 lines before and
after so I cut out the first 30 lines and put them in the files to the
list.

What I need to figure out is how to make a sub call that when I pull in
the file will remove all extraneous white space.  Something I can copy
into another Perl program to parse another set of files (ARGH!).  I've
learned not to tell the bosses I can write a script to handle the errors
of the salesmen.  I currently use a back piece of PHP coding to handle
the extra spaces in the pages that use the data, but for another project
I can't use that work-around.

I know I can do something along the lines of:
(from an HTML generating page with a sort)

 foreach $i (sort ByName @grok)
 {
  chomp($i);
  ($type,$description,$parts,$numb) = split(/\|/,$i);
  print <<INFO2;
<tr><td>$type</td><td>$description</td><td>$parts</td><td>$numb</td></tr>
INFO2
 }

The sub program:
sub ByName {
@a = split(/\|/,$a);
@b = split(/\|/,$b);
$a[1] cmp $b[1];
}

But I am still not sure how to make the $i go through, and it is
probably something simple I am missing.

Thanks!!
Robert
1|AA-1202|12in. X10.75 FOIL SHEETS 12/200|70.96|46.40|45.24|44.13|246|3|55.000|.000|.000|A|AA-1202
2|AA-1205|12in. x10in. FOIL POPUP SHEETS 6/500|96.61|63.17|61.59|60.09|246|3|19.000|.000|.000|B|AA-1205
3|AA-1215RO|12in. X1000ft.  ROYALE FOIL STD 1|25.16|15.84|15.46|15.09|245|3|56.000|5.000|.000|B|AA-1215RO
4|AA-1217SE|12in. X1000 ALUMINUM FOIL STD 1|30.18|19.73|19.24|18.77|245|3|36.000|.000|.000|B|AA-1217SE
5|AA-1251|12in. X500 ALUMINUM FOIL (HVY) EA|26.25|17.17|16.74|16.33|245|3|34.000|.000|.000|B|AA-1251
6|AA-1255RO|12in. X500ft. ROYALE FOIL STD 1/CS|15.96|10.05|9.81|9.58|245|3|30.000|.000|.000|C|AA-1255RO
7|AA-1817SE|18in. X1000 STD.DUTY FOIL-RL 1|42.82|28.00|27.30|26.63|245|3|17.000|.000|.000|C|AA-1817SE
8|AA-1825|18in. X25ft.  HEAVY DUTY-RL 12|28.29|27.27|26.79|26.17|245|3|6.000|.000|.000|C|AA-1825
9|AA-1851SE|18in. X500 ALUMINUM FOIL (HVY) 1|32.67|21.36|20.83|20.32|245|3|116.000|4.000|.000|A|AA-1851SE
10|AA-1857SE|18in. X500in.  ALUMINUM FOIL STD 1|24.41|15.96|15.56|15.18|245|3|67.000|.000|.000|B|AA-1857SE
11|AA-455-44|1/2 AL.STEAM T/PAN-MEDIUM 100|40.59|27.24|26.54|25.88|212|3|22.000|.000|.000|B|AA-455-44
12|AA-456-44|1/2 AL.STEAM T/PAN-DEEP 100|34.16|24.54|23.86|23.38|212|3|97.000|1.000|.000|A|AA-456-44
13|AA-457-70|FULL AL.STEAM T/PAN-MEDIUM 50|53.57|36.92|35.95|35.03|212|3|9.000|.000|.000|B|AA-457-70
14|AA-458-40|1/2 AL.STEAM T/PAN-SHALLOW 100|34.49|22.99|22.41|21.85|212|3|11.000|.000|.000|C|AA-458-40
15|AA-459-70|FULL AL.STEAM T/PAN-SHALLOW 50|53.02|36.30|35.35|34.45|212|3|8.000|1.000|.000|C|AA-459-70
16|AA-460-70|FULL AL.STEAM T/PAN-DEEP 50|44.49|30.25|29.47|28.72|212|3|116.000|2.000|.000|A|AA-460-70
17|AA-516430WLR|***9ft. RND CONT W/FOIL BD LD 250|57.80|49.13|45.35|42.11|77|3|8.000|.000|.000|D|AA-516430WLR
18|AA-552-40|ASHTRAY ROUND SILVER 1000|77.65|50.77|49.50|48.29|2|3|15.000|.000|.000|B|AA-552-40
19|AA-554-40|FULL AL.STEAM T/PAN-FOIL COV50|27.45|17.95|17.50|17.07|212|3|42.000|2.000|.000|A|AA-554-40
20|AA-555-30|1/2 AL.STEAM T/PAN-FOIL COV100|14.44|13.75|13.42|12.94|212|3|23.000|1.000|.000|B|AA-555-30
21|AA-688-64A|19in.  AL.ROAST.PAN-OVAL-GIANTG50|51.24|51.24|51.24|51.24|212|3|15.000|.000|.000|C|AA-688-64A
22|AA-9102|9X10.75 FOIL SHEETS 12/200|51.71|33.81|32.96|32.16|246|3|17.000|.000|.000|B|AA-9102
23|AA-9105|9X10.75 FOIL SHEETS 6/500|59.47|38.88|37.91|36.99|246|3|94.000|5.000|.000|A|AA-9105
24|AA-A12DL|12in.  DOME CLEAR LID-A13A16 25|13.73|9.72|9.46|9.21|41|3|3.000|.000|.000|C|AA-A12DL
25|AA-A12FT|12in.  ALUMINUM FLAT TRAY 25|15.51|10.99|10.69|10.41|41|3|5.000|.000|.000|C|AA-A12FT
26|AA-A12LS|12in.  ALUMINUM 5-CMPT.TRAY 25|15.51|10.99|10.69|10.41|41|3|15.000|.000|.000|C|AA-A12LS
27|AA-A16DL|16in.  DOME CLEAR LID 25|20.76|14.71|14.31|13.93|41|3|11.000|1.000|.000|B|AA-A16DL
28|AA-A16FT|16in.  ALUMINUM FLAT TRAY 25|25.69|18.19|17.70|17.24|41|3|10.000|1.000|.000|B|AA-A16FT
29|AA-A16LS|16in.  ALUMINUM 5-CMPT.TRAY 25|25.69|18.19|17.70|17.24|41|3|8.000|.000|.000|C|AA-A16LS
30|AA-A18DL|18in.  DOME CLEAR LID 25|29.84|21.14|20.57|20.03|41|3|7.000|1.000|.000|C|AA-A18DL
AA-1202 |12X10.75 FOIL SHEETS   12/200| 70.96 | 46.40 | 45.24 | 44.13 |UF|ALCAN |55.000 |  .000 |  .000 |A 
 AA-1205 |12x10FOIL POPUP SHEETS 6/500| 96.61 | 63.17 | 61.59 | 60.09 |UF|ALCAN |19.000 |  .000 |  .000 |B 
 AA-1215RO   |12X1000' ROYALE FOIL STD1| 25.16 | 15.84 | 15.46 | 15.09 |U5|ALCAN |56.000 | 5.000 |  .000 |B 
 AA-1217SE   |12X1000 ALUMINUM FOIL STD   1| 30.18 | 19.73 | 

Re: Make this into a script to parse?

2004-02-04 Thread John McKown
On Wed, 4 Feb 2004, Jeff 'japhy' Pinyan wrote:

<snip>
 
   foreach $i (@grok) {
    chomp($i);
    ($item_num,$item_desc,$b1,$b2,$b3,$b4,$cc,$vn,$qoh,$qc,$qor,$bc,$sc,$yp)
      = split(/\|/,$i);
    print FILE
      "$inv|$item_num|$item_desc|$b1|$b2|$b3|$b4|$cc|$vn|$qoh|$qc|$qor|$bc|$item_num|$sc|$yp\n";
    $inv++;
   }
 
 Oh good God.  Do you know what that for loop is DOING?
 
   for each element in @grok:
 remove the newline
 split it on pipes into some variables
 print $inv, those variables with pipes in between, and add a newline
 
 That is terribly insane.

Jeff, The input and output lines are not identical. The output line
prefixes $inv at the front and inserts $item_num between $bc and $sc. I
don't know why $item_num is repeated. Granted that I think a more
efficient construct might be:

my ($item_num,$a,$b) = $i =~ /^(.*?)\|((?:.*?\|){11})(.*)/;
print LINE "$inv|$item_num|$a$item_num|$b\n";

I think that I have that right. Well, assuming that the original is 
correct.


--
Maranatha!
John McKown






Re: Make this into a script to parse?

2004-02-04 Thread Jeff 'japhy' Pinyan
On Feb 4, John McKown said:

On Wed, 4 Feb 2004, Jeff 'japhy' Pinyan wrote:

   foreach $i (@grok) {
    chomp($i);
    ($item_num,$item_desc,$b1,$b2,$b3,$b4,$cc,$vn,$qoh,$qc,$qor,$bc,$sc,$yp)
      = split(/\|/,$i);
    print FILE
      "$inv|$item_num|$item_desc|$b1|$b2|$b3|$b4|$cc|$vn|$qoh|$qc|$qor|$bc|$item_num|$sc|$yp\n";
    $inv++;
   }

 Oh good God.  Do you know what that for loop is DOING?
 That is terribly insane.

Jeff, The input and output lines are not identical. The output line
prefixes $inv at the front and inserts $item_num between $bc and $sc. I
don't know why $item_num is repeated. Granted that I think a more
efficient construct might be:

Bah, I missed that.  Then I'd use split(), but just use an array.

  while (<IN>) {
    local $" = "|";
    my @fields = split /\|/;
    print OUT "$.|@fields[0..11,0,12..13]";
  }

But this raises the question: WHY does item_num have to be used TWICE in the
SAME line of data?  This smells of poor coding on the other side.  It's
still ugly.
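
The slice idea can also be written with join instead of the list separator variable. This is only a sketch: reorder_record is a made-up name, and it assumes every record has at least twelve fields, as the sample data posted in this thread does.

```perl
use strict;
use warnings;

# reorder_record() is a hypothetical name. It rebuilds one record as:
# counter, the first twelve fields, the item number again, then any
# remaining fields. Assumes the record has at least twelve fields.
sub reorder_record {
    my ($count, $line) = @_;
    chomp $line;
    my @fields = split /\|/, $line, -1;
    # 12 .. $#fields is an empty range when there are exactly twelve fields.
    return join '|', $count, @fields[0 .. 11, 0, 12 .. $#fields];
}

my $record = "AA-1202|12in. X10.75 FOIL SHEETS 12/200|70.96|46.40|45.24|44.13|246|3|55.000|.000|.000|A\n";
print reorder_record(1, $record), "\n";
```

Using $#fields rather than a fixed upper index means the same sub handles both twelve-field and fourteen-field records.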







Re: Make this into a script to parse?

2004-02-04 Thread wolf blaum
For Quality purposes, Lone Wolf's mail on Thursday 05 February 2004 04:23 
may have been monitored or recorded as:

Hi
 
 Thank goodness I never said I had perfect code, because I would
 definitely be lying.

No worries - I post code to get feedback. That's the whole idea of learning it.

 I attached 2 files, one the beginning data, the other the .sql file that
 I load into MySQL database.  The files are about 3000 lines before and
 after so I cut out the first 30 lines and put them in the files to the
 list.

OK - then, again: do not read these files into memory at once unless you really 
have to (which should be close to never).

here is a script that uses your given data:

---snip---
#!/usr/bin/perl

use strict;
use warnings;
my (@fields, $lng);

opendir INDIR, "./sql" or die "Can't open dir with before files: $!";

foreach my $infile (grep {!/^\./} readdir INDIR) {
  # read all the files in your home/sql dir
  # read only files that do not start with a .
  my ($i, $rec) = (0, 0);

  open INFILE, "<./sql/$infile" or die "Can't open $infile: $!";
  open OUTFILE, ">./${infile}.out" or die "Can't open ${infile}.out at home: $!";
  while (<INFILE>) {
    $rec++;
    chomp;
    @fields = split /\s*\|\s*/, $_;
    $fields[0] =~ s/^\s+//;
    # there is probably a way to get rid of the leading spaces in the first
    # entry using split, I just couldn't think of any.

    $lng = @fields unless $lng; # set $lng for first record
    # poor quality control of your input data: check if all records have the
    # same number of fields, or skip and print the record otherwise.
    print "The following record: $rec has ", scalar @fields,
          " fields as compared to $lng fields in the first record! Skip. : $_\n"
      and next unless $lng == @fields;
    $i++;
    print OUTFILE $i;
    print OUTFILE "|$_" foreach (@fields);
    print OUTFILE "|$fields[0]\n"; # your trailing ID
  }
  close INFILE;
  close OUTFILE;
  print "Read $rec records from ./sql/$infile and printed $i into ./${infile}.out\n";
}
closedir INDIR;
---snap---

A couple of hints:

The script reads all files in the sql subdir of your home dir and produces the 
corresponding filename.out in your homedir.

The split splits as written by Jeff et al.
I couldn't think of a better way to strip the leading spaces from the first 
field.
Any better suggestions?

You end up with a final \n in each outfile.

You can rewrite it into a sub by substituting the line
foreach my $infile (grep {!/^\./} readdir INDIR) {
with

sub whatever{
...
foreach my $infile (@_) {

and call the sub with
whatever ("file1", "file2", ...);

Of course you may want to change the open statements too, if you don't have 
your input files in ./sql.
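
Put together, such a sub might look like the sketch below. The name clean_files, the ".out" suffix next to each input file, and the trimming of both ends before the split are my own assumptions rather than anything posted in this thread. Lexical filehandles are used so the sub can be copied into another script without clashing with its handles.

```perl
use strict;
use warnings;

# clean_files() is a hypothetical name for the "whatever" sub.
# For each input path it writes a cleaned copy to "<path>.out" and
# returns the total number of records written.
sub clean_files {
    my @paths = @_;
    my $written = 0;
    foreach my $path (@paths) {
        open my $in,  '<', $path       or die "Can't open $path: $!";
        open my $out, '>', "$path.out" or die "Can't write $path.out: $!";
        while (my $line = <$in>) {
            $line =~ s/^\s+|\s+$//g;                   # trim both ends first
            my @fields = split /\s*\|\s*/, $line, -1;  # then split on padded pipes
            next unless @fields;                       # skip blank lines
            # line number, the cleaned fields, then the item number again
            print {$out} join('|', $., @fields, $fields[0]), "\n";
            $written++;
        }
        close $in;
        close $out or die "Can't close $path.out: $!";
    }
    return $written;
}
```

It would be called as clean_files("file1", "file2"), with whatever paths the calling script already knows about.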

Hope that gets you started, Wolf











Re: Make this into a script to parse?

2004-02-04 Thread wolf blaum
For Quality purposes, wolf blaum's mail on Thursday 05 February 2004 06:07 
may have been monitored or recorded as:

 The script reads all files in the sql subdir of your home dir and produces
 the corresponding filename.out in your homedir.

Shame on me: of course it reads all the files in the sql subdir of the 
CURRENT DIR, not the home dir. Use $ENV{HOME} if you want your homedir 
(Perl's open does not expand ~/ by itself)...

Well, I've been here a while...

Something else I forgot: why do you need the count at the beginning of the 
line? I hope not as a unique (primary) key for the DB table you feed that 
into. There should be an AUTO_INCREMENT in your DB for that.
And talking about DBs: 
According to the 3rd rule of normalisation, as outlined by E. F. Codd of IBM 
in the 1970s (not that I was around at that time...):

An entity is said to be in 3rd normal form if it is already in 2nd normal 
form and no nonidentifying attributes are dependent on any other 
nonidentifying attributes.

The repetition of a value like $fields[0] clearly violates this rule.
See www.databasejournal.com/sqletc/article.php/1428511
on DB design.

Good night, wolf

