Re: Make this into a script to parse?

Jeff 'japhy' Pinyan Wed, 04 Feb 2004 20:01:35 -0800

On Feb 4, Lone Wolf said:

>I'm back to dealing with the main issue of a badly formatted file being
>brought down from an archaic system and needing to be cleaned up before
>being passed to another user or a database table.  I have the code
>below, which pulls the whole file in and parse it line by line.  That
>problem is still that when the stuff is done parsing the file, the file
>still has a ton of white spaces left in it.


>        open (OLDFILE, "< $file");
>        open (NEWFILE, "> $newfile");
>        while ($line = <OLDFILE>)  {
>               $line =~ s/^ //mg;
>               $line =~ s/ $//mg;
>               $line =~ s/\t/|/mg;
>               $line =~ s/\s+/ /mg;
>               $line =~ s/^\s*//mg;
>               $line =~ s/\s*$//mg;
>               $line =~ s/\s*$//mg;

These regexes (above and below) have NO need for the /m modifier, and only
a few of them have any need for the /g modifier.
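
A small sketch of why /m makes no difference here, assuming each $line comes from <OLDFILE> and so holds at most one newline, at the very end. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# One line as read from a file: a single newline, at the very end.
my $line = "  widget\t12  \n";

# Without /m, ^ matches only at the start of the string.
(my $plain = $line) =~ s/^\s+//;

# With /m, ^ could also match after an embedded newline,
# but a single input line has none, so nothing changes.
(my $multi = $line) =~ s/^\s+//mg;

print $plain eq $multi ? "same result\n" : "different result\n";
```

/g only earns its keep when a pattern can match more than once in the string, as with the tab and multiple-space replacements.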

  $line =~ s/^\s+//;  # remove leading whitespace
  $line =~ s/\s+$//;  # remove trailing whitespace
  $line =~ tr/\t/|/;  # change all \t's to |'s
  $line =~ tr/ //s;   # squash multiple spaces into one space

Those four lines (two regexes, two transliterations) do what the seven
lines above them do.
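
As a quick check, here are the four statements run against a made-up sample line. The data is an assumption; the real file's layout isn't shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the real file.
my $line = "   A100\tBrass   fitting \t  12   \n";

$line =~ s/^\s+//;   # remove leading whitespace
$line =~ s/\s+$//;   # remove trailing whitespace (including the newline)
$line =~ tr/\t/|/;   # change every tab to a pipe
$line =~ tr/ //s;    # squeeze runs of spaces down to a single space

print "$line\n";     # prints: A100|Brass fitting | 12
```

Note that single spaces next to the pipes are left in place. If those are the "ton of white spaces" still bothering you, they need a separate substitution.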

>               $line =~ s/(?<=\d)"/in. /mg;
>               $line =~ s/(?<=\d)'/ft. /mg;

Still don't need the /m modifier.

>               $line =~ s/^\s+//mg;
>               $line =~ s/\s+$//mg;

The first one is totally useless: the leading whitespace is already gone,
and neither \d regex can add any at the front.  The second is only needed
because it's possible $line now ends in "in. " or "ft. ", which means the
trailing space should be removed.  The solution, then, is to do the two \d
regexes FIRST, and THEN do the other regexes.
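
Put together, the reordering might look like this. It's only a sketch: the helper name clean_line and the sample input are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper name; not from the original script.
sub clean_line {
    my ($line) = @_;

    # Unit markers first, since each replacement ends in a space.
    $line =~ s/(?<=\d)"/in. /g;
    $line =~ s/(?<=\d)'/ft. /g;

    # Then the whitespace cleanup, once.
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    $line =~ tr/\t/|/;
    $line =~ tr/ //s;

    return $line;
}

print clean_line(qq{  pipe\t6'  3"\n}), "\n";   # prints: pipe|6ft. 3in.
```

Done in this order, the trailing space left by "in. " is caught by the one trailing-whitespace regex, so the second pass isn't needed.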

>#              $line =~ s/\s*\|\s*//mg;
>###            $line =~ s/ |/|/mg;
>###            $line =~ s/| /|/mg;

Are those not needed, or commented out because they're not working
properly?
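
If it's the latter, one likely culprit is the unescaped | in the last two: in a regex a bare | means alternation, so / |/ matches "a space OR nothing". And the first one replaces with an empty string, which deletes the pipes themselves. A sketch of what I'm guessing you wanted (trimming whitespace around each pipe), on invented sample data:

```perl
use strict;
use warnings;

# Hypothetical sample line with stray spaces around the delimiters.
my $line = "A100 | Brass fitting |  12";

# Unescaped |: alternation between a space and the empty pattern.
(my $wrong = $line) =~ s/ |/|/g;

# Escaped \|: a literal pipe, with any surrounding whitespace removed.
(my $right = $line) =~ s/\s*\|\s*/|/g;

print "$right\n";   # prints: A100|Brass fitting|12
```

This one does need /g, since a line holds many pipes.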

>                print NEWFILE "$line\n";
>        }
>        close OLDFILE;
>        close NEWFILE;
>
>  print "$newfile has now been created\n";
>}

>sub MySQL_id_data {
>  $database_file = "info/salesa1";
>  open(INF,$database_file) or dienice("Can't open $database_file: $!\n");
>  @grok = <INF>;
>  close(INF);

There's no reason to slurp a file into an array.  Just loop over the lines
of the file like you have with the while loop above.
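
A minimal sketch of the line-at-a-time approach. To keep it runnable on its own it reads from and writes to in-memory strings instead of your real files; the sample records are made up.

```perl
use strict;
use warnings;

# In-memory stand-ins for the input and output files (assumed data).
my $data   = "A100|bolt\nA200|nut\n";
my $result = '';

open my $in,  '<', \$data   or die "can't read in-memory data: $!";
open my $out, '>', \$result or die "can't write in-memory result: $!";

# Only one line is held in memory at a time.
while (my $line = <$in>) {
    print {$out} "$.|$line";   # $. is the current input line number
}

close $out;
close $in;

print $result;
```

With a real file, only the two open calls change.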

>  $file1 = "info/salesa1-data";
>  open (FILE, ">$file1") || die "Can't write to $file1 : error $!\n";
>  $inv = 1;
>
>  foreach $i (@grok) {
>   chomp($i);
>
>   ($item_num,$item_desc,$b1,$b2,$b3,$b4,$cc,$vn,$qoh,$qc,$qor,$bc,$sc,$yp)
>     = split(/\|/,$i);
>   print FILE
>     "$inv|$item_num|$item_desc|$b1|$b2|$b3|$b4|$cc|$vn|$qoh|$qc|$qor|$bc|$item_num|$sc|$yp\n";
>   $inv++;
> }

Oh good God.  Do you know what that for loop is DOING?

  for each element in @grok:
    remove the newline
    split it on pipes into some variables
    print $inv, those variables with pipes in between, and add a newline

That is terribly insane.

> close FILE;
>}

Here's my rewrite:

  sub MySQL_id_data {
    my $db_file = "info/salesa1";
    my $info_file = "$db_file-data";

    open DB, "< $db_file" or dienice("can't open $db_file: $!");
    open INFO, "> $info_file" or dienice("can't write $info_file: $!");
    print INFO "$.|$_" while <DB>;
    close INFO;
    close DB;
  }
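
One caveat: the quoted print statement lists $item_num a second time, after $bc. If that repeat is deliberate rather than a typo, the one-line print above doesn't reproduce it. A sketch that does, assuming records have the 14 pipe-separated fields shown in the split (the helper name number_line is made up):

```perl
use strict;
use warnings;

# Hypothetical helper: prefix a record with its number and repeat the
# item number after the 12th field ($bc), as the quoted print does.
sub number_line {
    my ($n, $line) = @_;
    chomp $line;
    my @f = split /\|/, $line, -1;   # -1 keeps trailing empty fields
    splice @f, 12, 0, $f[0];         # insert $item_num again after $bc
    return join('|', $n, @f) . "\n";
}

print number_line(1, "a|b|c|d|e|f|g|h|i|j|k|l|m|n\n");
# prints: 1|a|b|c|d|e|f|g|h|i|j|k|l|a|m|n
```

Inside the loop this would be called as print INFO number_line($., $_) while <DB>;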

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]

