Re: [Perl-unix-users] how to improve text manipulation

eyal edri Fri, 10 Mar 2006 06:50:51 -0800

Here is a snippet of the script that does the following:

1. reads the file
2. writes each line to its destination target file
3. chops each line to the first max chars (taken from the conf file)

......
if (!open (LOGFILE, $file)) {
     fatal ("can't open $file\nfile corrupted?\n"); # fatal is just a sub that logs the error and dies
}
flock (LOGFILE, 2); # lock the file while reading so that no other process writes to it
@lines = <LOGFILE>; # read the whole file into an array of lines (I think I need to change this if the file is too big?)
$lineNum = 0;

foreach $line (@lines) { #copy every line to the right datafile (according to the datecode)
   $lineNum++;
    undef ($dateCode);

    next if ($line =~ m/^\s*#/); # skip remark lines
    next if ($line =~ m/^\s*$/); # skip space lines

    $line = checkNumOfChars($line, $dateCodeColumn, $numOfCharsToCopy, $ftype);

     # get the datecode from each line
     $dateCode = substr ($line, $dateCodeColumn, $dateCodeLength);

    # this check makes sure that the dateCode is in the correct format
    if ( $dateCode =~ m/[1-2][0-9][0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9]$/ ) {
        writeLineToDatafile ($line, $dateCode, $targetDir, $ftype);
    }
    else {
        fatal ("Error on logFile: $file.\ndateCode in line $lineNum is not in the correct format: $dateCode\n");
    }

}
close LOGFILE;

sub writeLineToDatafile {
    ($line, $dateCode, $targetDir, $ftype) = @_;
    local ($dataFile);
    undef ($dataFile);

    $dataFile = "$ftype"."_"."$dateCode";
    open (DATAFILE, ">>$targetDir$dataFile.datafile") || fatal ("couldn't write to datafile $dataFile.datafile: $!\n");
    flock (DATAFILE, 2); # lock the file while I'm writing to it
    print DATAFILE "$line";
    flock (DATAFILE, 8); #unlock the file
    close DATAFILE;

}

sub checkNumOfChars {
    ($line, $dateCodeColumn, $numOfCharsToCopy, $ftype) = @_;
    local ($choppedLine);
    # num of chars not defined in the conf file (copy the entire line instead)
    return ($line) if ( $numOfCharsToCopy == -1 );
    chomp ($line);   # drop the newline first so a short line does not end up with two
    $choppedLine = substr ($line, 0, $numOfCharsToCopy );
    return ( "$choppedLine"."\n" );
}
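
One possible way to address the memory concern raised in the comment on the "@lines = <LOGFILE>" line is to read the log one line at a time and keep one output handle open per datecode, instead of slurping the file and reopening the datafile for every line. This is only a sketch under assumptions: split_by_datecode and the %handles cache are hypothetical names, the -1 convention and the file naming are taken from the script above, $targetDir is assumed to end with a slash, and file locking is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: stream $file line by line and append each line to
# "<targetDir><ftype>_<dateCode>.datafile", keeping one open handle per
# datecode instead of reopening the target file for every line.
sub split_by_datecode {
    my ($file, $targetDir, $ftype, $col, $len, $maxChars) = @_;

    open(my $in, '<', $file) or die "can't open $file: $!\n";

    my %handles;    # datecode => output filehandle
    while (my $line = <$in>) {
        next if $line =~ /^\s*#/;    # skip remark lines
        next if $line =~ /^\s*$/;    # skip blank lines

        # Work on the text without its newline; -1 means "keep the whole line".
        chomp(my $text = $line);
        $text = substr($text, 0, $maxChars) if $maxChars != -1;

        # Guard against lines too short to contain the datecode field.
        my $dateCode = length($text) >= $col + $len
                     ? substr($text, $col, $len)
                     : '';
        die "bad datecode '$dateCode' in $file line $.\n"
            unless $dateCode =~ /^[12]\d{3}[01]\d[0-3]\d[0-2]\d$/;

        # Open the target on first use, then reuse the cached handle.
        my $out = $handles{$dateCode} //= do {
            open(my $fh, '>>', "$targetDir${ftype}_$dateCode.datafile")
                or die "couldn't open datafile for $dateCode: $!\n";
            $fh;
        };
        print {$out} "$text\n";
    }

    close($_) or die "close failed: $!\n" for values %handles;
    close($in);
    return scalar keys %handles;    # number of distinct datecodes seen
}
```

If a log contains a very large number of distinct datecodes, the cache could run into the per-process open-file limit, so a real version might need to close the least recently used handles.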

I hope this helps...

On 3/10/06, $Bill Luebkert < [EMAIL PROTECTED]> wrote:

eyal edri wrote:

> what i did and it's not good enough:
>
> 1. i've implemented this using ordinary "foreach $line (@lines)" file
> processing, which is very slow when handling large files.
>     and then it's quite easy. (using substr and such...).
>
> what i think needs to be done (and here i need your advice):
>
> 1. Extract the datecodes (there's more than one) from the logfile
> (without having to read every line.. )

Not possible.  You can't extract without reading.

> 2. 'grep' all lines matching a certain datecode and dump it to the
> target file (with the same datecode)
>     (while checking if the length of each lines should be
> chopped/limited to XX CHARS.)

You seem to be doing that already (line by line).

> I know that using ''awk''/''sed''/"grep" is supposed to produce better results.

Probably true.

> Also using regular expressions will probably improve performance.

No - that should slow you down even more.
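
Whether a regex or substr is faster for pulling out a fixed-position field can be measured rather than guessed. The sketch below uses the core Benchmark module; the sample line, the column offset 0 and the length 10 are made-up values for illustration, and by_substr and by_regex are hypothetical helper names.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample line and field layout, for illustration only.
my $line = "2006031006 some log text that follows the datecode\n";
my ($col, $len) = (0, 10);

# Fixed-position extraction with substr.
sub by_substr { return substr($_[0], $col, $len) }

# Regex-based extraction of the same field.
sub by_regex { return $_[0] =~ /^.{$col}(.{$len})/ ? $1 : undef }

# Run each variant for roughly one CPU second and print a comparison table.
cmpthese(-1, {
    'substr' => sub { by_substr($line) },
    'regex'  => sub { by_regex($line) },
});
```

The numbers will depend on the Perl version and the real line format, so it is worth running this against actual log lines before drawing conclusions.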

Try showing us the code you have and see if it can be sped up, but it
sounds like you were on the right track to start with.

--
Eyal Edri | System & Security Engineer | [EMAIL PROTECTED] Communication.

