I wrote a small script that uses message IDs as unique values and
extracts recipient address info. The goal is to count 1019 events per
message ID, along with the sum of recipients per message ID. The
script works fine, but when it runs against a very large file (2 GB+) I
get an out-of-memory error.

Is there a more efficient way of handling the hash portion that is less
memory-intensive and preferably faster?

--Paul



# Tracking log parser


use strict;
use warnings;

my $recips;
my %event_id;
my $counter      = 0;
my $total_recips = 0;


# Get log file

die "You must enter a tracking log. \n" if $#ARGV <0;
my $logfile = shift;

open (LOGFILE, $logfile) || die "Unable to open $logfile because\n$!\n";

foreach (<LOGFILE>) {   # NOTE: this reads the entire file into memory first

        next if /^#/;                   # skip any comment lines that start with a pound sign
        my @fields = split (/\t/, $_);  # split the line on tabs

        $recips = $fields[13];          # number-of-recipients column
        my $message_id = $fields[9];    # message ID

        next unless defined $fields[8];
        if ($fields[8] eq '1019') {     # event ID is a string field, so eq, not ==
                $event_id{$message_id} = 1;     # record each message ID once
                $counter++;
                $total_recips += $recips;
        }
}

close LOGFILE;
        

print "\n\nTotal instances of 1019 events in \"$logfile\" is
$counter.\n\n"; 

print "\nTotal single instances of 1019 event per message ID is ";

print scalar (keys %event_id);

print "\n\nTotal # of recipients per message ID is ";
print $total_recips;
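
For what it's worth, here is a minimal sketch of a streaming variant,
assuming the same tab-separated layout as above. while (<$fh>) reads one
line at a time, whereas foreach (<LOGFILE>) pulls the entire file into a
list before the loop even starts, which is almost certainly where the
memory blows up on a 2 GB+ log. Untested against a full-size file:

# Streaming tracking log parser (sketch)

use strict;
use warnings;

die "You must enter a tracking log.\n" unless @ARGV;
my $logfile = shift;

open my $fh, '<', $logfile or die "Unable to open $logfile: $!\n";

my %event_id;
my ($counter, $total_recips) = (0, 0);

while (my $line = <$fh>) {              # one line in memory at a time
        next if $line =~ /^#/;          # skip comment lines
        my @fields = split /\t/, $line;
        next unless defined $fields[8] && $fields[8] eq '1019';
        $event_id{ $fields[9] } = 1;    # remember each message ID once
        $counter++;
        $total_recips += ($fields[13] || 0);
}

close $fh;

print "Total 1019 events: $counter\n";
print "Unique message IDs: ", scalar (keys %event_id), "\n";
print "Total recipients: $total_recips\n";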




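And if the hash of unique message IDs itself ever becomes the problem
(millions of distinct IDs), one option might be tying it to an on-disk
hash with the DB_File module, so the keys live in a file instead of RAM;
slower per lookup, but memory stays bounded. The filename event_ids.db
here is just an example:

use Fcntl;
use DB_File;

# Keys and values are stored in event_ids.db on disk, not in memory.
tie my %event_id, 'DB_File', 'event_ids.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie event_ids.db: $!\n";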