Samuel Brown <[EMAIL PROTECTED]> writes:

> Hi ya,
> 
> I'm teaching myself Perl and I have a log file around 1GB that I need to sort 
> by month | date | time | byte size. So far I have parse the log for "bytes" 
> since this is all that I need but I can't get it to sort like I want.

Well...

First, I hope I'm understanding you correctly - you have this file, in
'some order', which you need to sort by month, then by date, then
time, then byte size.  If that's not right, please enlighten me.  If
I understand your problem, however, read on - 

I've got good news, bad news, bad news, and bad news... :-)

Good news: below is an algorithm that works for your data.

Bad news: it's not going to work for you, because it's going to curl
up and die when it tries to load a GB file into RAM, copy it, sort it,
and spit out another copy.

Bad news: some of the regular readers of this list may remember my
little rant about how important it is to write easy-to-understand,
idiomatic code.  Forget I said all that...this way is much more fun.

Bad news: with the function call in the sort loop, it's gonna be dog
slow, even with smaller sets of data.

What you probably need to do is write some custom sorting code to sort
the file (or a copy, more likely) in place, without loading the whole
thing into memory.  There aren't many machines that can load the whole
file into RAM, and even fewer that would have enough left over to hold
the roughly 1 1/2 copies of the data this approach makes.  The memory
efficiency could certainly be improved, but it's still likely to kill
the system in question, so another solution is needed.  Unfortunately,
writing that is much more work than I put into the in-memory example
below, but perhaps it will give you some ideas.
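
Separately from the in-memory example further down, here is a rough
sketch of one way to keep memory bounded.  It is an external merge sort
rather than a true in-place sort: it writes sorted chunks to temporary
files and then merges them.  The names external_sort, $chunk_size and
$key_of are made up for illustration, and it assumes every line ends in
a newline and that $key_of returns a string key whose plain string
order is the order you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sort $in_path into $out_path, holding at most $chunk_size lines in memory.
# $key_of is a code ref (hypothetical) that maps a line to a string sort key.
# Returns the number of sorted runs that were written.
sub external_sort {
    my ($in_path, $out_path, $chunk_size, $key_of) = @_;

    open my $in, '<', $in_path or die "Couldn't open $in_path: $!";

    # Pass 1: write sorted runs of at most $chunk_size lines to temp files.
    my (@runs, @chunk);
    my $flush = sub {
        return unless @chunk;
        my $tmp = tempfile();    # read/write handle to an anonymous temp file
        print {$tmp} map  { $_->[1] }
                     sort { $a->[0] cmp $b->[0] }
                     map  { [ $key_of->($_), $_ ] } @chunk;
        seek $tmp, 0, 0 or die "seek failed: $!";    # rewind for the merge
        push @runs, $tmp;
        @chunk = ();
    };
    while (my $line = <$in>) {
        push @chunk, $line;
        $flush->() if @chunk >= $chunk_size;
    }
    $flush->();
    close $in;

    # Pass 2: merge the runs, keeping only one line per run in memory.
    open my $out, '>', $out_path or die "Couldn't open $out_path: $!";
    my @heads;
    for my $fh (@runs) {
        my $line = <$fh>;
        push @heads, [ $key_of->($line), $line, $fh ] if defined $line;
    }
    while (@heads) {
        # Linear scan for the smallest key; fine for a handful of runs.
        my $min = 0;
        for my $i (1 .. $#heads) {
            $min = $i if $heads[$i][0] lt $heads[$min][0];
        }
        my ($key, $line, $fh) = @{ $heads[$min] };
        print {$out} $line;
        my $next = <$fh>;
        if (defined $next) {
            $heads[$min] = [ $key_of->($next), $next, $fh ];
        } else {
            splice @heads, $min, 1;    # this run is exhausted
        }
    }
    close $out or die "Couldn't close $out_path: $!";
    return scalar @runs;
}
```

You would call it as external_sort($log, "$log.sorted", 100_000, $key_of),
picking the chunk size to suit the RAM you have.  The key function runs
once per line in each pass, not once per comparison, and with a great
many runs a heap would be a better fit than the linear scan.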

The in-memory example follows, comments inline:

'
#!/usr/bin/perl -w

use strict;

#always enable warnings and use strict.

chomp(my $hostname = `hostname`);  #strip the trailing newline
my @now = localtime(time);
my $today = sprintf("%04d/%02d/%02d",1900+$now[5], $now[4]+1, $now[3]);
print "Date: $today\n";
#Search current log file for "byte"
my $log = "/var/log.txt";
print "Byte Statistics for HOST: $hostname\n";

my @data;

open (my $fh, '<', $log)
        or die "Couldn't open $log: $!";

#only keep the lines that mention "byte"
while (my $line = <$fh>) {
    if ($line =~ /byte/i) { push @data, $line }
}

close ($fh);

#the above was pretty much your original script, with slight modifications

print @data;  #print it out - you probably don't want to do this yet.

my %month_map = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

#a hash - $month_map{Oct} == 9, etc.

#This is where it gets fun - a Schwartzian transform, with conditional
#sorting...

my @ordered = 
  map { $_->[0] }
  sort { $month_map{$a->[1]->[0]} <=> $month_map{$b->[1]->[0]}
         || $a->[1]->[1] <=> $b->[1]->[1]
         || tm_conv($a->[1]->[2]) <=> tm_conv($b->[1]->[2])
         || $a->[1]->[3] <=> $b->[1]->[3] }
  map { [ $_, [(split /\s+/)[0,1,2,10]] ] } @data;

#explanation of the last few lines, starting from the last and moving up:

#'map { [ $_, [(split /\s+/)[0,1,2,10]] ] } @data;'
#Take @data, map it into an array of arrayrefs.
# the first element in each arrayref is the original line.
# the second element is an arrayref containing fields 0,1,2, and 10 of
# the original data, split by spaces.

#'sort { $month_map ... $b->[1]->[3] }'
#Now, we take this array, and sort it by the values contained in the
# array referred to by the second element of each 'outer' arrayref.
# First by month, using our map; then the day; then the timestamp,
# with the ':'s stripped out by the tm_conv function; and finally by
# the bytes.

#'map { $_->[0] }'
#Now that we've sorted the arrayrefs by the values in the second
#element, we take the first element (the original line), and throw the
#rest away.

#Fun!

print "\n\n";
print @ordered;

#The tm_conv function just strips out the ':'s - I'm assuming the
#timestamps are in '24-hour' format.  Calling this function in the
#middle of the sort like this is absolutely terrible; it will slow
#down the sort tremendously.  I should be shot for writing this code.
sub tm_conv {
  my $tm = shift;

  $tm =~ tr/://d;

  return $tm;
}
'
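
The function call inside the sort block can be avoided by computing
each line's key once, before sorting.  A minimal sketch, assuming the
same field positions as above (0, 1, 2 and 10), 24-hour timestamps, and
byte counts that fit in 12 digits; sort_key and sort_lines are names
made up for illustration, and the sample lines are invented:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %month_map = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Build one fixed-width string key per line: month, day, HHMMSS, bytes.
# Zero-padding makes plain string comparison agree with numeric order.
sub sort_key {
    my ($line) = @_;
    my ($mon, $day, $time, $bytes) = (split /\s+/, $line)[0, 1, 2, 10];
    (my $hms = $time) =~ tr/://d;    # "07:15:02" -> "071502"
    return sprintf '%02d%02d%06d%012d', $month_map{$mon}, $day, $hms, $bytes;
}

# Schwartzian transform with the key precomputed in the first map,
# so the sort block is a single string comparison.
sub sort_lines {
    my @lines = @_;
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ sort_key($_), $_ ] } @lines;
}

# Invented sample lines; field 10 is the byte count.
my @sample = (
    "Oct 12 07:15:02 host1 xferd: sent a.txt to client 10.0.0.5 2048 bytes\n",
    "Sep  3 23:59:59 host1 xferd: sent b.txt to client 10.0.0.6 17 bytes\n",
    "Oct 12 07:15:02 host1 xferd: sent c.txt to client 10.0.0.7 512 bytes\n",
);
print sort_lines(@sample);
```

This still holds everything in memory, so it only helps with the speed
problem, not the 1GB problem.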

-RN (Who should never be allowed near a keyboard again)

-- 

Robin Norwood
Red Hat, Inc.

"The Sage does nothing, yet nothing remains undone."
-Lao Tzu, Te Tao Ching
