John,

Thanks for your help!  Chomp and replacing $_ with $site did it!

Below is a sample line from the log. If you think I can do it without so many splits, that would be great!

2007.1.12 8:06:43 - 70.22.222.29 http://www.12gpatt.com/Nightlife.html *DENIED* Weighted phrase limit of 100 : 341 (live sex...cam+liveshow) GET 143438
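Now that the layout is known, one regular expression can pull out the URL, host, limit and score instead of chaining several splits. This is a minimal sketch. It assumes every *DENIED* line has exactly this layout: date, time, "-", IP, URL, *DENIED*, then "Weighted phrase limit of N : SCORE". That layout is inferred from this single sample, so check it against more of the log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line copied from the log above.
my $line = '2007.1.12 8:06:43 - 70.22.222.29 http://www.12gpatt.com/Nightlife.html *DENIED* Weighted phrase limit of 100 : 341 (live sex...cam+liveshow) GET 143438';

# Captures: 1 = full URL, 2 = host, 3 = configured limit, 4 = page score.
# The field order is an assumption based on the one sample line.
my $re = qr{^\S+\s+\S+\s+-\s+\S+\s+(https?://([^/\s]+)\S*)\s+\*DENIED\*\s+Weighted phrase limit of (\d+)\s*:\s*(\d+)};

# In list context a match returns the captures (all undef if no match).
my ($url, $host, $limit, $points) = $line =~ $re;

# Site name without the top-level domain, e.g. "12gpatt" from "www.12gpatt.com".
my $sitemain = defined $host ? (split /\./, $host)[-2] : undef;

# Same threshold as the original script.
if (defined $points && $points > 100) {
    print "$host ($sitemain) scored $points\n";
}
```

A line that does not match leaves all four captures undefined. Testing `defined $points` before comparing avoids the uninitialized-value warnings.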

Ryan
----- Original Message -----
From: "John W. Krahn" <[EMAIL PROTECTED]>
To: "Perl Beginners" <beginners@perl.org>
Sent: Thursday, January 11, 2007 3:55 PM
Subject: Re: printing to a file if line is not there


FamiLink Admin wrote:
Hello all.

Hello,

I am trying to read a log file, then look for each website and its score. If
that website has a score > 100, I want to add it to a "Check me" list.

It would be easier to help if we could see examples of valid and invalid log
file entries.

1. I don't want to "recheck" so I have a list of checked sites that I
   want to verify with.
2. I don't want duplicates added to the "check me" list.

Below is what I have. I have #1 above working, but I cannot keep
duplicates from being printed to the list. The sub testfordup at the bottom
prints the proper information, but if the site appears in the log again it
prints it again.

Also, is there a better way to avoid printing a line if that line already
exists in the file?

I am also getting:
Use of uninitialized value in hash element at line 60.
and
Use of uninitialized value in regexp compilation at line 44, <AB_EX_FILE> line 14.

I think this is because the text is www.site.com and the "." is not being
read as literal text in the file.

Any help would be greatly appreciated.


#!/usr/bin/perl -w
use strict;
my $ab_file = 'autobannedsitelist1';
my $ab_ex_file = 'autobannedsitelistexceptlst';
my $log_file = 'access.log';

open ( LOG_FILE, "-|", "tail -n 100000 $log_file" )
    or die "No log file exists.\n $! \n";
{
# Begin parsing of log data
my $pass = 0;
my $dup = 0;
foreach my $logfile_line (<LOG_FILE>) {

You are using a foreach loop which means that 100,000 lines of the log file are stored in memory. It would be more efficient to use a while loop which
will only store a single line at a time in memory.
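Here is a minimal sketch of the while-loop version. `count_weighted()` is a hypothetical helper: it reads any filehandle one line at a time, and its counting only stands in for the real parsing. The tail pipe uses the list form of a piped open, so no shell is involved. The file name access.log comes from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts lines containing "Weighted", reading one
# line per iteration. Only the current line is held in memory, whereas
# foreach (<FH>) first builds a list of every line.
sub count_weighted {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        next unless $line =~ /Weighted/;
        $count++;    # the real script would parse and score the line here
    }
    return $count;
}

my $log_file = 'access.log';    # file name from the original script
if (-e $log_file) {
    # List form of a piped open runs tail directly, without a shell.
    open(my $log_fh, '-|', 'tail', '-n', '100000', $log_file)
        or die "Can't run tail on $log_file: $!\n";
    print count_weighted($log_fh), " weighted lines\n";
    close($log_fh);
}
```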


   $logfile_line =~ s/GET//;
   if ($logfile_line =~ m/Weighted/ ){
       my @logfile_fields = split /\s+/, $logfile_line, 6;
       my $site = (split /\//, ($logfile_fields[4]))[2]; # get the domain name
       my $sitemain = (split '\.', $site)[-2]; # get the site name without .com
       my $sitemain2 = (split '\.', $site)[-3]; # get the site name without .co.uk
       my $points = (split ' ',(split ':', (substr($logfile_fields[5],35,10)))[1])[0];

You are using split() a lot. It may be more efficient to use a single regular
expression but it is hard to tell without seeing actual data.


       if ($points > 100){
           $pass = &abextest($sitemain,$sitemain2);

You shouldn't use the '&' sigil with subroutines unless you really have to.

perldoc perlsub


               if ( $pass eq 0 ){

You are using a string comparison operator on a number, which means that perl
has to convert the number to a string:

               if ( $pass == 0 ){


                   &testfordup($site);
               }
           }
       }
   }
close (LOG_FILE);
}

sub abextest {
    my ($sitemain,$sitemain2) = @_;
    my $pass = 0;
    open ( AB_EX_FILE, "<", $ab_ex_file ) or die "Can't read AUTOBANNED_EX_FILE: $!";
    foreach my $line (<AB_EX_FILE>){

You are using a foreach loop which means that all of the file
'autobannedsitelistexceptlst' is stored in memory. It would be more efficient to use a while loop which will only store a single line at a time in memory.
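abextest() reopens and rescans the exception file for every log line over the limit. One alternative is to read the file once, with a while loop, before the log loop, and keep the entries in memory. This is a sketch under that assumption. `load_exceptions()` and `is_excepted()` are hypothetical names. The sketch also swaps the regex test for index(), a plain case-insensitive substring test, so dots in names need no escaping. The demo reads an in-memory string instead of the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads an exception list once, one line at a time,
# and returns a reference to the chomped, lower-cased, non-blank entries.
sub load_exceptions {
    my ($fh) = @_;
    my @entries;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        push @entries, lc $line;
    }
    return \@entries;
}

# Hypothetical helper: returns 1 if any entry contains one of the names
# as a plain, case-insensitive substring, else 0. index() does no regex
# interpretation, so "." is only a dot. Undefined or empty names are skipped.
sub is_excepted {
    my ($entries, @names) = @_;
    for my $name (@names) {
        next unless defined $name && length $name;
        my $needle = lc $name;
        for my $entry (@$entries) {
            return 1 if index($entry, $needle) >= 0;
        }
    }
    return 0;
}

# Demo on an in-memory list; the real script would open
# 'autobannedsitelistexceptlst' once, before the log loop.
open(my $ex_fh, '<', \"www.Example.com\n\nkids.example.org\n") or die "open: $!";
my $exceptions = load_exceptions($ex_fh);
close($ex_fh);

print is_excepted($exceptions, 'example', undef) ? "skip\n" : "check\n";
```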


        if (($line =~ m/$sitemain/i ) || ( $line =~ m/$sitemain2/i ))  {

It looks like this *may* be line 44?  If so then either $sitemain or
$sitemain2 is undefined.
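Why would they be undefined? $sitemain2 comes from `(split '\.', $site)[-3]`, which is undefined for a two-label host such as 12gpatt.com. Both names are undefined if the URL field contains no "//". Separately, once a name is interpolated into a pattern, any "." in it matches any character. Below is a minimal sketch of a guarded, literal match using \Q...\E. `line_mentions()` is a hypothetical helper, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $line mentions any of the names as
# literal text (case-insensitive), else 0. \Q...\E escapes regex
# metacharacters, so a "." in a name matches only a dot. Undefined or
# empty names are skipped: an undefined one causes the "uninitialized
# value in regexp compilation" warning, and an empty pattern would
# silently reuse the last successful match.
sub line_mentions {
    my ($line, @names) = @_;
    for my $name (@names) {
        next unless defined $name && length $name;
        return 1 if $line =~ /\Q$name\E/i;
    }
    return 0;
}

my $literal_dot = line_mentions('www.site.com', 'site.com');     # 1
my $no_wildcard = line_mentions('www.sitexcom', 'site.com');     # 0: "." does not match "x"
my $undef_safe  = line_mentions('www.site.com', undef, 'site');  # 1: undef skipped
print "$literal_dot $no_wildcard $undef_safe\n";
```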


            $pass = 1;
        }
    }
    close (AB_EX_FILE);
return $pass;
}

sub testfordup {
    my %seen;
    open AB_FILE, "<  $ab_file" or die "Can't read AUTOBANNED_FILE: $!";
    while (<AB_FILE>) { $seen{$_} = 1 }  # build the hash of seen lines
    close AB_FILE;

    my ($site) = @_;
open AB_FILE, ">> $ab_file" or die "Can't append AUTOBANNED_FILE: $!";
    print AB_FILE "$site\n" if not ($seen{$_}++) ;

It looks like this *may* be line 60?  If so then you have not explicitly
stored a value in $_ so it is probably undefined. Perhaps you meant to use $seen{$site} instead? But that probably wouldn't work either as you didn't
chomp() the data before you added it to %seen.
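Putting both points together (chomp each line before storing it, and look up $site rather than $_) gives the minimal sketch of the duplicate check below. It also answers the question about not printing a line that already exists in the file. `add_if_new()` is a hypothetical name. It uses lexical filehandles and three-argument open, which serve the same purpose as the bareword handles in the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: appends $site to $file unless it is already listed.
# Returns 1 if the site was written, 0 if it was a duplicate.
# Lines are chomped before going into %seen, so "site\n" from the file
# and "site" from the log compare equal.
sub add_if_new {
    my ($file, $site) = @_;
    my %seen;
    if (open(my $in, '<', $file)) {   # a missing file just means nothing seen yet
        while (my $line = <$in>) {
            chomp $line;
            $seen{$line} = 1;
        }
        close($in);
    }
    return 0 if $seen{$site};
    open(my $out, '>>', $file) or die "Can't append to $file: $!";
    print {$out} "$site\n";
    close($out) or die "Can't close $file: $!";
    return 1;
}
```

In the log loop it would be called as `add_if_new($ab_file, $site)`. Re-reading the file on each call keeps the sketch simple. With many hits per run, loading %seen once before the loop and updating it after each append would avoid the repeated reads.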


    close (AB_FILE);
return;
}


John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order.       -- Larry Wall

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/



