I found this script on another message board. It's close, but it still
doesn't work with my data.  Any ideas on modifications?  I think my biggest
problem is the split call: it keeps ONLY the first column of each line, when
I need to match against the URL in the fourth column.  Thanks for your help,
and I'll see what I can do about allowing playboy.com (although since I work
at a public school district, it might not be a good idea!).  The script
follows:

#!/bin/perl -w
use strict;

my %domains;
open FILE1, '< domains.txt' or
    die $!;
while(<FILE1>)
{
    chomp;
    $domains{$_}=1;
}
close FILE1;

open OUT, '>> access.out' or
    die $!;
open FILE2, '< access.log' or
    die $!;
while(my $line=<FILE2>)
{
    my($num);
    ($num, undef)=split /\s+/,$line, 2;
    if(defined $domains{$num})
    {
        print OUT $line;
    }
    else
    {
        print OUT "$line not found";
    }

}
close FILE2;
close OUT;
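
One possible modification, as a rough sketch rather than a tested fix: split
each line on whitespace, take the fourth field (index 3), reduce the URL to
its host name, and try the host and then each parent domain against the hash.
It assumes the log layout shown in the sample (date, time, client IP, URL,
status) and that the hash maps domain to category. The helper names url_host
and lookup_category, the 'unknown' fallback, and the inline sample data are
made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the host name out of a URL such as
# http://www.msn.com/path and return it in lower case.
sub url_host {
    my ($url) = @_;
    return unless defined $url;
    my ($host) = $url =~ m{^(?:\w+://)?([^/:\s]+)}
        or return;
    return lc $host;
}

# Hypothetical helper: try the full host first, then each parent domain
# (www.msn.com, then msn.com), and return the first category found.
sub lookup_category {
    my ($host, $categories) = @_;
    my @labels = split /\./, $host;
    while (@labels >= 2) {
        my $candidate = join '.', @labels;
        return $categories->{$candidate} if exists $categories->{$candidate};
        shift @labels;
    }
    return;
}

# Sample data; in the real script this hash would be filled from domains.txt.
my %categories = ('msn.com' => 'news', 'playboy.com' => 'porn');

my $line = "10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED\n";
chomp $line;
my @fields = split ' ', $line;          # date, time, client IP, URL, status
my $host   = url_host($fields[3]);      # fourth column is index 3
my $cat    = defined $host ? lookup_category($host, \%categories) : undef;

# Insert the category (or a placeholder) just before the status field.
splice @fields, 4, 0, defined $cat ? $cat : 'unknown';
print join(' ', @fields), "\n";
```

Walking up the parent domains is what lets www.playboy.com match a list entry
of playboy.com; a plain hash lookup on the whole URL would never match.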


"Wiggins D'Anconia" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Mike M wrote:
> > Hi,
> >
> > I'm new to Perl and have what I hope is a simple question:  I have a Perl
> > script that parses a log file from our proxy server and reformats it to a
> > more easily readable space-delimited text file.  I also have another file
> > that has a categorized list of internet domains, also space-delimited.  A
> > snippet of both text files is below:
> >
> > Proxy Log
> > ----snip----
> > 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
> > 10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
> > 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
> > ----snip----
> >
> > Categorized Domains List
> > ----snip----
> > msn.com news
> > playboy.com porn
> > squid-cache.org software
> > ----snip----
> >
> > What I would like to do is write a script that compares the URL in the proxy
> > log with the categorized domains list file and creates a new file that looks
> > something like this:
> >
> > New File
> > ----snip----
> > 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
> > 10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
> > 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
> > ----snip----
> >
> > Is this possible with Perl??  I've been trying to do this by importing the
> > log files into SQL and then running queries, but it's so much slower than
> > Perl (the proxy logs are roughly 1 million lines).  Any ideas?
> >
>
> What have you tried, where have you failed?  Just about anything is
> possible with Perl, and there are hints that will make this more
> bearable, but this isn't a one stop shopping place, so give it a try
> yourself first....
>
> You seem to have a good grasp of what is needed, break it down into
> parts and see what you come up with...
>
> 1. We need a list of domains to match against and what category they are in,
> 2. We need a line from the log to get the domain,
> 3. We need to look up into the list of domains to see if the domain is
> there,
> 4. If it is we need to add the category to the end of the domain,
> 5. We need a place to store the information back to.
>
> So you need at the very least:
>
> perldoc -f open
> perldoc -f print
>
> And you are probably going to want,
>
> perldoc -f exists
> perldoc -f keys
> perldoc -f values
> perldoc -f grep
>
> And then probably a while loop....So some pseudo code might look like:
>
> open file of categorized domains
> store categorized domains into an easily accessible data structure
> close file
>
> open file for writing to store updated log to
> open file that has log lines in it
> while we read the file, do some stuff, where:
> if the line has a domain name
> pull the domain name
> compare it to the data structure
> if it is in the data structure update the line
> print the line to the new location
> repeat
>
> close the read file
> close the write file
>
> Have a beer.  By the way you should deny msn.com instead of playboy.com
> ;-)....
>
> http://danconia.org
>
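
The pseudocode quoted above maps fairly directly onto Perl. What follows is a
minimal sketch under some assumptions: the domains file holds "domain
category" pairs, the log has at least five whitespace-separated fields with
the URL fourth and the status last, and a leading "www." can be ignored. The
sub names load_categories and annotate_log, the 'unknown' fallback, and the
in-memory sample data are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "domain category" pairs into a hash keyed by domain.
sub load_categories {
    my ($fh) = @_;
    my %category_for;
    while (my $line = <$fh>) {
        my ($domain, $category) = split ' ', $line;
        next unless defined $category;      # skip blank or malformed lines
        $category_for{lc $domain} = $category;
    }
    return \%category_for;
}

# Copy log lines to $out, inserting the category before the last field.
sub annotate_log {
    my ($log, $out, $category_for) = @_;
    while (my $line = <$log>) {
        my @fields = split ' ', $line;
        next unless @fields >= 5;           # date time ip url status
        my ($host) = $fields[3] =~ m{^\w+://(?:www\.)?([^/:]+)};
        my $category = 'unknown';
        $category = $category_for->{lc $host}
            if defined $host && exists $category_for->{lc $host};
        splice @fields, -1, 0, $category;
        print {$out} "@fields\n";
    }
}

# Demo on in-memory data (assumed sample); use real files in practice.
my $domains = "msn.com news\nsquid-cache.org software\n";
my $log     = "10/23/2003 4:18:33 192.168.1.150 http://msn.com OK\n"
            . "10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK\n";

open my $dfh, '<', \$domains or die "domains: $!";
my $category_for = load_categories($dfh);
close $dfh;

open my $lfh, '<', \$log or die "log: $!";
annotate_log($lfh, \*STDOUT, $category_for);
close $lfh;
```

To run it against real data, the two in-memory opens would be replaced with
opens on the actual file names (for example domains.txt and access.log) and
\*STDOUT with a handle opened for writing. Since the domain list is held in a
hash, each log line costs one lookup, which should be workable for logs of
around a million lines.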


