I've found this script on another message board that is close, but still
doesn't work with my data. Any ideas on modifications? I think my biggest
problem is the regex in the split function, because what this does is match
ONLY against the first column in the line, when I need it to match anything
in the fourth column. Thanks for your help, and I'll see what I can do
about allowing playboy.com (although since I work at a public school
district, it might not be a good idea!) The script follows:
#!/bin/perl -w
use strict;
my %domains;
open FILE1, '< domains.txt' or
die $!;
while(<FILE1>)
{
    chomp;
    $domains{$_}=1;
}
close FILE1;
open OUT, '>> access.out' or
die $!;
open FILE2, '< access.log' or
die $!;
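while(my $line=<FILE2>)
{
    my($num);
    # split with a limit of 2 keeps only the first whitespace-separated
    # field of the line (the date), not the URL in the fourth column
    ($num, undef)=split /\s+/,$line, 2;
    if(defined $domains{$num})
    {
        print OUT $line;
    }
    else
    {
        print OUT "$line not found";
    }
}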
close FILE2;
close OUT;
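
For the fourth-column problem, here is a minimal sketch of one possible approach, not a tested drop-in fix. It assumes the log fields are separated by whitespace exactly as in the proxy log snippet quoted below, and that the keys of %domains are bare domains such as msn.com (the loop above stores the whole line of domains.txt as the key, so that part would need a split as well). The helper names host_from_line and lookup_domain are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the host part of the URL found in the fourth
# whitespace-separated field, or nothing if the line has no usable URL.
sub host_from_line {
    my ($line) = @_;
    my @fields = split ' ', $line;      # split ' ' ignores leading whitespace
    return unless @fields >= 4;
    my ($host) = $fields[3] =~ m{^\w+://([^/:]+)};
    return unless defined $host;
    return lc $host;
}

# Hypothetical helper: try the full host first, then drop leading labels
# (www.msn.com -> msn.com) until a key in the hash matches.
sub lookup_domain {
    my ($host, $domains) = @_;
    while (defined $host && length $host) {
        return $host if exists $domains->{$host};
        last unless $host =~ s/^[^.]+\.//;   # remove the leftmost label
    }
    return;
}

# Assumed sample data: bare domains as hash keys.
my %domains = ('msn.com' => 1, 'playboy.com' => 1);
my $line    = "10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED\n";
my $host    = host_from_line($line);
my $match   = lookup_domain($host, \%domains);
print defined $match ? "matched $match\n" : "no match\n";
```

Stripping labels one at a time is only one way to make www.playboy.com match playboy.com; a regex over every key would also work but gets slow with a long domain list and a million log lines.
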
"Wiggins D'Anconia" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Mike M wrote:
> > Hi,
> >
> > I'm new to Perl and have what I hope is a simple question: I have a Perl
> > script that parses a log file from our proxy server and reformats it to a
> > more easily readable space-delimited text file. I also have another file
> > that has a categorized list of internet domains, also space-delimited. A
> > snippet of both text files is below:
> >
> > Proxy Log
> > ----snip----
> > 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
> > 10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
> > 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
> > ----snip----
> >
> > Categorized Domains List
> > ----snip----
> > msn.com news
> > playboy.com porn
> > squid-cache.org software
> > ----snip----
> >
> > What I would like to do is write a script that compares the URL in the proxy
> > log with the categorized domains list file and creates a new file that looks
> > something like this:
> >
> > New File
> > ----snip----
> > 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
> > 10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
> > 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
> > ----snip----
> >
> > Is this possible with Perl?? I've been trying to do this by importing the
> > log files into SQL and then running queries, but it's so much slower than
> > Perl (the proxy logs are roughly 1 million lines). Any ideas?
> >
>
> What have you tried, where have you failed? Just about anything is
> possible with Perl, and there are hints that will make this more
> bearable, but this isn't a one stop shopping place, so give it a try
> yourself first....
>
> You seem to have a good grasp of what is needed, break it down into
> parts and see what you come up with...
>
> 1. We need a list of domains to match against and what category they are in,
> 2. We need a line from the log to get the domain,
> 3. We need to look up into the list of domains to see if the domain is
> there,
> 4. If it is we need to add the category to the end of the domain,
> 5. We need a place to store the information back to.
>
> So you need at the very least:
>
> perldoc -f open
> perldoc -f print
>
> And you are probably going to want,
>
> perldoc -f exists
> perldoc -f keys
> perldoc -f values
> perldoc -f grep
>
> And then probably a while loop....So some pseudo code might look like:
>
> open file of categorized domains
> store categorized domains into an easily accessible data structure
> close file
>
> open file for writing to store updated log to
> open file that has log lines in it
> while we read the file, do some stuff, where:
>     if the line has a domain name
>         pull the domain name
>         compare it to the data structure
>         if it is in the data structure update the line
>     print the line to the new location
>     repeat
>
> close the read file
> close the write file
>
> Have a beer. By the way you should deny msn.com instead of playboy.com
> ;-)....
>
> http://danconia.org
>
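
The pseudo code quoted above can be sketched end to end roughly as follows. This is only an illustration under stated assumptions: the sample data is modeled on the quoted snippets and held in in-memory strings so the sketch runs on its own (a real run would open domains.txt and access.log), the squid-cache entry is written with .org so that it matches the log line, unmatched hosts get the made-up category "unknown", and the names load_categories and annotate are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample data, modeled on the snippets in the quoted message.
my $domain_text = <<'END';
msn.com news
playboy.com porn
squid-cache.org software
END

my $log_text = <<'END';
10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
END

# Step 1: read "domain category" lines into a domain => category hash.
sub load_categories {
    my ($fh) = @_;
    my %category;
    while (my $line = <$fh>) {
        my ($domain, $cat) = split ' ', $line;
        next unless defined $cat;           # skip blank or malformed lines
        $category{lc $domain} = $cat;
    }
    return \%category;
}

# Steps 2-4: pull the host from the fourth field, look it up (dropping
# leading labels such as "www."), and insert the category before the status.
sub annotate {
    my ($line, $category) = @_;
    chomp $line;
    my @f = split ' ', $line;
    return $line unless @f >= 5;            # leave unexpected lines untouched
    my ($host) = $f[3] =~ m{^\w+://([^/:]+)};
    my $cat = 'unknown';                    # assumed placeholder category
    if (defined $host) {
        $host = lc $host;
        while (length $host) {
            if (exists $category->{$host}) { $cat = $category->{$host}; last }
            last unless $host =~ s/^[^.]+\.//;
        }
    }
    splice @f, 4, 0, $cat;                  # category goes before the status
    return join ' ', @f;
}

open my $dom_fh, '<', \$domain_text or die "domains: $!";
my $category = load_categories($dom_fh);
close $dom_fh;

# Step 5: write the updated lines (to STDOUT here instead of a file).
open my $log_fh, '<', \$log_text or die "log: $!";
while (my $line = <$log_fh>) {
    print annotate($line, $category), "\n";
}
close $log_fh;
```

Because each log line costs only a few hash lookups, this style should stay workable for logs of around a million lines, though that has not been measured here.
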