Re: Parse and Compare Text Files

Wiggins d'Anconia Fri, 24 Oct 2003 16:47:06 -0700

Mike M wrote:

Hi,

I'm new to Perl and have what I hope is a simple question:  I have a Perl
script that parses a log file from our proxy server and reformats it to a
more easily readable space-delimited text file.  I also have another file
that has a categorized list of internet domains, also space-delimited.  A
snippet of both text files is below:

Proxy Log
----snip----
10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
----snip----

Categorized Domains List
----snip----
msn.com news
playboy.com porn
squid-cache.com software
----snip----

What I would like to do is write a script that compares the URL in the proxy
log with the categorized domains list file and creates a new file that looks
something like this:

New File
----snip----
10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
----snip----

Is this possible with Perl??  I've been trying to do this by importing the
log files into SQL and then running queries, but it's so much slower than
Perl (the proxy logs are roughly 1 million lines).  Any ideas?

What have you tried, and where did you get stuck? Just about anything is possible with Perl, and there are hints that will make this more bearable, but this isn't a one-stop shopping place, so give it a try yourself first.

You seem to have a good grasp of what is needed. Break it down into parts and see what you come up with:

1. We need a list of domains to match against and what category they are in.
2. We need a line from the log to get the domain.
3. We need to look the domain up in the list to see if it is there.
4. If it is, we need to add the category to the line, after the domain.
5. We need a place to store the information back to.
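As a starting point for step 2, here is a minimal sketch of pulling the host name out of one log line. It assumes every line has exactly the five space-separated fields shown in the sample (date, time, IP, URL, status); the helper name host_from_line is made up for illustration and is not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the host part of the URL in one log line,
# or undef if the line does not have the five fields shown in the sample.
sub host_from_line {
    my ($line) = @_;

    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    my ($date, $time, $ip, $url, $status) = split ' ', $line;
    return undef unless defined $status;

    # Capture everything between "scheme://" and the next "/" or ":".
    my ($host) = $url =~ m{^\w+://([^/:]+)};
    return defined($host) ? lc($host) : undef;
}

print host_from_line('10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK'), "\n";
# prints "www.squid-cache.org"
```

Note that the host still carries its "www." prefix, while the categorized list does not, so the lookup in step 3 has to account for that.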

So you need at the very least:

perldoc -f open
perldoc -f print

And you are probably going to want:

perldoc -f exists
perldoc -f keys
perldoc -f values
perldoc -f grep
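To show why exists is on that list, here is a small sketch that stores the categorized domains in a hash and tests membership with exists. The sample lines are copied from the question; the variable names are only illustrative. A hash lookup does not scan the list, whereas a grep over an array would walk every entry for each of the roughly 1 million log lines.

```perl
use strict;
use warnings;

# Sample lines copied from the categorized domains list in the question.
my @category_lines = (
    'msn.com news',
    'playboy.com porn',
    'squid-cache.com software',
);

# Build the lookup table: domain => category.
my %category_for;
for my $line (@category_lines) {
    my ($domain, $category) = split ' ', $line;
    next unless defined $category;    # skip blank or malformed lines
    $category_for{ lc $domain } = $category;
}

# exists answers "is this domain listed?" without scanning the whole list.
for my $domain ('msn.com', 'example.com') {
    if (exists $category_for{$domain}) {
        print "$domain => $category_for{$domain}\n";
    }
    else {
        print "$domain is not categorized\n";
    }
}
```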

And then probably a while loop. So some pseudocode might look like:

open file of categorized domains
store categorized domains into an easily accessible data structure
close file

open file for writing to store updated log to
open file that has log lines in it
while we read the file, do some stuff, where:
    if the line has a domain name
        pull the domain name
        compare it to the data structure
        if it is in the data structure, update the line
    print the line to the new location
    repeat

close the read file
close the write file
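Once you have had a go yourself, one possible translation of that pseudocode into Perl is sketched below. It is only a sketch under several assumptions: the file names come from the command line, every log line has the five fields shown in the sample, and a host such as www.playboy.com should match a listed parent domain such as playboy.com. The sub names (load_categories, lookup_category, annotate_line) are made up for illustration.

```perl
use strict;
use warnings;

# Read "domain category" lines into a hash reference: domain => category.
sub load_categories {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my %category_for;
    while (my $line = <$in>) {
        my ($domain, $category) = split ' ', $line;
        next unless defined $category;    # skip blank or malformed lines
        $category_for{ lc $domain } = $category;
    }
    close $in;
    return \%category_for;
}

# Try the full host first, then each parent domain:
# www.playboy.com, playboy.com, com. Returns undef if nothing matches.
sub lookup_category {
    my ($host, $category_for) = @_;
    my @labels = split /\./, lc $host;
    while (@labels) {
        my $candidate = join '.', @labels;
        return $category_for->{$candidate} if exists $category_for->{$candidate};
        shift @labels;
    }
    return undef;
}

# Insert the category between the URL and the status field.
# Lines that do not look like the sample are passed through unchanged.
sub annotate_line {
    my ($line, $category_for) = @_;
    chomp $line;
    my @fields = split ' ', $line;
    return $line unless @fields == 5;

    my ($host) = $fields[3] =~ m{^\w+://([^/:]+)};
    my $category = defined($host) ? lookup_category($host, $category_for) : undef;
    splice @fields, 4, 0, $category if defined $category;
    return join ' ', @fields;
}

if (@ARGV == 3) {
    my ($categories_path, $log_path, $out_path) = @ARGV;
    my $category_for = load_categories($categories_path);

    open my $log, '<', $log_path or die "Cannot open $log_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    while (my $line = <$log>) {
        print {$out} annotate_line($line, $category_for), "\n";
    }
    close $log;
    close $out or die "Cannot close $out_path: $!";
}
else {
    warn "usage: $0 categories_file log_file output_file\n";
}
```

Reading the log one line at a time keeps memory use flat no matter how long the log is; only the category list is held in memory. Also note that the sample list has squid-cache.com while the sample log has squid-cache.org, so with the data exactly as posted that line would come out without a category.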

Have a beer. By the way, you should deny msn.com instead of playboy.com ;-)

http://danconia.org

