Hi,
I'm new to Perl and have what I hope is a simple question: I have a Perl script that parses a log file from our proxy server and reformats it to a more easily readable space-delimited text file. I also have another file that has a categorized list of internet domains, also space-delimited. A snippet of both text files is below:
Proxy Log
----snip----
10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
----snip----
Categorized Domains List
----snip----
msn.com news
playboy.com porn
squid-cache.org software
----snip----
What I would like to do is write a script that compares the URL in the proxy log with the categorized domains list file and creates a new file that looks something like this:
New File
----snip----
10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
----snip----
Is this possible with Perl?? I've been trying to do this by importing the log files into SQL and then running queries, but it's so much slower than Perl (the proxy logs are roughly 1 million lines). Any ideas?
What have you tried, and where did it fail? Just about anything is possible with Perl, and there are hints that will make this more bearable, but this isn't a one-stop shop, so give it a try yourself first.
You seem to have a good grasp of what is needed. Break it down into parts and see what you come up with:
1. We need a list of domains to match against, and the category each one is in.
2. We need a line from the log to get the domain from.
3. We need to look the domain up in the list to see if it is there.
4. If it is, we need to add the category after the domain.
5. We need a place to store the updated information.
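Steps 2 and 3 hide one wrinkle: the log has full URLs such as http://www.playboy.com, while the list has bare domains such as playboy.com. One possible way to bridge that, as a rough sketch only, is to pull the host out of the URL and try it along with each shorter parent domain. The sub name domain_candidates is made up for this illustration.

```perl
use strict;
use warnings;

# Return the host part of a URL followed by each shorter parent domain,
# most specific first. (Hypothetical helper, not from the original post.)
sub domain_candidates {
    my ($url) = @_;

    # Capture everything between "scheme://" and the next '/', ':' or '?'.
    my ($host) = $url =~ m{^\w+://([^/:?]+)}
        or return;

    my @labels = split /\./, lc $host;
    my @candidates;
    while (@labels) {
        push @candidates, join '.', @labels;
        shift @labels;    # drop the leftmost label and try again
    }
    return @candidates;
}

print join(' ', domain_candidates('http://www.squid-cache.org')), "\n";
# prints: www.squid-cache.org squid-cache.org org
```

Each candidate can then be checked against the list in turn, stopping at the first one that is found.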
So you need at the very least:
    perldoc -f open
    perldoc -f print
And you are probably going to want:
    perldoc -f exists
    perldoc -f keys
    perldoc -f values
    perldoc -f grep
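To show how open, print and exists fit together, here is a small sketch that reads "domain category" pairs into a hash and then tests for membership. The sub name load_categories and the sample data are assumptions for illustration; the sample is read from a string so the snippet runs on its own, where a real script would open the list file by name.

```perl
use strict;
use warnings;

# Read "domain category" pairs from a filehandle into a hash.
# (Hypothetical helper, not from the original post.)
sub load_categories {
    my ($fh) = @_;
    my %category;
    while (my $line = <$fh>) {
        my ($domain, $cat) = split ' ', $line;
        next unless defined $cat;    # skip blank or malformed lines
        $category{lc $domain} = $cat;
    }
    return \%category;
}

# Sample data held in a string; open can read from a scalar reference.
my $sample = "msn.com news\nplayboy.com porn\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $category = load_categories($fh);
close $fh;

for my $domain ('msn.com', 'example.com') {
    if (exists $category->{$domain}) {
        print "$domain is categorized as $category->{$domain}\n";
    }
    else {
        print "$domain is not in the list\n";
    }
}
```

A hash keeps each lookup cheap no matter how long the list is, which matters when the log runs to a million lines.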
And then probably a while loop. So some pseudocode might look like:
    open file of categorized domains
    store categorized domains into an easily accessible data structure
    close file

    open file for writing to store updated log to
    open file that has log lines in it
    while we read the file, do some stuff, where:
        if the line has a domain name
            pull the domain name
            compare it to the data structure
            if it is in the data structure
                update the line
        print the line to the new location
        repeat

    close the read file
    close the write file
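For comparison once you have had a go yourself, the pseudocode above might translate into something like the sketch below. It is one possible reading, not the only one: the sub names, the 'unknown' placeholder for domains missing from the list, and the assumption that the URL is always the fourth field are all choices made here. The demo uses in-memory files so it runs as-is; a real script would open the two input files and the output file by name.

```perl
use strict;
use warnings;

# Build a domain => category lookup from "domain category" lines.
sub load_categories {
    my ($fh) = @_;
    my %category;
    while (my $line = <$fh>) {
        my ($domain, $cat) = split ' ', $line;
        $category{lc $domain} = $cat if defined $cat;
    }
    return \%category;
}

# Find the category for a URL, trying the full host first and then each
# parent domain (www.msn.com, then msn.com, then com).
sub category_for {
    my ($url, $category) = @_;
    my ($host) = $url =~ m{^\w+://([^/:?]+)} or return;
    my @labels = split /\./, lc $host;
    while (@labels) {
        my $name = join '.', @labels;
        return $category->{$name} if exists $category->{$name};
        shift @labels;
    }
    return;
}

# Copy the log from $in to $out, inserting the category before the status.
sub annotate_log {
    my ($in, $out, $category) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my ($date, $time, $ip, $url, @rest) = split ' ', $line;
        if (defined $url) {
            my $cat = category_for($url, $category);
            $cat = 'unknown' unless defined $cat;    # placeholder, an assumption
            $line = join ' ', $date, $time, $ip, $url, $cat, @rest;
        }
        print {$out} "$line\n";
    }
}

# Demo data in strings, standing in for the two input files.
my $domains = "msn.com news\nplayboy.com porn\n";
my $log = "10/23/2003 4:18:33 192.168.1.150 http://msn.com OK\n"
        . "10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED\n";

open my $dom_fh, '<', \$domains or die "Cannot read domain list: $!";
my $category = load_categories($dom_fh);
close $dom_fh;

open my $log_fh, '<', \$log or die "Cannot read log: $!";
annotate_log($log_fh, \*STDOUT, $category);
close $log_fh;
```

The domain list is read once into a hash, and the log is processed one line at a time, so the whole log never has to sit in memory.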
Have a beer. By the way, you should deny msn.com instead of playboy.com ;-)
http://danconia.org