Re: Parse and Compare Text Files
Mike M wrote:
>
> Hi,
>
> I'm new to Perl and have what I hope is a simple question: I have a
> Perl script that parses a log file from our proxy server and reformats
> it to a more easily readable space-delimited text file. I also have
> another file that has a categorized list of internet domains, also
> space-delimited. A snippet of both text files is below:
>
> Proxy Log
> snip
> 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
> 10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
> 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
> snip
>
> Categorized Domains List
> snip
> msn.com news
> playboy.com porn
> squid-cache.com software
> snip
>
> What I would like to do is write a script that compares the URL in the
> proxy log with the categorized domains list file and creates a new file
> that looks something like this:
>
> New File
> snip
> 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
> 10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
> 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
> snip
>
> Is this possible with Perl? I've been trying to do this by importing
> the log files into SQL and then running queries, but it's so much
> slower than Perl (the proxy logs are roughly 1 million lines). Any
> ideas?

You could do something like this:

    #!/usr/bin/perl -w
    use strict;

    my $file = 'domains.txt';
    my $log  = 'access.log';
    my $out  = 'access.out';

    # Slurp the list and split it on whitespace into
    # domain => category pairs.
    my %domains = do {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        local $/;
        map { split ' ', $_ } <$fh>;
    };

    # One alternation of all listed domains, quoted so the dots
    # match literally.
    my $search = join '|', map quotemeta, keys %domains;
    $search = qr/$search/i;

    open my $outfh, '>', $out or die "Cannot open $out: $!";
    open my $in,    '<', $log or die "Cannot open $log: $!";

    while ( <$in> ) {
        # Add the category after a listed domain that sits directly
        # before the OK/DENIED status. The lc assumes the list keeps
        # its domains in lower case.
        s/\b($search)(?=\s+(?:OK|DENIED)$)/"$1 " . $domains{lc $1}/e;
        print $outfh $_;
    }

    close $outfh or die "Cannot close $out: $!";

John
--
use Perl;
program
fulfillment

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Parse and Compare Text Files
In article [EMAIL PROTECTED], Mike M wrote:

> I've found this script on another message board that is close, but
> still doesn't work with my data. Any ideas on modifications? I think
> my biggest problem is the regex in the split function, because what
> this does is match ONLY against the first column in the line, when I
> need it to match anything in the fourth column.
>
> Thanks for your help, and I'll see what I can do about allowing
> playboy.com (although since I work at a public school district, it
> might not be a good idea!)

Even worse (I had assumed this was for an employer)...

> The script follows:
[...]
> while (my $line = <FILE2>) {
>     my ($num);
>     ($num, undef) = split /\s+/, $line, 2;
[...]

This, I believe, says "split $line on white space into two pieces; place
the first piece into $num and throw away the rest."

If you look at perldoc -f split and then try a couple of tests, you
should be able to get what you want.

-Kevin (no expert though)

P.S. - No top-posting please.

--
Kevin Pfeiffer
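To make the point about the limit argument concrete, here is a small sketch of what split hands back with and without it. The sample line is copied from the proxy log quoted in this thread; the sub names first_field and url_field are made up for the illustration and are not part of anyone's script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line in the format shown in the original post.
my $line = "10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED\n";

# With a limit of 2, split stops after the first run of whitespace,
# so the second piece is the whole rest of the line.
sub first_field {
    my ($text) = @_;
    my ($first, undef) = split /\s+/, $text, 2;
    return $first;
}

# Without a limit every whitespace-separated column comes back,
# and the URL is the fourth one (index 3).
sub url_field {
    my ($text) = @_;
    my @fields = split ' ', $text;
    return $fields[3];
}

print first_field($line), "\n";   # the date column
print url_field($line), "\n";     # the URL column
```

So the fix is probably less about the regex and more about which element of the returned list gets used.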
Re: Parse and Compare Text Files
I've found this script on another message board that is close, but still
doesn't work with my data. Any ideas on modifications? I think my
biggest problem is the regex in the split function, because what this
does is match ONLY against the first column in the line, when I need it
to match anything in the fourth column.

Thanks for your help, and I'll see what I can do about allowing
playboy.com (although since I work at a public school district, it might
not be a good idea!)

The script follows:

    #!/bin/perl -w
    use strict;

    my %domains;

    open FILE1, '< domains.txt' or die $!;
    while (<FILE1>) {
        chomp;
        $domains{$_} = 1;
    }
    close FILE1;

    open OUT,   '> access.out' or die $!;
    open FILE2, '< access.log' or die $!;
    while (my $line = <FILE2>) {
        my ($num);
        ($num, undef) = split /\s+/, $line, 2;
        if (defined $domains{$num}) {
            print OUT $line;
        }
        else {
            print OUT "$line not found";
        }
    }
    close FILE2;
    close OUT;

Wiggins D'Anconia [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]

> Mike M wrote:
>> Hi,
>>
>> I'm new to Perl and have what I hope is a simple question: I have a
>> Perl script that parses a log file from our proxy server and
>> reformats it to a more easily readable space-delimited text file. I
>> also have another file that has a categorized list of internet
>> domains, also space-delimited. A snippet of both text files is below:
>>
>> Proxy Log
>> snip
>> 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
>> 10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
>> 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
>> snip
>>
>> Categorized Domains List
>> snip
>> msn.com news
>> playboy.com porn
>> squid-cache.com software
>> snip
>>
>> What I would like to do is write a script that compares the URL in
>> the proxy log with the categorized domains list file and creates a
>> new file that looks something like this:
>>
>> New File
>> snip
>> 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
>> 10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
>> 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
>> snip
>>
>> Is this possible with Perl? I've been trying to do this by importing
>> the log files into SQL and then running queries, but it's so much
>> slower than Perl (the proxy logs are roughly 1 million lines). Any
>> ideas?
>
> What have you tried, where have you failed? Just about anything is
> possible with Perl, and there are hints that will make this more
> bearable, but this isn't a one stop shopping place, so give it a try
> yourself first.
>
> You seem to have a good grasp of what is needed, break it down into
> parts and see what you come up with...
>
> 1. We need a list of domains to match against and what category they
>    are in,
> 2. We need a line from the log to get the domain,
> 3. We need to look up into the list of domains to see if the domain is
>    there,
> 4. If it is we need to add the category to the end of the domain,
> 5. We need a place to store the information back to.
>
> So you need at the very least:
>
> perldoc -f open
> perldoc -f print
>
> And you are probably going to want,
>
> perldoc -f exists
> perldoc -f keys
> perldoc -f values
> perldoc -f grep
>
> And then probably a while loop.
>
> So some pseudo code might look like:
>
> open file of categorized domains
> store categorized domains into an easily accessible data structure
> close file
> open file for writing to store updated log to
> open file that has log lines in it
> while we read the file, do some stuff, where:
>     if the line has a domain name
>         pull the domain name
>         compare it to the data structure
>         if it is in the data structure
>             update the line
>     print the line to the new location
>     repeat
> close the read file
> close the write file
> Have a beer.
>
> By the way you should deny msn.com instead of playboy.com ;-)
>
> http://danconia.org
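On the fourth-column problem: the URL field holds a full URL such as http://www.playboy.com while the list holds bare domains, so an exact hash lookup on that field will miss even once the right column is picked. One possible approach, sketched below, is to pull the host out of the URL and try it, then each parent domain in turn. The sub name category_for and the hard-coded %domains are invented for this illustration, and squid-cache.org is used as a key so that it agrees with the sample log (the posted list has squid-cache.com).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data only; in a real script this hash would be filled
# from domains.txt as domain => category.
my %domains = (
    'msn.com'         => 'news',
    'playboy.com'     => 'porn',
    'squid-cache.org' => 'software',
);

# Return the category for a URL, or undef if no listed domain matches.
sub category_for {
    my ($url, $domains) = @_;

    # Host part: skip an optional scheme, stop at '/', ':' or whitespace.
    my ($host) = $url =~ m{^(?:\w+://)?([^/:\s]+)} or return;
    $host = lc $host;

    # Try www.playboy.com, then playboy.com, then com.
    my @labels = split /\./, $host;
    while (@labels) {
        my $candidate = join '.', @labels;
        return $domains->{$candidate} if exists $domains->{$candidate};
        shift @labels;
    }
    return;
}

for my $url ('http://www.playboy.com', 'http://msn.com', 'http://example.net/x') {
    my $category = category_for($url, \%domains);
    print "$url => ", defined $category ? $category : 'not found', "\n";
}
```

Each lookup costs a handful of hash probes regardless of how long the domain list is, which should matter with a log of around a million lines.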
Parse and Compare Text Files
Hi,

I'm new to Perl and have what I hope is a simple question: I have a Perl
script that parses a log file from our proxy server and reformats it to
a more easily readable space-delimited text file. I also have another
file that has a categorized list of internet domains, also
space-delimited. A snippet of both text files is below:

Proxy Log
snip
10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
snip

Categorized Domains List
snip
msn.com news
playboy.com porn
squid-cache.com software
snip

What I would like to do is write a script that compares the URL in the
proxy log with the categorized domains list file and creates a new file
that looks something like this:

New File
snip
10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
snip

Is this possible with Perl? I've been trying to do this by importing the
log files into SQL and then running queries, but it's so much slower
than Perl (the proxy logs are roughly 1 million lines). Any ideas?

Thanks in advance for your help.

Mike
Re: Parse and Compare Text Files
Mike M wrote:
> Hi,
>
> I'm new to Perl and have what I hope is a simple question: I have a
> Perl script that parses a log file from our proxy server and reformats
> it to a more easily readable space-delimited text file. I also have
> another file that has a categorized list of internet domains, also
> space-delimited. A snippet of both text files is below:
>
> Proxy Log
> snip
> 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
> 10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
> 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
> snip
>
> Categorized Domains List
> snip
> msn.com news
> playboy.com porn
> squid-cache.com software
> snip
>
> What I would like to do is write a script that compares the URL in the
> proxy log with the categorized domains list file and creates a new file
> that looks something like this:
>
> New File
> snip
> 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
> 10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
> 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
> snip
>
> Is this possible with Perl? I've been trying to do this by importing
> the log files into SQL and then running queries, but it's so much
> slower than Perl (the proxy logs are roughly 1 million lines). Any
> ideas?

What have you tried, where have you failed? Just about anything is
possible with Perl, and there are hints that will make this more
bearable, but this isn't a one stop shopping place, so give it a try
yourself first.

You seem to have a good grasp of what is needed, break it down into
parts and see what you come up with...

1. We need a list of domains to match against and what category they
   are in,
2. We need a line from the log to get the domain,
3. We need to look up into the list of domains to see if the domain is
   there,
4. If it is we need to add the category to the end of the domain,
5. We need a place to store the information back to.

So you need at the very least:

perldoc -f open
perldoc -f print

And you are probably going to want,

perldoc -f exists
perldoc -f keys
perldoc -f values
perldoc -f grep

And then probably a while loop.

So some pseudo code might look like:

open file of categorized domains
store categorized domains into an easily accessible data structure
close file
open file for writing to store updated log to
open file that has log lines in it
while we read the file, do some stuff, where:
    if the line has a domain name
        pull the domain name
        compare it to the data structure
        if it is in the data structure
            update the line
    print the line to the new location
    repeat
close the read file
close the write file
Have a beer.

By the way you should deny msn.com instead of playboy.com ;-)

http://danconia.org
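The pseudocode above might translate into Perl roughly as follows. This is only a sketch under a few assumptions: the sub name annotate_log is invented here, both inputs are whitespace-delimited as in the original post, the URL is always the fourth column, and a host is compared against the list after dropping nothing more than an optional leading "www.". In-memory filehandles stand in for the domain list, the log and the output file so that the example runs on its own, and the sample list uses squid-cache.org so that it agrees with the sample log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "domain category" pairs from $domains_fh, then copy every log
# line from $log_fh to $out_fh, adding the category after the URL
# column when the URL's domain is in the list.
sub annotate_log {
    my ($domains_fh, $log_fh, $out_fh) = @_;

    # Store the categorized domains in a hash: domain => category.
    my %category;
    while (my $entry = <$domains_fh>) {
        my ($domain, $cat) = split ' ', $entry;
        $category{lc $domain} = $cat if defined $cat;
    }

    # One line at a time: date, time, IP, URL, status.
    while (my $line = <$log_fh>) {
        my @fields = split ' ', $line;
        if (@fields >= 5
            and $fields[3] =~ m{^\w+://(?:www\.)?([^/:\s]+)}
            and exists $category{lc $1}) {
            # Put the category between the URL and the status.
            splice @fields, 4, 0, $category{lc $1};
        }
        # Fields are rejoined with single spaces.
        print {$out_fh} "@fields\n";
    }
    return;
}

# Sample data standing in for the real files.
my $domains_txt = "msn.com news\nplayboy.com porn\nsquid-cache.org software\n";
my $access_log  = join '',
    "10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK\n",
    "10/23/2003 4:18:33 192.168.1.150 http://msn.com OK\n",
    "10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED\n";
my $access_out = '';

open my $domains_fh, '<', \$domains_txt or die "Cannot open domain list: $!";
open my $log_fh,     '<', \$access_log  or die "Cannot open log: $!";
open my $out_fh,     '>', \$access_out  or die "Cannot open output: $!";

annotate_log($domains_fh, $log_fh, $out_fh);
close $out_fh or die "Cannot close output: $!";

print $access_out;
```

For real files, the three open calls would take file names instead of scalar references. Only the domain list is held in memory; the log is read and written a line at a time, so its size should not matter much.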