Re: Parse and Compare Text Files

2003-11-06 Thread John W. Krahn
Mike M wrote:
 
 Hi,

Hello,

 I'm new to Perl and have what I hope is a simple question:  I have a Perl
 script that parses a log file from our proxy server and reformats it to a
 more easily readable space-delimited text file.  I also have another file
 that has a categorized list of internet domains, also space-delimited.  A
 snippet of both text files is below:
 
 Proxy Log
 snip
 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
 10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
 snip
 
 Categorized Domains List
 snip
 msn.com news
 playboy.com porn
 squid-cache.com software
 snip
 
 What I would like to do is write a script that compares the URL in the proxy
 log with the categorized domains list file and creates a new file that looks
 something like this:
 
 New File
 snip
 10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
 10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
 10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
 snip
 
 Is this possible with Perl??  I've been trying to do this by importing the
 log files into SQL and then running queries, but it's so much slower than
 Perl (the proxy logs are roughly 1 million lines).  Any ideas?

You could do something like this:

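#!/usr/bin/perl -w
use strict;

my $file = 'domains.txt';
my $log  = 'access.log';
my $out  = 'access.out';

# Slurp the domains file and split it on whitespace into
# domain => category pairs (assumes exactly two fields per line).
my %domains = do {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/;
    map { split ' ', $_ } <$fh>;
    };

# One case-insensitive alternation of every known domain, metacharacters escaped.
my $search = qr/@{[ join '|', map "\Q$_", keys %domains ]}/i;

open OUT,  '>', $out or die "Cannot open $out: $!";
open FILE, '<', $log or die "Cannot open $log: $!";

while ( <FILE> ) {
    # Append the category to a domain that is directly followed by the status field.
    s/\b($search)(?=\s+(?:OK|DENIED)$)/ $1 ? "$1 $domains{$1}" : $1 /e;
    print OUT;
    }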

__END__


John
-- 
use Perl;
program
fulfillment

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parse and Compare Text Files

2003-10-28 Thread Kevin Pfeiffer
In article [EMAIL PROTECTED], Mike M wrote:
 I've found this script on another message board that is close, but still
 doesn't work with my data.  Any ideas on modifications?  I think my
 biggest problem is the regex in the split function, because what this does
 is match ONLY against the first column in the line, when I need it to
 match anything
 in the fourth column.  Thanks for your help, and I'll see what I can do
 about allowing playboy.com (although since I work at a public school
 district, it might not be a good idea!)   

Even worse (I had assumed this was for an employer)...

 The script follows:

[...]
 while(my $line=<FILE2>)
 {
 my($num);
 ($num, undef)=split /\s+/,$line, 2;
[...]

This, I believe, says split $line on white space into two pieces; place
first piece into $num and throw away the rest.

If you look at perldoc -f split and then try a couple tests, you should be
able to get what you want.
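
As a rough sketch of that kind of test: the snippet below pulls the fourth whitespace-separated field from a log line instead of the first, then trims the URL down to a bare host name. The helper names url_field and host_of are made up for illustration, and the URL pattern assumes the simple scheme://host form shown in the sample log.

```perl
use strict;
use warnings;

# Return the fourth whitespace-separated field of a log line (the URL).
sub url_field {
    my ($line) = @_;
    # split ' ' ignores leading whitespace and splits on runs of whitespace.
    my @fields = split ' ', $line;
    return $fields[3];
}

# Reduce a URL to a lower-case host name without a leading "www.".
sub host_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^\w+://([^/:\s]+)};
    return unless defined $host;
    $host =~ s/^www\.//i;
    return lc $host;
}

my $line = "10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED\n";
print host_of( url_field($line) ), "\n";    # prints "playboy.com"
```

The resulting host name can then be used as the hash key, in place of $num in the quoted script.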

-Kevin (no expert though)

P.S. - No top-posting please.

-- 
Kevin Pfeiffer





Re: Parse and Compare Text Files

2003-10-27 Thread Mike M
I've found this script on another message board that is close, but still
doesn't work with my data.  Any ideas on modifications?  I think my biggest
problem is the regex in the split function, because what this does is match
ONLY against the first column in the line, when I need it to match anything
in the fourth column.  Thanks for your help, and I'll see what I can do
about allowing playboy.com (although since I work at a public school
district, it might not be a good idea!)   The script follows:

#!/bin/perl -w
use strict;

my %domains;
open FILE1, '< domains.txt' or
    die $!;
while(<FILE1>)
{
    chomp;
    $domains{$_}=1;
}
close FILE1;

open OUT, '> access.out' or
    die $!;
open FILE2, '< access.log' or
    die $!;
while(my $line=<FILE2>)
{
    my($num);
    ($num, undef)=split /\s+/,$line, 2;
    if(defined $domains{$num})
    {
        print OUT $line;
    }
    else
    {
        print OUT "$line not found";
    }

}
close FILE2;
close OUT;





Parse and Compare Text Files

2003-10-24 Thread Mike M
Hi,

I'm new to Perl and have what I hope is a simple question:  I have a Perl
script that parses a log file from our proxy server and reformats it to a
more easily readable space-delimited text file.  I also have another file
that has a categorized list of internet domains, also space-delimited.  A
snippet of both text files is below:

Proxy Log
snip
10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org OK
10/23/2003 4:18:33 192.168.1.150 http://msn.com OK
10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED
snip

Categorized Domains List
snip
msn.com news
playboy.com porn
squid-cache.com software
snip

What I would like to do is write a script that compares the URL in the proxy
log with the categorized domains list file and creates a new file that looks
something like this:

New File
snip
10/23/2003 4:18:32 192.168.0.100 http://www.squid-cache.org software OK
10/23/2003 4:18:33 192.168.1.150 http://msn.com news OK
10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com porn DENIED
snip

Is this possible with Perl??  I've been trying to do this by importing the
log files into SQL and then running queries, but it's so much slower than
Perl (the proxy logs are roughly 1 million lines).  Any ideas?

Thanks in advance for your help.

Mike






Re: Parse and Compare Text Files

2003-10-24 Thread Wiggins d'Anconia
What have you tried, and where have you failed?  Just about anything is
possible with Perl, and there are hints that will make this more
bearable, but this isn't a one-stop shopping place, so give it a try
yourself first.

You seem to have a good grasp of what is needed; break it down into
parts and see what you come up with...

1. We need a list of domains to match against and what category they are in,
2. We need a line from the log to get the domain,
3. We need to look up into the list of domains to see if the domain is 
there,
4. If it is we need to add the category to the end of the domain,
5. We need a place to store the information back to.

So you need at the very least:

perldoc -f open
perldoc -f print

And you are probably going to want:

perldoc -f exists
perldoc -f keys
perldoc -f values
perldoc -f grep

And then probably a while loop.  So some pseudo code might look like:

open file of categorized domains
store categorized domains into an easily accessible data structure
close file

open file for writing to store updated log to
open file that has log lines in it
while we read the file, do some stuff, where:
    if the line has a domain name
        pull the domain name
        compare it to the data structure
        if it is in the data structure update the line
    print the line to the new location
    repeat

close the read file
close the write file

Have a beer.  By the way you should deny msn.com instead of playboy.com
;-)
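
For what it's worth, the pseudo code above could be sketched along these lines. This is only one possible reading, not a definitive implementation: the sub names load_domains, category_for and annotate are invented for the example, in-memory strings stand in for domains.txt and access.log so it runs as-is, and the parent-domain fallback (www.msn.com, then msn.com) is an assumption about how log hosts should map onto the list.

```perl
use strict;
use warnings;

# Build a lookup table (domain => category) from "domain category" lines.
sub load_domains {
    my ($fh) = @_;
    my %category;
    while ( my $line = <$fh> ) {
        my ( $domain, $cat ) = split ' ', $line;
        next unless defined $cat;    # skip blank or malformed lines
        $category{ lc $domain } = $cat;
    }
    return \%category;
}

# Look up a host, trying the full name first and then each parent domain.
sub category_for {
    my ( $category, $host ) = @_;
    my @labels = split /\./, lc $host;
    while (@labels) {
        my $candidate = join '.', @labels;
        return $category->{$candidate} if exists $category->{$candidate};
        shift @labels;
    }
    return;
}

# Insert the category after the URL (fourth field) when one is known.
sub annotate {
    my ( $category, $line ) = @_;
    my @fields = split ' ', $line;
    return $line unless @fields >= 5;
    my ($host) = $fields[3] =~ m{^\w+://([^/:]+)};
    my $cat = defined $host ? category_for( $category, $host ) : undef;
    splice @fields, 4, 0, $cat if defined $cat;
    return join( ' ', @fields ) . "\n";
}

# In-memory sample data standing in for domains.txt and access.log.
my $domains_txt = "msn.com news\nplayboy.com porn\n";
my $log_txt = "10/23/2003 4:18:33 192.168.1.150 http://msn.com OK\n"
            . "10/23/2003 4:18:33 192.168.1.150 http://www.playboy.com DENIED\n";

open my $dom_fh, '<', \$domains_txt or die "Cannot read domains: $!";
my $category = load_domains($dom_fh);
close $dom_fh;

open my $log_fh, '<', \$log_txt or die "Cannot read log: $!";
print annotate( $category, $_ ) while <$log_fh>;
close $log_fh;
```

A hash lookup per line keeps the cost roughly constant however long the domain list gets, which matters at a million log lines. Note that annotate rejoins the fields with single spaces, so any unusual spacing in the input is normalized.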

http://danconia.org
