Using your idea, I ended up with data like this, which is odd because the
database should only include 400- and 500-type errors.

176
404
370
157
404
370
526
178
176
404
526
526

So I modified it to print the code and the data line, and got this:

404     24.54.175.153 - - [11/Mar/2003:07:48:37 -0800] "GET /e/t/invest/img/spacer.gif HTTP/1.1" 404 0 "https://
370     209.91.198.57 - - [11/Mar/2003:07:48:24 -0800] "GET /e/t/search/aaa?qmenu=2&sym=dyn, intc HTTP/1.0" 400 370
526     66.196.65.24 - - [11/Mar/2003:07:54:32 -0800] "GET /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/5.0 (Slur
178     167.127.163.141 - isklvjyy [11/Mar/2003:08:02:46 -0800] "GET /e/t/aaa HTTP/1.1" 500 178 "-" "Mozilla/4.0 (compatible
404     68.39.167.38 - - [11/Mar/2003:08:06:34 -0800] "GET /e/t/aaa/img/spacer.gif HTTP/1.1" 404 0 "https://us.etrade.com/e/
526     65.248.129.126 - - [11/Mar/2003:08:03:20 -0800] "GET /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/4.0 [en
526     65.248.129.126 - - [11/Mar/2003:08:03:20 -0800] "GET /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/4.0 [en

The 404s were right, but on the other lines the match took the second
number (the response size) instead of the first (the status code), which is
the one I need.
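
A likely explanation is that `.*` is greedy: it runs to the end of the line and gives back only as much as it has to, so the capture ends up on the last three-digit run that follows whitespace. The sketch below is only an illustration of that behaviour on one of the lines above (cut off after the size field); the variable names `$greedy` and `$lazy` are mine, not from the thread.

```perl
use strict;
use warnings;

# One of the log lines shown above, cut off after the size field.
my $line = '209.91.198.57 - - [11/Mar/2003:07:48:24 -0800] '
         . '"GET /e/t/search/aaa?qmenu=2&sym=dyn, intc HTTP/1.0" 400 370';

# Greedy .* grabs the rest of the line and backtracks only as far as it
# must, so the capture is the LAST whitespace-preceded three-digit run.
my ($greedy) = $line =~ m|HTTP.*\s+(\d{3})|;

# Non-greedy .*? grows only as far as it must, so the capture is the
# FIRST such run after "HTTP".
my ($lazy) = $line =~ m|HTTP.*?\s+(\d{3})|;

print "greedy: $greedy\n";   # greedy: 370
print "lazy:   $lazy\n";     # lazy:   400
```

On the 404 lines the size field is a single `0`, which `\d{3}` cannot match, so both forms land on the status code there.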

This is how my code looked:

my ($code, $msg, $timestamp);
foreach (@RAW_DATA) {
        # Greedy .* makes $1 the last three-digit number after whitespace.
        $code = $1 if m|HTTP.*\s+(\d{3})|;
        ($timestamp, $msg) = split(/\t/);
        if (!$code) {
                print "NEXT\n";
                next;
        }
        print "$code\t$msg\n";
        $code = 0;
}
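
One possible way to always get the status code is to anchor the pattern on the protocol version and the closing quote of the request, so the capture is the field immediately after it. This is a minimal sketch, not George's version (which is not shown here); the helper name `status_code` and the `@samples` array are hypothetical, and the sample lines are the ones above, cut off after the size field.

```perl
use strict;
use warnings;

# Return the HTTP status code from one access-log line, or undef if the
# line has no recognizable request/status pair.
sub status_code {
    my ($line) = @_;
    # Anchor on 'HTTP/x.y"' (the end of the quoted request) so the
    # capture is the status field, not the byte count that follows it.
    return $line =~ m{HTTP/\d\.\d"\s+(\d{3})\b} ? $1 : undef;
}

# Sample lines from the output above, cut off after the size field.
my @samples = (
    '24.54.175.153 - - [11/Mar/2003:07:48:37 -0800] "GET /e/t/invest/img/spacer.gif HTTP/1.1" 404 0',
    '209.91.198.57 - - [11/Mar/2003:07:48:24 -0800] "GET /e/t/search/aaa?qmenu=2&sym=dyn, intc HTTP/1.0" 400 370',
    '66.196.65.24 - - [11/Mar/2003:07:54:32 -0800] "GET /mod_ssl:error:HTTP-request HTTP/1.0" 400 526',
    '167.127.163.141 - isklvjyy [11/Mar/2003:08:02:46 -0800] "GET /e/t/aaa HTTP/1.1" 500 178',
);

for my $line (@samples) {
    my $code = status_code($line);
    print defined $code ? "$code\n" : "NEXT\n";
}
```

For these four lines it prints 404, 400, 400 and 500. The `HTTP-request` text inside the mod_ssl URL is skipped because it is not followed by `/` and a version number.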



I did manage to get a version of George's to work.  Still interested in
trying all variations though.


Derek


-----Original Message-----
From: Brett W. McCoy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 12, 2003 11:02 AM
To: Romeyn, Derek
Cc: 'George P.'; '[EMAIL PROTECTED]'
Subject: RE: Regular Expressions http error code


On Wed, 12 Mar 2003, Brett W. McCoy wrote:

> You're not capturing the correct string.  Here's a code snippet I just
> tried on an Apache log that worked (assuming you have an open file
> handle):
>
> while(<LOG>) {
>       print "$1\n" if m|HTTP.*\s+(\d{3})|g';
> }
>
> $1 contains the matched string inside the parens (\d{3}).

Dang it, when I cut and pasted, a rogue ' got into the code somehow.  That
final ' on the regular expression should not be there.

-- Brett
                                          http://www.chapelperilous.net/
------------------------------------------------------------------------
You are not a fool just because you have done something foolish --
only if the folly of it escapes you.
