Hi Derek.
Derek Romeyn wrote:
> Using your idea I ended up with data like this. Which is odd because
> the database should only include 400 and 500 type errors.
>
[snip]
>
> 404 24.54.175.153 - - [11/Mar/2003:07:48:37 -0800] "GET
> /e/t/invest/img/spacer.gif HTTP/1.1" 404 0 "https://
> 370 209.91.198.57 - - [11/Mar/2003:07:48:24 -0800] "GET
> /e/t/search/aaa?qmenu=2&sym=dyn, intc HTTP/1.0" 400 370
> 526 66.196.65.24 - - [11/Mar/2003:07:54:32 -0800] "GET
> /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/5.0 (Slur
> 178 167.127.163.141 - isklvjyy [11/Mar/2003:08:02:46 -0800] "GET /e/t/aaa
> HTTP/1.1" 500 178 "-" "Mozilla/4.0 (compatible
> 404 68.39.167.38 - - [11/Mar/2003:08:06:34 -0800] "GET /e/t/aaa/img/spacer.gif
> HTTP/1.1" 404 0 "https://us.etrade.com/e/
> 526 65.248.129.126 - - [11/Mar/2003:08:03:20 -0800] "GET
> /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/4.0 [en
> 526 65.248.129.126 - - [11/Mar/2003:08:03:20 -0800] "GET
> /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/4.0 [en
>
> The 404's were right but the rest took the second group of numbers
> instead of the needed first.
>
> This is how my code looked:
>
> my $code,$msg;
> foreach (@RAW_DATA) {
> $code = $1 if m|HTTP.*\s+(\d{3})|g;
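
Here's your problem. You're searching for 'HTTP', followed
by any number of any character, followed by one or more
whitespace characters and three digits. Because '.*' is greedy
and eats up as much as it can, the captured digits come from
the /last/ place in the line where whitespace is followed by
three digits. That's why you got the byte count (370, 526, 178)
whenever it was three digits long, and the right answer only on
the 404 lines, where the byte count is 0. If you change '.*'
into '.*?' it will match as few characters as possible and
you'll capture the first three digits that follow whitespace,
which is the status code you want.

Also, do you need the /g modifier on this search? In scalar
context it only makes the next match against the same string
resume where the previous one stopped, and you match each line
just once, so I don't think it can make any difference here.
I'd recommend using /x though, so that you can space the
pattern out and make it easier to read.
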
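To make the greedy versus non-greedy difference concrete, here is a minimal sketch that runs both patterns against one of the sample lines from the post (joined back onto one line, without the leading 370 that the script printed). The variable names $line, $greedy and $lazy are mine, chosen for illustration only.

```perl
use strict;
use warnings;

# One sample line from the original post, rejoined onto a single line.
my $line = '209.91.198.57 - - [11/Mar/2003:07:48:24 -0800] '
         . '"GET /e/t/search/aaa?qmenu=2&sym=dyn, intc HTTP/1.0" 400 370';

# Greedy: .* runs to the end of the line and then backtracks, so the
# capture is the LAST "whitespace + three digits" in the line.
my ($greedy) = $line =~ m| HTTP .*  \s+ (\d{3}) |x;

# Non-greedy: .*? grows one character at a time, so the capture is the
# FIRST "whitespace + three digits" after HTTP.
my ($lazy)   = $line =~ m| HTTP .*? \s+ (\d{3}) |x;

print "greedy: $greedy\n";   # greedy: 370
print "lazy:   $lazy\n";     # lazy:   400
```

The only change between the two patterns is the '?' after '.*'; everything else, including the /x layout, is the same.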
> ($timestamp, $msg) = split(/\t/);
I'm not clear from your data which fields you're extracting,
but I assume this split works as you haven't said otherwise.
> if (!$code) {
> print "NEXT\n";
> next;
> }
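Surely what you really want is to 'next' as soon as the match
fails, rather than testing $code afterwards? That also means
you don't have to reset $code at the bottom of the loop.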
> print "$code\t$msg\n";
> $code = 0;
> }
>
> I did manage to get a version of George's to work. Still interested
> in trying all variations though.
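
The code below addresses all my points above. It also declares
its variables with parentheses inside the loop, since
'my $code,$msg;' only declares $code. Use it if you like it.
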
HTH,
Rob

foreach (@RAW_DATA) {
    unless ( m| HTTP.*? \s+ (\d{3}) |x ) {
        print "NEXT\n";
        next;
    }
    my $code = $1;
    my ($timestamp, $msg) = split(/\t/);
    print "$code\t$msg\n";
}
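
A possible tightening, offered only as a sketch: the non-greedy pattern starts at the first 'HTTP' in the line, which may be inside the URL (as in /mod_ssl:error:HTTP-request), so a request containing whitespace followed by three digits could still fool it. Assuming the lines follow the Apache-style layout shown in the samples, anchoring on the protocol version and its closing quote avoids that. The helper name status_code is hypothetical and not part of the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the three-digit status code, or undef
# if the line does not look like the sample log format.
sub status_code {
    my ($line) = @_;
    # Match the protocol version, its closing quote, then the status.
    my ($code) = $line =~ m| HTTP/\d\.\d" \s+ (\d{3}) \s |x;
    return $code;
}

# Two sample lines from the original post, rejoined onto single lines.
my @samples = (
    '66.196.65.24 - - [11/Mar/2003:07:54:32 -0800] '
      . '"GET /mod_ssl:error:HTTP-request HTTP/1.0" 400 526 "-" "Mozilla/5.0 (Slur',
    '167.127.163.141 - isklvjyy [11/Mar/2003:08:02:46 -0800] '
      . '"GET /e/t/aaa HTTP/1.1" 500 178 "-" "Mozilla/4.0 (compatible',
);

foreach my $line (@samples) {
    my $code = status_code($line);
    print defined $code ? "$code\n" : "NEXT\n";
}
```

With the two sample lines this prints 400 and then 500. A line with no quoted HTTP version followed by a status gives undef, which the loop reports as NEXT, in the same way as the code above.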