Re: Need to pull matched string plus a few additional bytes

D. Bolliger Fri, 27 Oct 2006 08:11:25 -0700

Phil Miller on Friday, 27 October 2006 15:36:
> I am working on my very first program and have run into a bit of a
> roadblock.  I am trying to print a report of users who show up in an IIS
> Log file.  The good news is that the format of the userid is
> WINDOWSDOMAIN\USERID.  The bad news is that it is not always at the same
> place in the IIS Log file due to some variable length fields that come
> before it.  Its location can vary left or right by about 10 bytes.
>
> I read the IIS Log file in one line at a time.  I have gotten far enough
> that I can identify the lines with WINDOWSDOMAIN on it, but am stuck
> there.  The code $userid = substr($logfile_in, 33, 12); gets me close
> but depending on the length of the date, the time or the IP address, it
> is usually off by a few bytes.  A sample of the input is below to
> explain what I am talking about.
>
> 2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
> GET /itd/styles/main.css
>
> 2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
> GET /itd/styles/contents.aspx
>
> 2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80
> GET /itd/styles/footer.aspx
>
> Essentially what I need to do is find the WINDOWSDOMAIN on a line, and
> write to a file the matched string plus \USERID data (up to the next
> space).  Does anyone have any suggestions?  I'm thinking there must be
> some very easy way to do it since Perl is made for this sort of thing.
> I remember reading about some Perl built-in capability that would take a
> scalar variable and parse it into an array based on a delimiter, but I
> can't remember what it is.  That would probably do it for me.  But if
> you know of a better way, I'm all ears.


Here's demonstration code showing how you can do it with a regex or with
split. The code assumes that the GET line and the line above it are actually
one single line in the log.

The two demonstration subs return 1 on a match and 0 otherwise, so the
counter can be updated with the subs' return value.

The $miss_counter is calculated only once, from the hits and the number of
lines read.

The data after __DATA__ is 4 lines long; your mail client may wrap them.

I'm not sure if "WINDOWSDOMAIN" is meant as a hardcoded constant.
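If the domain isn't a fixed constant, one possible approach (a sketch, only
tried against lines like your samples) is to pass it in as a parameter and
quote it with \Q...\E. The capture then takes the domain, the backslash and
everything up to the next whitespace, i.e. the "matched string plus \USERID"
you asked for. The sub name extract_user is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns "DOMAIN\USERID" for the given domain,
# or undef if the line does not contain that domain.
sub extract_user {
  my ($line, $domain) = @_;
  # \Q...\E quotes any regex metacharacters in $domain,
  # \\ matches the literal backslash, (\S+) grabs up to the next whitespace,
  # /i makes the domain match case-insensitive
  if ($line =~ m/(\Q$domain\E\\\S+)/i) {
    return $1;
  }
  return;    # undef in scalar context
}

my $sample = '2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID '
           . '175.128.127.43 80 GET /itd/styles/main.css';
my $user = extract_user($sample, 'WINDOWSDOMAIN');
print defined $user ? "$user\n" : "no match\n";
```

Because the domain is quoted with \Q, a domain name containing dots or other
special characters would still be matched literally.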


#!/usr/bin/perl

use strict;
use warnings;


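# see perldoc perlre
#
sub do_regex {
  my $line = shift;    # work on a copy instead of assigning to $_
  # /x: whitespace inside the pattern is ignored; /i: case-insensitive
  if ($line =~ m;  \w+  \\  (\w+)  .*  \s/itd/  ;ix) { # NOT OPTIMAL!
    print "userid (regex): $1\n";
    return 1;
  }
  return 0;
}

# see perldoc -f split
#
sub do_split {
  my $line = shift;
  my @parts = split ' ', $line;    # split on whitespace
  # field 3 is DOMAIN\USERID, field 7 the requested path;
  # the defined check avoids warnings on short lines
  if (defined $parts[7] and $parts[7] =~ m;/itd/;i) {
    my ($domain, $userid) = split m;\\;, $parts[3];
    if (defined $userid) {
      print "userid (split): $userid\n";
      return 1;
    }
  }
  return 0;
}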

my $hit_counter=0;

while (<DATA>) {
  $hit_counter+=do_regex($_);
  do_split($_);
}

my $miss_counter=$.- $hit_counter;

print "hits: $hit_counter / missed: $miss_counter / read: $. lines\n";

__DATA__
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80 GET /itd/styles/main.css
blubb blubb foo bar dummy asdf 44 44 55 66
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80 GET /itd/styles/contents.aspx
2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID 175.128.127.43 80 GET /itd/styles/footer.aspx


==============

Some random annotations on your code (there are others as well), UNTESTED:

> Below is the code I am using.

# with the following statements your life will be easier!
#
use strict; use warnings;

> open USERIDOUT, ">userid.out.txt";

# perldoc -f open
# perldoc perlvar
#
open my $outf, '>', 'userid.out.txt' or die $!;

> open IISLOG, "<ex061023.log";

open my $log, '<', 'ex061023.log' or die $!;

> $ctr = 0;
> $hit_counter = 0;
> $miss_counter = 0;
> $logfile_in;
> $userid;

Put "my" in front of all these declarations/definitions; under "use strict",
using an undeclared variable is a compile-time error.
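Spelled out, a sketch using your variable names (the one-per-line layout is
just my preference):

```perl
use strict;
use warnings;

# Each variable is declared with "my"; without it, "use strict"
# refuses to compile the script.
my $ctr          = 0;
my $hit_counter  = 0;
my $miss_counter = 0;
my $logfile_in;    # stays undef until assigned in the loop
my $userid;
```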

> while (<IISLOG>)

while (<$log>)

> {
>                 $logfile_in = $_;
>                 if ( ($logfile_in =~ m/WINDOWSDOMAIN/i && $logfile_in =~
> m/itd/i)

I think you can omit one () pair here.

>                                 )
>                 {
>                                 print "\n** Found success\n";
>                                 $hit_counter += 1;

# same as
#
$hit_counter++;

>                                 $userid = substr($logfile_in, 33, 12);
> # This is not correct but is somewhat close
>                                 print "\n", $userid;
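Instead of a fixed substr() offset, you could capture DOMAIN\USERID with a
regex and print it to the output file; as far as I can see, your loop never
writes anything to USERIDOUT. A minimal sketch: handle_line is a
hypothetical name, it takes the output handle as a parameter (so you could
pass it the $outf opened above), and the demo uses an in-memory file so it
runs without your log.

```perl
use strict;
use warnings;

# Hypothetical helper for the body of your while loop.
# Returns 1 on a hit, 0 on a miss.
sub handle_line {
  my ($outf, $logfile_in) = @_;
  return 0 unless $logfile_in =~ m/itd/i;
  # WINDOWSDOMAIN, a literal backslash, then everything up to the
  # next whitespace character
  if (my ($userid) = $logfile_in =~ m/(WINDOWSDOMAIN\\\S+)/i) {
    print {$outf} "$userid\n";
    return 1;
  }
  return 0;
}

# Demo: an in-memory file stands in for userid.out.txt
open my $outf, '>', \my $buffer or die $!;
my $line = '2006-10-23 12:08:47 24.32.35.123 WINDOWSDOMAIN\USERID '
         . '175.128.127.43 80 GET /itd/styles/main.css';
my $hits = handle_line($outf, $line);
close $outf or die $!;
print "hits: $hits, written: $buffer";
```

Capturing into a lexical with my ($userid) = ... is safer than relying on
$1, which any later successful match overwrites.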
>                 }
>                 else
>                 {
>                                 print "Did not find success\n";
>                                 $miss_counter += 1;
>                 }
> }
> print "\n Hit Counter = ", $hit_counter;
> print "\n Miss Counter = ", $miss_counter;
> print "\n Total Records Counter = ", $hit_counter + $miss_counter;
>
> close USERIDOUT;

close $outf or die $!;

> close IISLOG;

close $log or die $!;

Dani
