Hi:

Newbie here. This is my first attempt at PHP scripting. I'm trying to find
an alternative to Lotus Domino's domlog.nsf for logging web transactions.
Domino does create an Apache-compatible text file of the web transactions,
and this is what I’m trying to parse. I started off with a code snippet I
found on the web and modified it a little to suit my needs. It was
working fine with the small 600 KB test log file I was using, but since I’ve
moved to the larger 18 MB production log file, here’s what happens:

I’ve added an echo statement that prints the loop counter for each record
as it gets processed. Initially the script runs very fast, but then it
slows down to the point where I can count each iteration as it’s being
processed. It takes a little over 3 hours to parse the entire file. I
figured it was a disk cache thing, so I created a RAM drive. That
improved performance, but the script still takes about an hour to finish.

Here is the PHP script I’m using:


<?php

$ac_arr = file('access_log');
$astring = join("", $ac_arr);
$astring = preg_replace("/(\r|\t)/", "", $astring);
$records = preg_split("/(\n)/", $astring, -1, PREG_SPLIT_NO_EMPTY);

$sizerecs = sizeof($records);

// now walk through the records, one per loop pass
// (note: $i starts at 1, so $records[0] is never processed)
$i = 1;
$each_rec = 0;

while($i<$sizerecs) {
$all = $records[$i];

// IP Address ($IP):
$IP = substr($all, 0, strpos($all, " "));
$all = str_replace($IP, "", $all);

//Remote User ($RU):
$string = substr($all, 0, strpos($all, " [")); // www.vpcl.on.ca T123
$sstring = substr($string, strpos($string, " ")+1);
$AUstring = substr($sstring, strpos($sstring, " "));
$RU = preg_replace("/\"/", "", $AUstring);
$RU = trim($RU);
$all = str_replace($string, "", $all);

//Request Time Stamp ($RTS):
preg_match("/\[(.+)\]/", $all, $match);
$RTS = $match[1];
$all = str_replace(" [$RTS] \"", "", $all);

//Http Request Line ($HRL):
$string = substr($all, 0, strpos($all, "\"")+2);
$HRL = str_replace("\"", "", $string);
$all = str_replace($string, "", $all);

//Http Response Status Code (HRSC):
$HRSC = trim(substr($all, 0, strpos($all, " ")+1));
$all = str_replace($HRSC, "", $all);

//Request Content Length (RCL):
$string = substr($all, 0, strpos($all, "\"")+1);
$RCL = trim(str_replace("\"", "", $string));
$all = str_replace($string, "", $all);

//Referring URL (RefU):
$string = substr($all, 0, strpos($all, "\"")+3);
$RefU = substr($all, 0, strpos($all, "\""));
$all = str_replace($string, "", $all);

//User Agent (UA):
$string = substr($all, 0, strpos($all, "\"")+2);
$UA = substr($all, 0, strpos($all, "\""));
$all = str_replace($string, "", $all);

//Time to Process Request: whatever is left in $all at this point

#$new_format[$each_rec] = "$UA\n";
$new_format[$each_rec] =
"$IP\t$RU\t$RTS\t$HRL\t$HRSC\t$RCL\t$RefU\t$UA\t$all\n";

$fhandle = fopen("/ramdrive/import_file.txt", "w");
foreach($new_format as $data) {
    fputs($fhandle, "$data");
}
fclose($fhandle);

// advance to next record
echo "$i\n";
$i = $i + 1;

$each_rec++;
}
?>
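
One thing that stands out in the script above: the fopen()/foreach/fclose()
block sits inside the while loop, so the output file is reopened and rewritten
in full on every pass. The amount of writing therefore grows roughly with the
square of the record count, which would match the "fast at first, then slower
and slower" behaviour. A minimal sketch of one alternative follows: open both
files once, read one line at a time with fgets(), and write each record as soon
as it is parsed. The function names parse_log_line() and convert_log() are made
up for illustration, and the regex assumes the Apache "combined" layout with one
trailing field for the processing time and no embedded double quotes inside the
quoted fields. It has not been run against a real Domino log.

```php
<?php
// Sketch only, under the assumptions stated above.

// Split one log line into the nine fields the original script writes out.
// Returns an array of strings, or null if the line does not match.
function parse_log_line($line)
{
    $pattern = '/^(\S+) (\S+) (.*?) \[([^\]]+)\] "([^"]*)" (\S+) (\S+) "([^"]*)" "([^"]*)"\s*(.*)$/';
    if (!preg_match($pattern, rtrim($line, "\r\n"), $m)) {
        return null;
    }
    return array(
        $m[1],                                // IP address
        trim(str_replace('"', '', $m[3])),    // remote user, quotes stripped
        $m[4],                                // request time stamp
        $m[5],                                // HTTP request line
        $m[6],                                // response status code
        $m[7],                                // content length
        $m[8],                                // referring URL
        $m[9],                                // user agent
        $m[10],                               // remainder (time to process)
    );
}

// Stream $inPath to $outPath as tab-separated records.
// Returns the number of records written, or false if a file cannot be opened.
function convert_log($inPath, $outPath)
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        return false;
    }
    $out = fopen($outPath, 'w');   // opened once, before the loop
    if ($out === false) {
        fclose($in);
        return false;
    }

    $count = 0;
    while (($line = fgets($in)) !== false) {
        $fields = parse_log_line($line);
        if ($fields === null) {
            continue;              // skip lines that do not look like log entries
        }
        fwrite($out, implode("\t", $fields) . "\n");
        $count++;
    }

    fclose($in);
    fclose($out);                  // closed once, after the loop
    return $count;
}
```

With the paths from the original script, the call would be something like
convert_log('access_log', '/ramdrive/import_file.txt'). Because only one line is
held in memory at a time, the file() and join() steps are no longer needed.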


This is running on a Toshiba Tecra A4 Laptop with FreeBSD 7.0 Release.
Plenty of RAM and HDD space. The PHP Version is:

PHP 5.2.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Feb 11 2009 09:28:47)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

What should I do to get this script to run faster?

Any help is appreciated.

Regards,



Fred Schnittke



-- 
PHP General Mailing List (http://www.php.net/)