For files of any substance, it's much better (cheaper and faster) to read
one line at a time into memory for parsing:

    fileObject = open("myfile.log")
    while 1:
        line = fileObject.readline()
        if not line: break
        # parse the line here

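The same loop can be written more compactly by iterating over the file
object itself, which also pulls in one line at a time (file iteration has
been available since Python 2.2). A minimal sketch, written for a current
Python 3 rather than 2.2; parse_lines and handle_line are names made up
for illustration:

```python
def parse_lines(file_object, handle_line):
    """Call handle_line(line) for each line; return how many lines were read."""
    count = 0
    for line in file_object:              # lazy: one line in memory at a time
        handle_line(line.rstrip("\r\n"))  # drop the trailing line ending
        count += 1
    return count
```

Any iterable of lines works, so an open file can be passed directly, for
example parse_lines(open("myfile.log"), print).
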
Chad
-----Original Message-----
From: Adam Getchell [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 30, 2003 8:18 PM
To: [EMAIL PROTECTED]
Subject: Calculating Squid Hits per IP address
Hello all,
I took recipe 7.5 in the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65251) as a start
towards getting some info about hits against our webcache, Squid. I think
I'm missing something simple (or fundamental :-\ )
(I'd post a comment on the recipe, but the webpage is refusing my ASPN
membership currently for this function.)
A line of Squid log looks like this (wrapped here; it is a single line in
the file):

    1040217940.137 31 24.120.240.228 TCP_MISS/200 936 GET
    http://XXX.YYY.ZZZ.WWW/Emp/Careers/Images/midMenu_06.gif -
    DIRECT/XXX.YYY.ZZZ.WWW image/gif

Which places the IP address in the 3rd field, unlike the Apache one.
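
To make the field position concrete, here is a small sketch that splits the
sample line above on whitespace and picks out the third field. It assumes a
current Python 3; client_ip is a hypothetical helper name, and the meaning
of the other fields is not checked:

```python
# The sample log line from above, joined back into a single line.
SAMPLE = ("1040217940.137 31 24.120.240.228 TCP_MISS/200 936 GET "
          "http://XXX.YYY.ZZZ.WWW/Emp/Careers/Images/midMenu_06.gif - "
          "DIRECT/XXX.YYY.ZZZ.WWW image/gif")


def client_ip(line):
    """Return the third whitespace-separated field, or None if it is missing."""
    fields = line.split()
    if len(fields) < 3:
        return None
    return fields[2]


print(client_ip(SAMPLE))  # 24.120.240.228
```
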
However, the method:

    Contents = open(logfile_pathname, "r").xreadlines()

Grabs all of the data in one big line: when I parse the data, I only get the
first entry.
I'm developing with ActivePython-2.2 win32all build 148, although this is
intended to run cross-platform on an OpenBSD box. Here are the results when
run on the command line:

    C:\etc>CalculateSquidIpHits.py access.log
    {'140.247.117.79': 1}

It grabs the first entry only. I looked at using:

    file_object = open(logfile_pathname)
    Contents = list(file_object)

But had the same result, only slower (which might lock up on a large logfile
as the recipe suggests).
Thanks for any assistance! (I have LP, PP, PPonWin32, and the PC, so I'll
read some more -- but I learn best by doing ...)
Here's my modified code from the recipe:

    def CalculateSquidIpHits(logfile_pathname):
        # Make a dictionary to store IP addresses and their hit counts
        # and read the contents of the log file line by line
        IpHitListing = {}
        Contents = open(logfile_pathname, "r").xreadlines()
        #file_object = open(logfile_pathname)
        #Contents = list(file_object)
        # Go through each line of the logfile
        for line in Contents:
            # Split the string to isolate IP address -- changed from [0] on Apache
            Ip = line.split()[2]
            # Ensure length is proper
            if 6 < len(Ip) <= 15:
                # Increase by 1 if Ip exists; else set hit count = 1
                IpHitListing[Ip] = IpHitListing.get(Ip, 0) + 1
        return IpHitListing

    def main():
        import sys
        if len(sys.argv)>=2:
            HitsDictionary = CalculateSquidIpHits(sys.argv[1])
            print HitsDictionary
        else:
            print "Usage: CalculateSquidIpHits [Logfile]"

    if __name__ == '__main__':
        main()

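
For comparison, a minimal sketch of the same counting logic for a current
Python 3 (not the 2.2 this was developed on). It takes any iterable of
lines, so an open file works; count_ip_hits and its ip_field parameter are
made-up names, and skipping short lines is an added assumption rather than
something the recipe does:

```python
def count_ip_hits(lines, ip_field=2):
    """Count hits per client IP from an iterable of Squid log lines."""
    hits = {}
    for line in lines:
        fields = line.split()
        if len(fields) <= ip_field:
            continue  # assumption: silently skip blank or truncated lines
        ip = fields[ip_field]
        # Same length check as the recipe: 7 to 15 characters
        if 6 < len(ip) <= 15:
            hits[ip] = hits.get(ip, 0) + 1
    # The return sits outside the loop, so every line is counted before returning
    return hits
```

With a log file this could be called as count_ip_hits(open(logfile_pathname)).
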
***************************
* Adam Getchell [EMAIL PROTECTED]
* System Architect/Programmer (530) 752-1584
* Human Resources Information Systems http://www.hr.ucdavis.edu/
***************************
"Invincibility is in oneself, vulnerability in the opponent." -- Sun Tzu
_______________________________________________
ActivePython mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
Other options: http://listserv.ActiveState.com/mailman/listinfo/ActivePython