You might even be better off using the fileinput module:
import fileinput
for line in fileinput.input(filename):
    process(line)
I find this much cleaner: you don't have to worry about closing the file, keeping track of state, or anything like that.
You're also still iterating one line at a time rather than reading the entire file into a buffer, which could be very memory intensive depending on the size of the file. If you're talking about a webcache log, it's probably a big file.
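As a rough sketch of how that could look for the Squid log in question: the function name count_squid_hits is made up for illustration, and the field index assumes the client IP is the 3rd whitespace-separated field, as in the sample line quoted below. Treat it as a starting point, not a tested replacement for the recipe.

```python
import fileinput


def count_squid_hits(logfile_pathname):
    """Count hits per client IP in a Squid access log (hypothetical helper)."""
    hits = {}
    # fileinput yields one line at a time, so the whole log is never held in memory
    for line in fileinput.input(logfile_pathname):
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        ip = fields[2]  # assumption: client IP is the 3rd field, as in the sample line
        hits[ip] = hits.get(ip, 0) + 1
    return hits
```

Calling count_squid_hits("access.log") should give back a dictionary mapping each IP address to its hit count, much like the recipe's function does.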
Just my $0.02
--Kevin Murphy
On Fri, 31 Jan 2003 07:35:42 -0800, Chad Maine wrote:
>for files of any substance, it's much better (cheaper and faster) to read
>one line at a time into memory for parsing:
>
>
>fileObject = open("myfile.log")
>while 1:
>    line = fileObject.readline()
>    if not line: break
>    # parse the line here
>
>Chad
>
>-----Original Message-----
>From: Adam Getchell [mailto:[EMAIL PROTECTED]]
>Sent: Thursday, January 30, 2003 8:18 PM
>To: [EMAIL PROTECTED]
>Subject: Calculating Squid Hits per IP address
>
>
>Hello all,
>
>I took recipe 7.5 in the Python Cookbook
>(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65251) as a start
>towards getting some info about hits against our webcache, Squid. I think
>I'm missing something simple (or fundamental :-\)
>
>(I'd post a comment on the recipe, but the webpage is refusing my ASPN
>membership currently for this function.)
>
>A line of squid log looks like this:
>
>1040217940.137 31 24.120.240.228 TCP_MISS/200 936 GET
>http://XXX.YYY.ZZZ.WWW/Emp/Careers/Images/midMenu_06.gif -
>DIRECT/XXX.YYY.ZZZ.WWW image/gif
>
>This places the IP address in the 3rd field, unlike the Apache log.
>
>However, the method:
>
>Contents = open(logfile_pathname,"r").xreadlines()
>
>Grabs all of the data in one big line: when I parse the data, I only get the
>first entry.
>
>I'm developing with ActivePython-2.2 win32all build 148, although this is
>intended to run cross-platform on an OpenBSD box. Here are the results when
>run on the command line:
>
>C:\etc>CalculateSquidIpHits.py access.log
>{'140.247.117.79': 1}
>
>It grabs the first entry only. I looked at using:
>
>file_object = open(logfile_pathname)
>Contents = list(file_object)
>
>But had the same result, only slower (which might lock up on a large logfile
>as the recipe suggests).
>
>Thanks for any assistance! (I have LP, PP, PPonWin32, and the PC, so I'll
>read some more -- but I learn best by doing ...)
>
>Here's my modified code from the recipe:
>
>def CalculateSquidIpHits(logfile_pathname):
>    # Make a dictionary to store IP addresses and their hit counts
>    # and read the contents of the log file line by line
>    IpHitListing = {}
>    Contents = open(logfile_pathname, "r").xreadlines()
>    #file_object = open(logfile_pathname)
>    #Contents = list(file_object)
>
>    # Go through each line of the logfile
>    for line in Contents:
>        # Split the string to isolate IP address -- changed from [0] on Apache
>        Ip = line.split()[2]
>
>        # Ensure length is proper
>        if 6 < len(Ip) <= 15:
>            # Increase by 1 if Ip exists; else set hit count = 1
>            IpHitListing[Ip] = IpHitListing.get(Ip, 0) + 1
>    return IpHitListing
>
>def main():
>    import sys
>    if len(sys.argv)>=2:
>        HitsDictionary = CalculateSquidIpHits(sys.argv[1])
>        print HitsDictionary
>    else:
>        print "Usage: CalculateSquidIpHits [Logfile]"
>
>if __name__ == '__main__':
>    main()
>
>
>***************************
>* Adam Getchell [EMAIL PROTECTED]
>* System Architect/Programmer (530) 752-1584
>* Human Resources Information Systems http://www.hr.ucdavis.edu/
>***************************
>"Invincibility is in oneself, vulnerability in the opponent." -- Sun Tzu
>
>_______________________________________________
>ActivePython mailing list
>[EMAIL PROTECTED]
>To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
>Other options: http://listserv.ActiveState.com/mailman/listinfo/ActivePython
>
