On Tue, 16 Dec 2008 12:07:14 -0300, Federico Moreira wrote:

> Hi all,
>
> Im parsing a 4.1GB apache log to have stats about how many times an ip
> request something from the server.
>
> The first design of the algorithm was
>
> for line in fileinput.input(sys.argv[1:]):
>     ip = line.split()[0]
>     if match_counter.has_key(ip):
>         match_counter[ip] += 1
>     else:
>         match_counter[ip] = 1

Nitpick: dict.has_key is usually replaced with the in operator:

    if ip in match_counter: ...

Also, after investigating your code further, I see that you've used generators unnecessarily. The first version is simpler, and using the generator this way does not avoid creating any huge intermediate list. You won't get any performance improvement from it; instead you take a performance hit from function-call overhead and name lookup.
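
For reference, here is a minimal sketch of the first design with the membership test swapped in. It is written for Python 3, where dict.has_key no longer exists. The function name count_ips, the blank-line guard, and the sample log lines are my own assumptions for illustration; the original loop read from fileinput.input(sys.argv[1:]) instead.

```python
import io


def count_ips(lines):
    """Count how often each client IP (first field of a log line) appears.

    `lines` is any iterable of strings, for example an open file or
    fileinput.input(...). Hypothetical helper, not the original code.
    """
    match_counter = {}
    for line in lines:
        fields = line.split()
        if not fields:
            # Added guard (assumption): skip blank lines instead of
            # failing with IndexError on fields[0].
            continue
        ip = fields[0]
        if ip in match_counter:  # replaces match_counter.has_key(ip)
            match_counter[ip] += 1
        else:
            match_counter[ip] = 1
    return match_counter


# Made-up sample data standing in for the real 4.1GB log.
sample_log = io.StringIO(
    '10.0.0.1 - - [16/Dec/2008:12:07:14 -0300] "GET / HTTP/1.1" 200 512\n'
    '10.0.0.2 - - [16/Dec/2008:12:07:15 -0300] "GET /a HTTP/1.1" 200 128\n'
    '10.0.0.1 - - [16/Dec/2008:12:07:16 -0300] "GET /b HTTP/1.1" 404 64\n'
)

counts = count_ips(sample_log)
print(counts)
```

The if/else pair can also be collapsed into match_counter[ip] = match_counter.get(ip, 0) + 1, which behaves the same. Either way the file is read one line at a time, so no large intermediate list is built.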