Re: urllib2 spinning CPU on read
> I didn't try looking at your example, but I think it's likely a bug
> both in that site's HTTP server and in httplib. If it's the same one I
> saw, it's already reported, but nobody fixed it yet.
>
> http://python.org/sf/1411097
>
> John

Thanks. I tried the example in the link you gave, and it appears to be the same behavior. Do you have any suggestions on how I could avoid this in the meantime?
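For what it's worth, one possible stopgap (untested here, and it leans on httplib internals rather than public API) would be to ask for the page over HTTP/1.0: a compliant server then shouldn't use chunked transfer encoding at all, so httplib's _read_chunked never runs. Treat this as a sketch, not a fix:

import httplib

# Sketch of a possible stopgap: speak HTTP/1.0 so the server should
# respond without chunked transfer encoding, sidestepping httplib's
# _read_chunked path entirely.  _http_vsn and _http_vsn_str are
# httplib implementation details, not public API.
conn = httplib.HTTPConnection('www.wautomas.info')
conn._http_vsn = 10
conn._http_vsn_str = 'HTTP/1.0'
conn.request('GET', '/')
response = conn.getresponse()
data = response.read()
conn.close()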
urllib2 spinning CPU on read
Hello All,

I've run into this problem on several sites where urllib2 will hang (using all the CPU) trying to read a page. I was able to reproduce it for one particular site. I'm using Python 2.4:

import urllib2

url = 'http://www.wautomas.info'
request = urllib2.Request(url)
opener = urllib2.build_opener()
result = opener.open(request)
data = result.read()

It never returns from this read call. I did some profiling to try to see what was going on and to make sure it wasn't my code. There were a huge number of calls to (and a large amount of time spent in) socket.py:315(readline) and recv; a large amount of time was also spent in httplib.py:482(_read_chunked). Here's the significant part of the statistics:

32564841 function calls (32563582 primitive calls) in 545.250 CPU seconds

   Ordered by: internal time
   List reduced from 416 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10844775  233.920    0.000  447.440    0.000 socket.py:315(readline)
 10846078  152.430    0.000  152.430    0.000 :0(recv)
        3   97.330   32.443  544.730  181.577 httplib.py:482(_read_chunked)
 10844812   61.090    0.000   61.090    0.000 :0(join)

Also, where should I go to see if something like this has already been reported as a bug?

Thanks for any help you can give me.
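P.S. In case it helps anyone trying to reproduce the numbers: the output above is the stdlib profiler's format, and the statistics can be produced along these lines (a sketch, sorted by internal time and restricted to 50 lines to match the listing above):

import profile
import pstats
import urllib2

def fetch():
    opener = urllib2.build_opener()
    request = urllib2.Request('http://www.wautomas.info')
    return opener.open(request).read()

# profile.run execs the statement in __main__, so fetch() must be
# defined at module level of the script being run.
profile.run('fetch()', 'fetch.prof')

stats = pstats.Stats('fetch.prof')
stats.sort_stats('time')   # "Ordered by: internal time"
stats.print_stats(50)      # "List reduced ... due to restriction <50>"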
don't need dictionary's keys - hash table?
Hello,

I am using some very large dictionaries with keys that are long strings (urls). For a large dictionary these keys start to take up a significant amount of memory. I do not need access to these keys -- I only need to be able to retrieve the value associated with a certain key, so I do not want to have the keys stored in memory.

Could I just hash() the url strings first and use the resulting integer as the key? I think what I'm after here is more like a traditional hash table. If I do it this way, am I going to get the memory savings I am after? Will the hash function always generate unique keys? Also, would the same technique work for a set?

Any other thoughts or considerations are appreciated. Thank you.
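Concretely, what I have in mind is something like this (just a sketch, with a stand-in url; the uniqueness question above is exactly what worries me):

# Idea: key the dictionary on hash(url) -- a plain int -- instead of
# the url string itself, so the long strings need not stay in memory.
values = {}

def store(url, value):
    values[hash(url)] = value

def lookup(url):
    return values[hash(url)]

store('http://www.example.com/some/long/path', 42)
print lookup('http://www.example.com/some/long/path')
# Caveat: if two different urls ever share a hash() value, one entry
# silently overwrites or answers for the other.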
Re: don't need dictionary's keys - hash table?
[EMAIL PROTECTED] wrote:
> Hello, I am using some very large dictionaries with keys that are long
> strings (urls). For a large dictionary these keys start to take up a
> significant amount of memory. I do not need access to these keys -- I
> only need to be able to retrieve the value associated with a certain
> key, so I do not want to have the keys stored in memory. Could I just
> hash() the url strings first and use the resulting integer as the key?
> I think what I'm after here is more like a traditional hash table. If
> I do it this way am I going to get the memory savings I am after? Will
> the hash function always generate unique keys? Also, would the same
> technique work for a set?

I just realized that of course the hash is not always going to be unique, so this wouldn't really work. And it seems a hash table would still need to store the keys (as strings) so that string comparisons can be done when a collision occurs. I guess there's no avoiding storing the keys?
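A quick way to see why the keys have to stick around: in CPython 2, -1 and -2 happen to hash identically, yet a dict still keeps them apart by comparing the stored key objects when the hashes collide:

# In CPython 2, hash(-1) == hash(-2) == -2 (the C layer reserves -1 as
# an error value), so these two keys land in the same hash bucket.
print hash(-1), hash(-2)   # -2 -2

# The dict still tells them apart, because on a collision it falls
# back to comparing the stored key objects themselves.
d = {-1: 'first', -2: 'second'}
print d[-1], d[-2]         # first second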
Re: don't need dictionary's keys - hash table?
> Depending on your application, a bloom filter might be good enough:
> http://en.wikipedia.org/wiki/Bloom_filter

Thanks (everyone) for the comments. I like the idea of the bloom filter or using an md5 hash, since a rare collision will not be a show-stopper in my case.
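For the md5 route, something like this is what I have in mind (a sketch, with a stand-in url; the md5 module is the 2.4-era spelling, hashlib arrived later):

import md5

# Sketch: key the dict on the fixed-size 16-byte md5 digest instead of
# the full url string.  An accidental collision would conflate two
# urls, but as noted above that is acceptably rare here.
def digest_key(url):
    return md5.new(url).digest()

values = {}
values[digest_key('http://www.example.com/a/very/long/url')] = 'payload'
print values[digest_key('http://www.example.com/a/very/long/url')]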