Bugs item #1208304, was opened at 2005-05-25 05:20
Message generated for change (Comment added) made by holdenweb
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1208304&group_id=5470
Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

Category: Extension Modules
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Petr Toman (manekcz)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2's urlopen() method causes a memory leak

Initial Comment:
It seems that the urlopen(url) method of the urllib2 module leaves some
uncollectable objects in memory. Please try the following code:

==========================
if __name__ == '__main__':
    import urllib2
    a = urllib2.urlopen('http://www.google.com')
    del a  # or a = None, or del(a)

    # check memory for leaks
    import gc
    gc.set_debug(gc.DEBUG_SAVEALL)
    gc.collect()
    for it in gc.garbage:
        print it
==========================

In our code, we're using lots of urlopen() calls in a loop and the
number of unreachable objects grows beyond all limits :) We also tried
a.close(), but it didn't help.

You can also try the following:

==========================
def print_unreachable_len():
    # check memory for leaks
    import gc
    gc.set_debug(gc.DEBUG_SAVEALL)
    gc.collect()
    unreachableL = []
    for it in gc.garbage:
        unreachableL.append(it)
    return len(str(unreachableL))

if __name__ == '__main__':
    print "at the beginning", print_unreachable_len()
    import urllib2
    print "after import of urllib2", print_unreachable_len()
    a = urllib2.urlopen('http://www.google.com')
    print 'after urllib2.urlopen', print_unreachable_len()
    del a
    print 'after del', print_unreachable_len()
==========================

We're using Windows XP with the latest patches, Python 2.4 (ActivePython
2.4 Build 243 (ActiveState Corp.) based on Python 2.4 (#60, Nov 30 2004,
09:34:21) [MSC v.1310 32 bit (Intel)] on win32).
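The reporter's gc.DEBUG_SAVEALL technique can be applied to the suspected
pattern in isolation. The following is a minimal sketch in modern Python 3
(the Response class here is a hypothetical stand-in for urllib2's response
object, not the real class): storing a bound method on the instance, as
urllib2 does with 'r.recv = r.read', creates a reference cycle that plain
reference counting cannot free.

```python
import gc

class Response:
    """Hypothetical stand-in for urllib2's HTTPResponse."""
    def read(self):
        return b""

gc.disable()                      # keep the cycle alive until we collect
gc.set_debug(gc.DEBUG_SAVEALL)    # collected objects are kept in gc.garbage

r = Response()
r.recv = r.read   # same pattern as urllib2's r.recv = r.read:
                  # r -> r.__dict__ -> bound method -> r  (a cycle)
del r             # refcounting alone cannot free the cycle

n = gc.collect()  # the cyclic collector finds and saves it
leaked = any(isinstance(o, Response) for o in gc.garbage)
print(n > 0, leaked)

gc.set_debug(0)
gc.garbage.clear()
gc.enable()
```

With DEBUG_SAVEALL set, everything the collector reclaims lands in
gc.garbage instead of being freed, which is how the reporter's loop makes
the leak visible.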
----------------------------------------------------------------------

Comment By: Steve Holden (holdenweb)
Date: 2005-10-14 00:13

Message:
Logged In: YES  user_id=88157

The Windows 2.4.1 build doesn't show this error, but the Cygwin 2.4.1
build does still have uncollectable objects after a urllib2.urlopen(),
so there may be a platform dependency here. There is no 2.4.2 build on
Cygwin yet, and lsof isn't available there, so nothing conclusive.

----------------------------------------------------------------------

Comment By: Brian Wellington (bwelling)
Date: 2005-08-15 14:13

Message:
Logged In: YES  user_id=63197

The real problem we were seeing wasn't the memory leak, it was a file
descriptor leak. Leaking references within the interpreter is bad, but
the garbage collector will eventually notice that the system is out of
memory and clean them up. Leaking file descriptors is much worse: gc
won't be triggered when the process reaches its descriptor limit, and
the process will start failing with "Too many open files".

To easily show this problem, run the following from an interactive
python interpreter:

import urllib2
f = urllib2.urlopen('http://www.google.com')
f.close()

and from another window, run "lsof -p <pid of interpreter>". It should
show a TCP socket in CLOSE_WAIT, which means the file descriptor is
still open. I'm seeing weirdness on Fedora Core 4 today that I didn't
see last week, where after a few seconds the file descriptor is listed
as "can't identify protocol" instead of TCP, but that's not too
relevant, since it's still open. Repeating the urllib2.urlopen()/close()
pair of statements in the interpreter will cause more fds to be leaked,
which can also be seen with lsof.
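bwelling's point, that the descriptor stays open for as long as the object
sits in an uncollected cycle, can also be observed from inside the process,
without lsof. A minimal Python 3 sketch (Wrapper is hypothetical, standing
in for the response object that holds the socket):

```python
import gc
import os
import socket

class Wrapper:
    """Hypothetical stand-in for a response object holding a socket."""
    def __init__(self):
        self.sock = socket.socket()
        self.recv = self.read        # the same cycle-creating assignment
    def read(self):
        return b""

gc.disable()                         # no automatic collection

w = Wrapper()
fd = w.sock.fileno()
del w                                # unreachable, but trapped in the cycle

# The OS descriptor is still open: the socket object was never freed.
os.fstat(fd)                         # would raise OSError if fd were closed
still_open = True

gc.collect()                         # collecting the cycle frees the socket
try:
    os.fstat(fd)                     # now the descriptor is gone
    closed = False
except OSError:
    closed = True
print(still_open, closed)

gc.enable()
```

This is exactly why the leak is worse than a memory leak: the collector only
runs on allocation pressure, not on descriptor pressure, so the fds pile up
until the process hits its limit.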
----------------------------------------------------------------------

Comment By: Sean Reifschneider (jafo)
Date: 2005-08-12 18:30

Message:
Logged In: YES  user_id=81797

I've just tried it again using the current CVS version as well as the
version installed with Fedora Core 4, and in both cases I was able to
run over 100,000 retrievals of http://127.0.0.1/test.html and
http://127.0.0.1/google.html. test.html is just "it works" and
google.html was generated with "wget -O google.html http://google.com/".
I was able to reproduce this before, but now I am not. My urllib2.py
includes the r.recv = r.read line. I have upgraded from FC3 to FC4;
could this be something related to an OS or library interaction? I was
going to try to confirm the last message, but now I can't reproduce the
failure.

----------------------------------------------------------------------

Comment By: Brian Wellington (bwelling)
Date: 2005-08-11 22:22

Message:
Logged In: YES  user_id=63197

We just ran into this same problem, and worked around it by simply
removing the 'r.recv = r.read' line in urllib2.py and creating a recv
alias to the read function in HTTPResponse ('recv = read' in the
class). Not sure if this is the best solution, but it seems to work.

----------------------------------------------------------------------

Comment By: Sean Reifschneider (jafo)
Date: 2005-06-28 23:52

Message:
Logged In: YES  user_id=81797

I give up; this code is kind of a maze of twisty little passages. I did
try doing "a.fp.close()", and that didn't seem to help at all. I
couldn't really make any progress on that, though. I also tried doing
"if a.headers.fp: a.headers.fp.close()", which didn't do anything
noticeable.

----------------------------------------------------------------------

Comment By: Sean Reifschneider (jafo)
Date: 2005-06-28 23:27

Message:
Logged In: YES  user_id=81797

I can reproduce this in both the python.org 2.4 RPM and in a freshly
built copy from CVS.
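The workaround bwelling describes, aliasing recv at class level instead of
assigning a bound method per instance, avoids the cycle entirely. A minimal
Python 3 sketch of the pattern (Response is again a hypothetical stand-in):

```python
import gc

class Response:
    """Hypothetical stand-in; recv is aliased at class level."""
    def read(self):
        return b"data"
    recv = read      # class-level alias ('recv = read' in the class):
                     # no bound method is stored on the instance,
                     # so no reference cycle is created

gc.disable()
gc.set_debug(gc.DEBUG_SAVEALL)

r = Response()
data = r.recv()      # the alias behaves like a normal method
del r                # plain refcounting frees the instance immediately

gc.collect()
leaked = any(isinstance(o, Response) for o in gc.garbage)
print(data, leaked)

gc.set_debug(0)
gc.garbage.clear()
gc.enable()
```

The class attribute is looked up through the instance just like any method,
so callers see no behavioral difference, but nothing instance-level points
back at the instance.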
If I run a few thousand urlopen() calls, I get:

Traceback (most recent call last):
  File "/tmp/mt", line 26, in ?
  File "/tmp/python/dist/src/Lib/urllib2.py", line 130, in urlopen
  File "/tmp/python/dist/src/Lib/urllib2.py", line 361, in open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 379, in _open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 340, in _call_chain
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1026, in http_open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1001, in do_open
urllib2.URLError: <urlopen error (24, 'Too many open files')>

even if I do a.close(). I'll investigate a bit further.

Sean

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2005-06-01 19:13

Message:
Logged In: YES  user_id=11375

Confirmed. The objects involved seem to be an HTTPResponse and the
socket._fileobject wrapper; the assignment 'r.recv = r.read' around
line 1013 of urllib2.py seems to be critical to creating the cycle.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com