Re: Tracking down memory leaks?
On 12 Feb 2006 05:11:02 -0800, rumours say that MKoool [EMAIL PROTECTED] might have written:

> I have an application with one function called compute, which, given a
> filename, goes through that file and performs various statistical
> analyses. It uses arrays extensively and loops a lot. It prints the
> results of its statistical significance tests to standard out. Since
> the compute function returns, and I think no variables of global scope
> are being used, I would think that when it does, all memory returns
> back to the operating system.

Would your program work if you substituted collections.deque for the arrays (did you mean array.arrays or lists?)? Please test.

> Instead, what I see is that every iteration uses several megs more.
> For example, python uses 52 megs when starting out; it goes through
> several iterations and I'm suddenly using more than 500 megs of RAM.

If your algorithms can work with the collections.deque container, can you please check whether the memory use pattern changes?

> Does anyone have any pointers on how to figure out what I'm doing
> wrong?

I suspect that you have one or more large arrays (lists?) that continuously grow. It would be useful if you ran your program on a fairly idle machine and had a way to see whether the consumed memory seems to be swapped out without eventually being swapped back in.
--
TZOTZIOY, I speak England very best.
"Dear Paul, please stop spamming us." -- The Corinthians
--
http://mail.python.org/mailman/listinfo/python-list
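One concrete reason a deque can bound memory where a plain list keeps growing is its maxlen argument (a feature of later Python releases than this thread; the suggestion above does not mention it). A minimal sketch:

```python
from collections import deque

# A deque created with maxlen keeps at most that many items: appending
# to a full deque silently discards items from the opposite end, so the
# container's memory stays bounded no matter how many values pass through.
window = deque(maxlen=1000)
for i in range(10_000):
    window.append(i)

print(len(window))   # 1000
print(window[0])     # 9000 -- the oldest 9000 items were discarded
```

If the statistics only need a sliding window of recent samples, this keeps the working set constant instead of growing with the input.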
Tracking down memory leaks?
I have an application with one function called compute, which, given a filename, goes through that file and performs various statistical analyses. It uses arrays extensively and loops a lot. It prints the results of its statistical significance tests to standard out.

Since the compute function returns, and I think no variables of global scope are being used, I would think that when it does, all memory returns back to the operating system. Instead, what I see is that every iteration uses several megs more. For example, python uses 52 megs when starting out; it goes through several iterations and suddenly I'm using more than 500 megs of RAM.

Does anyone have any pointers on how to figure out what I'm doing wrong?

Thanks,
mohan
Re: Tracking down memory leaks?
On Sun, 2006-02-12 at 05:11 -0800, MKoool wrote:

> I have an application with one function called compute, which, given a
> filename, goes through that file and performs various statistical
> analyses. It uses arrays extensively and loops a lot. It prints the
> results of its statistical significance tests to standard out. Since
> the compute function returns, and I think no variables of global scope
> are being used, I would think that when it does, all memory returns
> back to the operating system. Instead, what I see is that every
> iteration uses several megs more. For example, python uses 52 megs
> when starting out; it goes through several iterations and I'm suddenly
> using more than 500 megs of RAM. Does anyone have any pointers on how
> to figure out what I'm doing wrong?
>
> Thanks,
> mohan

Have you tried forcing a garbage collection? Try, for example, running gc.collect() every time the function returns. See http://www.python.org/doc/current/lib/module-gc.html for more details.

Cya,
Felipe.
--
"One who excels at employing military force subjugates the armies of other peoples without joining battle, takes the fortified cities of other peoples without attacking them, and destroys the states of other peoples without protracted fighting. He must fight under Heaven with the paramount aim of 'preservation'. Thus his weapons will not grow dull, and the gains can be preserved. This is the strategy for planning offensives."
-- Sun Tzu, The Art of War
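The suggestion above can be sketched as follows; compute's body and the file names here are placeholders, not the poster's actual code:

```python
import gc

def compute(filename):
    # Placeholder standing in for the poster's statistics: the real
    # function reads the named file and runs significance tests.
    data = [float(i) for i in range(100_000)]
    return sum(data) / len(data)

for filename in ["q1.dat", "q2.dat"]:   # hypothetical inputs
    result = compute(filename)
    # Force a full collection after each iteration; the return value is
    # the number of unreachable objects the collector found.
    freed = gc.collect()
    print("unreachable objects collected:", freed)
```

If memory still grows after an explicit gc.collect() per iteration, the objects are genuinely still referenced from somewhere, which narrows the search considerably.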
Re: Tracking down memory leaks?
I *think* Python uses reference counting for garbage collection. I've heard talk of people wanting to change this (to mark and sweep?). Anyway, Python stores a counter with each object. Every time you make a reference to an object, this counter is increased. Every time a pointer to the object is deleted or reassigned, the counter is decreased. When the counter reaches zero, the object is freed from memory.

A flaw with this algorithm is that if you create a circular reference, the object will never be freed. A linked list where the tail points to the head will have a reference count of 1 for each node after the head pointer is deleted, so the list is never freed. Make sure you are not creating a circular reference. Something like this:

    a = [1, 2, 3, 4, 5, 6]
    b = ['a', 'b', 'c', 'd']
    c = [10, 20, 30, 40]
    a[3] = b
    b[1] = c
    c[0] = a

The last assignment creates a circular reference, and until it is removed, none of these objects will be removed from memory.

I'm not an expert on python internals, and it is possible that they have a way of checking for cases like this. I think the deepcopy method catches this, but I don't *think* basic garbage collection looks for this sort of thing.

David
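The reference-counting behaviour described above can be observed directly with sys.getrefcount, a small illustrative sketch:

```python
import sys

a = [1, 2, 3]
base = sys.getrefcount(a)   # includes the temporary reference made by the call itself
b = a                       # binding another name bumps the count by one
assert sys.getrefcount(a) == base + 1
del b                       # deleting that name drops it back down
assert sys.getrefcount(a) == base
```

When the last counted reference disappears, CPython frees the object immediately; the separate cycle detector exists only for the circular case described above.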
Re: Tracking down memory leaks?
On Sun, 12 Feb 2006 05:11:02 -0800, MKoool wrote:

> I have an application with one function called compute, which, given a
> filename, goes through that file and performs various statistical
> analyses. It uses arrays extensively and loops a lot. It prints the
> results of its statistical significance tests to standard out. Since
> the compute function returns, and I think no variables of global scope
> are being used, I would think that when it does, all memory returns
> back to the operating system.

I may be mistaken, and if so I will welcome the correction, but Python does not return memory to the operating system until it terminates. Objects return memory to Python when they are garbage collected, but not to the OS.

> Instead, what I see is that every iteration uses several megs more.
> For example, python uses 52 megs when starting out; it goes through
> several iterations and I'm suddenly using more than 500 megs of RAM.
>
> Does anyone have any pointers on how to figure out what I'm doing
> wrong?

How big is the file you are reading in? If it is (say) 400 MB, then it is hardly surprising that you will be using 500 MB of RAM. If the file is 25K, that's another story.

How are you storing your data while you are processing it? I'd be looking for hidden duplicates.

I suggest you re-factor your program. Instead of one giant function, break it up into lots of smaller ones, and call them from compute. Yes, this will use a little more memory, which might sound counter-productive at the moment when you are trying to use less memory, but in the long term it will allow your computer to use memory more efficiently (it is easier to page small functions as they are needed than one giant function), and it will be much easier for you to write and debug when you can isolate individual pieces of the task in individual functions.

Re-factoring will have another advantage: you might just find the problem on your own.

--
Steven.
Re: Tracking down memory leaks?
MKoool wrote:

> I have an application with one function called compute, which, given a
> filename, goes through that file and performs various statistical
> analyses. [snip] Instead, what I see is that every iteration uses
> several megs more. For example, python uses 52 megs when starting out;
> it goes through several iterations and I'm suddenly using more than
> 500 megs of RAM. Does anyone have any pointers on how to figure out
> what I'm doing wrong?

Are you importing any third-party modules? It's not unheard of that someone else's code has a memory leak.
Re: Tracking down memory leaks?
On Sun, 12 Feb 2006 06:01:55 -0800, [EMAIL PROTECTED] wrote:

> I *think* Python uses reference counting for garbage collection.

Yes it does, with special code for detecting and collecting circular references.

> I've heard talk of people wanting to change this (to mark and sweep?).

Reference counting is too simple to be cool *wink*

[snip]

> Make sure you are not creating a circular reference. Something like
> this:
>
>     a = [1, 2, 3, 4, 5, 6]
>     b = ['a', 'b', 'c', 'd']
>     c = [10, 20, 30, 40]
>     a[3] = b
>     b[1] = c
>     c[0] = a
>
> The last assignment creates a circular reference, and until it is
> removed, none of these objects will be removed from memory.

I believe Python now handles this sort of situation very well.

> I'm not an expert on python internals, and it is possible that they
> have a way of checking for cases like this. I think the deepcopy
> method catches this, but I don't *think* basic garbage collection
> looks for this sort of thing.

deepcopy has nothing to do with garbage collection. This is where you use deepcopy:

    py> a = [2, 4, [0, 1, 2], 8]  # note the nested list
    py> b = a         # b and a are both bound to the same list
    py> b is a        # b is the same list as a, not just a copy
    True
    py> c = a[:]      # make a shallow copy of a
    py> c is a        # c is a copy of a, not a itself
    False
    py> c[2] is a[2]  # but both a and c include the same nested list
    True

What if you want c to include a copy of the nested list? That's where you use deepcopy:

    py> import copy
    py> d = copy.deepcopy(a)
    py> d[2] is a[2]
    False

--
Steven.
Re: Tracking down memory leaks?
[EMAIL PROTECTED] wrote:

> MKoool wrote:
>
>> I have an application with one function called compute, which, given
>> a filename, goes through that file and performs various statistical
>> analyses. [snip] Does anyone have any pointers on how to figure out
>> what I'm doing wrong?
>
> Are you importing any third-party modules? It's not unheard of that
> someone else's code has a memory leak.

- It sounds like you're working with very large, very sparse matrices, running LSI/SVD or a PCA/covariance analysis, something like that. So it's a specialized problem: you need to specify what libs you're using, what your platform/OS is, the Python release, how you installed it, details about C extensions, pyrex/psyco/swig. The more info you supply, the more you get back.

- Be aware there are wrong ways to measure memory; see e.g. this long thread: http://mail.python.org/pipermail/python-list/2005-November/310121.html
Re: Tracking down memory leaks?
Steven D'Aprano wrote:

> deepcopy has nothing to do with garbage collection. This is where you
> use deepcopy:
>
> [snip]

What I meant is that deepcopy is recursive, and if you have a circular reference in your data structure, a recursive copy will become infinite. I think deepcopy has the ability to detect this situation. So if it can be detected for deepcopy, I don't see why it could not be detected for garbage collection purposes.

David
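The detection David is describing does exist: deepcopy carries a memo dictionary of already-copied objects, so a cycle in the original becomes the corresponding cycle in the copy instead of infinite recursion. A minimal sketch:

```python
import copy

a = [1, 2]
a.append(a)            # a now contains itself: a circular reference
b = copy.deepcopy(a)   # deepcopy's memo dictionary notices the revisit
assert b is not a      # a genuinely new list...
assert b[2] is b       # ...whose cycle points back at the copy itself
assert b[0] == 1 and b[1] == 2
```

The garbage collector's cycle detection works differently (by traversing container objects and subtracting internal references), but the underlying insight that cycles are detectable is the same.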
Re: Tracking down memory leaks?
On 12 Feb 2006 10:13:02 -0800, [EMAIL PROTECTED] wrote:

> What I meant is that deepcopy is recursive, and if you have a circular
> reference in your data structure, a recursive copy will become
> infinite. I think deepcopy has the ability to detect this situation.
> So if it can be detected for deepcopy, I don't see why it could not be
> detected for garbage collection purposes.

It's moot, since the garbage collector can collect cycles. Make a cycle:

    >>> a = []
    >>> b = []
    >>> a.append(b)
    >>> b.append(a)

Get rid of all references to all objects participating in it:

    >>> del a, b

Explicitly invoke the garbage collector in order to observe the number of objects it cleans up:

    >>> import gc
    >>> gc.collect()
    2

Jean-Paul
Re: Tracking down memory leaks?
Hi Steven,

Steven D'Aprano wrote:

> Objects return memory to Python when they are garbage collected, but
> not the OS.

Is there any way of making Python return memory no longer needed to the OS? Cases may arise where you indeed need a big memory block temporarily, without being able to split it up into smaller chunks.

Thank you.
malv
Re: Tracking down memory leaks?
malv wrote:

> Is there any way of making Python return memory no longer needed to
> the OS? Cases may arise where you indeed need a big memory block
> temporarily, without being able to split it up into smaller chunks.

That's not really necessary. On any decent OS it's just unused address space, which doesn't consume any physical memory. And when your process runs out of address space, you should program more carefully :-)

--
René Pijlman
Re: Tracking down memory leaks?
MKoool wrote:

> I have an application with one function called compute, which, given a
> filename, goes through that file and performs various statistical
> analyses. [snip] Does anyone have any pointers on how to figure out
> what I'm doing wrong?

If gc.collect() doesn't help: maybe objects of extension libs are not being freed correctly. And Python has a real skeleton in the cupboard: a known problem with Python objects/libs when classes with __del__ are involved. (I once suffered from such a tremendous, unexplainable memory blow-up myself, until I found the "del gc.garbage[:]" remedy.)

From http://www.python.org/doc/current/lib/module-gc.html :

    garbage
        A list of objects which the collector found to be unreachable
        but could not be freed (uncollectable objects). By default, this
        list contains only objects with __del__() methods. Objects that
        have __del__() methods and are part of a reference cycle cause
        the entire reference cycle to be uncollectable, including
        objects not necessarily in the cycle but reachable only from it.
        Python doesn't collect such cycles automatically because, in
        general, it isn't possible for Python to guess a safe order in
        which to run the __del__() methods. If you know a safe order,
        you can force the issue by examining the garbage list, and
        explicitly breaking cycles due to your objects within the list.
        Note that these objects are kept alive even so by virtue of
        being in the garbage list, so they should be removed from
        garbage too. For example, after breaking cycles, do
        "del gc.garbage[:]" to empty the list. It's generally better to
        avoid the issue by not creating cycles containing objects with
        __del__() methods, and garbage can be examined in that case to
        verify that no such cycles are being created.

Robert
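The remedy from the docs can be sketched as follows. The Node class is purely illustrative; note also that later CPython versions (PEP 442, Python 3.4) learned to collect cycles with finalizers, so on a modern interpreter gc.garbage normally stays empty:

```python
import gc

class Node:
    def __init__(self):
        self.other = None
    def __del__(self):
        pass  # a finalizer is what made cycles uncollectable on old CPython

a, b = Node(), Node()
a.other, b.other = b, a    # reference cycle between two finalizable objects
del a, b
gc.collect()

# On old interpreters the cycle would now sit in gc.garbage; apply the
# manual remedy described in the docs quoted above:
for obj in gc.garbage:
    obj.other = None       # break the cycle by hand
del gc.garbage[:]          # then empty the list so the objects can die
print(gc.garbage)          # []
```

The key point survives on any version: anything lingering in gc.garbage is kept alive by that very list, so it must be emptied after the cycles are broken.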
Re: Tracking down memory leaks?
Steven D'Aprano wrote:

> How big is the file you are reading in? If it is (say) 400 MB, then it
> is hardly surprising that you will be using 500 MB of RAM. If the file
> is 25K, that's another story.

Actually, I am downloading the matrix data from a file on a server on the net using urllib2, and then I am running several basic stats on it using some functions that I get from matplotlib. Most are statistical functions I run on standard vectors, such as standard deviation, mean, median, etc. I then loop through various matrix items and, based on a set of criteria, attempt to fit a sort of linear regression model using a few loops over the vectors.

> How are you storing your data while you are processing it? I'd be
> looking for hidden duplicates.

I am storing basically everything as a set of vectors. For example, I would have one vector for my X-axis, time. The other variables are the number of units sold and the total aggregate revenue from selling all units.

I am wondering if it's actually urllib2 that is messing me up. It could be matplotlib as well, although I doubt it, since I do not use matplotlib unless the statistical significance test I produce indicates a high level of strength (very rare), indicating to me that the company has a winning product.
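One way to rule the download out as the accumulator is to consume the response incrementally instead of holding the whole body in memory. A hypothetical sketch, not the poster's actual code (written against urllib.request, the modern successor of the thread's urllib2):

```python
import urllib.request  # the thread's urllib2 became urllib.request in Python 3

def stream_lines(url):
    # Yield the response line by line: running statistics (mean, variance,
    # regression sums) can then be updated incrementally, so the raw
    # download never accumulates in memory.
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            yield raw.decode("utf-8", "replace")
```

Pairing this with running-sum formulas for mean and standard deviation keeps per-series state at O(1) regardless of file size.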
Re: Tracking down memory leaks?
Steven D'Aprano [EMAIL PROTECTED] writes:

> On Sun, 12 Feb 2006 05:11:02 -0800, MKoool wrote:
> [...]
> I may be mistaken, and if so I will welcome the correction, but Python
> does not return memory to the operating system until it terminates.
> Objects return memory to Python when they are garbage collected, but
> not the OS.
> [...]

http://groups.google.com/group/comp.lang.python/browse_frm/thread/ce5c44915f43d6d/2bea1c569a65e13e

John