On 11/07/2011 02:43 PM, Juan Declet-Barreto wrote:
Hi,

Can anyone provide links or basic info on memory management, variable 
dereferencing, or the like? I have a script that traverses a file structure 
using os.walk and adds directory names to a list. It works for a small number 
of directories, but when I set it loose on a directory with thousands of 
dirs/subdirs, it crashes the DOS session and also the Python shell (when I run 
it from the shell). This makes it difficult to figure out whether the allocated
memory or heap space for the DOS/shell session has overflowed, or why it is
crashing.

Juan Declet-Barreto
I don't have any reference to point you to, but CPython's memory management is really pretty simple. However, it's important to tell us which build of Python you're running, as there are several implementations with very different memory rules. For example Jython, which is Python running in a Java VM, lets the Java garbage collector handle things, and behaves entirely differently.

Likewise, the OS may be relevant. You're using Windows-style terminology, but that doesn't prove you're on Windows, nor does it tell us which version.
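
If it helps, a tiny snippet like this (just a sketch), pasted at the top of the script or typed into the interactive shell, would tell us exactly what you're running:

import sys, platform

# Enough detail to identify the Python build and the OS.
print(sys.version)          # version string, including whether it's a 32- or 64-bit build
print(platform.platform())  # OS name and version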

Assuming 32-bit CPython 2.7 on XP, the principles are simple. When an object is no longer accessible, it gets garbage collected*. So if you build a list inside a function, and the only reference to it is the function's local variable, the whole list will be freed when the function exits. The mistakes many people make are using globals unnecessarily, and using lists where iterables would work just as well.
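
To make that concrete, here's a rough sketch (the function names, and the use of os.walk to mirror your script, are my own guesses, not anything you posted):

import os

def dir_names_as_list(root):
    # The list exists only for the duration of this call; since the local
    # name is its only reference, it is freed as soon as the function returns.
    names = []
    for dirpath, dirnames, filenames in os.walk(root):
        names.extend(dirnames)
    return len(names)

def dir_names_as_iterable(root):
    # Never builds the full list at all -- each batch of names is consumed
    # as os.walk yields it, so memory use stays roughly flat.
    return sum(len(dirnames) for dirpath, dirnames, filenames in os.walk(root))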

The tool on XP to tell how much memory is in use is Task Manager. As you point out, it's hard to catch a short-running app in the act. So you want to add a counter to your code (a global), and see how high it gets when the script crashes. Then put a test in your code for that counter value, and pause at an "input" (raw_input() on Python 2.7) somewhat before that point.

At that point, see how much memory the program is actually using.
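
Something along these lines, say (the threshold value and the raw_input() pause are just placeholders; on Python 3 it would be input()):

import os

counter = 0        # global counter, as suggested above
PAUSE_AT = 50000   # hypothetical: set this a bit below the count where it crashed

def collect_dirs(root):
    global counter
    dir_names = []
    for dirpath, dirnames, filenames in os.walk(root):
        dir_names.extend(dirnames)
        counter += 1
        print(counter)   # watch how high it gets before the crash
        if counter == PAUSE_AT:
            raw_input("Paused -- check Task Manager, then press Enter ")
    return dir_names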

Now, when an object is freed, a new one of the same size is likely to re-use the space immediately. But if they're all different sizes, it's somewhat statistical, and you can get fragmentation, for example. When Python's pool is full, it asks the OS for more (perhaps using swap space), but I don't think it ever gives it back. So the memory use you see is a kind of high-water mark. That's why it's problematic to build a huge data structure, walk through it, and then delete it: the process will probably continue to show the peak memory use indefinitely.
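
So, if your end goal allows it, consider handling the names as os.walk produces them instead of collecting them all first. For instance, streaming them straight to a file (the file-writing part is just one possibility I'm making up):

import os

def write_dir_names(root, out_path):
    # Stream directory names out as they are produced, so the process never
    # holds the whole collection in memory at once and the peak stays low.
    with open(out_path, 'w') as out:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames:
                out.write(name + '\n')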

* (Technically, this is reference counting: when an object's reference count reaches zero, it is freed immediately. The real garbage collector only does lazier scanning, to catch reference cycles.)
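
You can watch the reference count yourself if you're curious (note that sys.getrefcount() temporarily adds one reference of its own):

import sys

a = ['x'] * 10
print(sys.getrefcount(a))   # 2: the name 'a' plus getrefcount's own argument
b = a
print(sys.getrefcount(a))   # 3: 'a', 'b', and the argument
del b
print(sys.getrefcount(a))   # back to 2; at zero the list would be freed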


--
http://mail.python.org/mailman/listinfo/python-list
