On 11/07/2011 02:43 PM, Juan Declet-Barreto wrote:
Hi,
Can anyone provide links or basic info on memory management, variable
dereferencing, or the like? I have a script that traverses a file structure
using os.walk and adds directory names to a list. It works for a small number
of directories, but when I set it loose on a directory with thousands of
dirs/subdirs, it crashes the DOS session and also the Python shell (when I run
it from the shell). This makes it difficult to figure out whether the allocated
memory or heap space for the DOS/shell session has overflowed, or why it is
crashing.
Juan Declet-Barreto
I don't have any reference to point you to, but CPython's memory
management is really pretty simple. However, it's important to tell us
which build of Python you're using, as there are several, with very
different memory rules. For example, Jython, which is Python running in
a Java VM, lets the Java garbage collector handle things, and it
behaves entirely differently. Likewise, the OS may be relevant. You're
using Windows-style terminology, but that doesn't prove you're on
Windows, nor does it say which version.
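If it helps, a couple of lines at the top of the script will report
exactly which build and OS you're on (this is all standard-library
introspection, nothing specific to your setup; 2.x print statement
assumed):

import sys, platform

# Interpreter: full version string, implementation, and bitness
print sys.version
print platform.python_implementation(), platform.architecture()[0]
# Operating system and version
print platform.platform()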
Assuming 32-bit CPython 2.7 on XP, the principles are simple. When an
object is no longer accessible, it gets garbage collected*. So if you
build a list inside a function, and the only reference to it is the
function's local variable, the whole list will be freed when the
function exits. The mistakes many people make are unnecessarily using
globals, and building lists when iterables would work just as well.
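As a sketch of what I mean (assuming your goal is roughly to visit or
count directory names; substitute whatever your script actually does
with them), keep the walk inside a function and consume os.walk lazily
instead of accumulating everything in one global list:

import os

def count_dirs(top):
    """Walk the tree, processing directory names as they arrive."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        # Handle each batch here (write to a file, count, filter, ...)
        # rather than appending them all to an ever-growing global list.
        total += len(dirnames)
    return total

print count_dirs(r"C:\some\big\tree")   # placeholder path

Everything local to count_dirs becomes unreachable, and is freed, as
soon as the function returns.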
The tool on XP to tell how much memory is in use is Task Manager. As
you point out, it's hard to catch a short-running app in the act. So
you want to add a counter to your code (global), and see how high it
gets when it crashes. Then put a test in your code for that counter
value, and do an "input" somewhat earlier.
At that point, see how much memory the program is actually using.
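Something along these lines, where PAUSE_AT is a number you'd tune
after seeing how high the counter gets before the crash (raw_input,
since this is 2.x):

import os

counter = 0          # global progress counter
PAUSE_AT = 50000     # set this somewhat below the count where it died

def walk_dirs(top):
    global counter
    dirs = []
    for dirpath, dirnames, filenames in os.walk(top):
        dirs.extend(dirnames)
        counter += 1
        if counter == PAUSE_AT:
            raw_input("At %d dirs; check Task Manager, then press Enter"
                      % counter)
    return dirs

walk_dirs(r"C:\some\big\tree")   # placeholder path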
Now, when an object is freed, a new one of the same size is likely to
re-use the space immediately. But if they're all different sizes, it's
somewhat statistical, and you can get fragmentation. When Python's pool
is full, it asks the OS for more (perhaps using swap space), but I
don't think it ever gives it back. So your memory use acts as a kind of
high-water mark. That's why it's problematic to build a huge data
structure, walk through it, and then delete it: the script will
probably continue to show the peak memory use indefinitely.
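If you'd rather get the numbers from inside the script than eyeball
Task Manager, the Win32 API exposes both the current and the peak
working set. A rough ctypes sketch (Windows-only; treat the structure
layout as something to verify against the PROCESS_MEMORY_COUNTERS
documentation rather than gospel):

import ctypes
from ctypes import wintypes

class PROCESS_MEMORY_COUNTERS(ctypes.Structure):
    _fields_ = [("cb", wintypes.DWORD),
                ("PageFaultCount", wintypes.DWORD),
                ("PeakWorkingSetSize", ctypes.c_size_t),
                ("WorkingSetSize", ctypes.c_size_t),
                ("QuotaPeakPagedPoolUsage", ctypes.c_size_t),
                ("QuotaPagedPoolUsage", ctypes.c_size_t),
                ("QuotaPeakNonPagedPoolUsage", ctypes.c_size_t),
                ("QuotaNonPagedPoolUsage", ctypes.c_size_t),
                ("PagefileUsage", ctypes.c_size_t),
                ("PeakPagefileUsage", ctypes.c_size_t)]

def memory_usage():
    """Return (current, peak) working set of this process, in bytes."""
    pmc = PROCESS_MEMORY_COUNTERS()
    pmc.cb = ctypes.sizeof(pmc)
    ctypes.windll.psapi.GetProcessMemoryInfo(
        ctypes.windll.kernel32.GetCurrentProcess(),
        ctypes.byref(pmc), pmc.cb)
    return pmc.WorkingSetSize, pmc.PeakWorkingSetSize

print "current %d bytes, peak %d bytes" % memory_usage()

The peak value is exactly that high-water mark: it only goes up.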
* (Technically, CPython uses reference counting. When an object's
reference count reaches zero, the object is freed immediately. The real
garbage collector does lazier scanning, mainly to reclaim reference
cycles.)
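A quick way to see the difference: drop the last reference to a plain
object and a weakref callback fires immediately; tie the object into a
reference cycle and nothing happens until the cycle collector runs.
(The names here are just for demonstration.)

import gc, weakref

class Node(object):
    pass

def gone(ref):
    print "object was freed"

a = Node()
r1 = weakref.ref(a, gone)
a = None            # refcount hits zero: callback fires right here

b = Node()
b.self_ref = b      # reference cycle keeps the count above zero
r2 = weakref.ref(b, gone)
b = None            # nothing prints yet; the cycle still holds it
gc.collect()        # the cycle detector frees it, callback fires now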