Re: "Help needed - I don't understand how Python manages memory"

2008-04-22 Thread Lie
On Apr 21, 1:14 am, "Hank @ITGroup" <[EMAIL PROTECTED]> wrote:
> Christian Heimes wrote:
> > Gabriel Genellina wrote:
>
> >> Apart from what everyone has already said, consider that FreqDist may 
> >> import other modules, store global state, create other objects... whatever.
> >> Pure python code should not have any memory leaks (if there are, it's a 
> >> bug in the Python interpreter). Not-carefully-written C extensions may 
> >> introduce memory problems.
>
> > Pure Python code can cause memory leaks. No, that's not a bug in the
> > interpreter but the fault of the developer. For example code that messes
> > around with stack frames and exception object can cause nasty reference
> > leaks.
>
> > Christian
>
> In order to deal with 400 thousand texts consisting of 80 million
> words, and huge sets of corpora, I have to be careful about memory.
> I need to track every word's behavior, so there need to be as
> many word-objects as words.
> I am really suffering from the memory problem; even 4 GB of memory is
> not enough... Only 10,000 texts can kill it in 2 minutes.
> By the way, my program has been optimized to ``del`` the objects after
> traversing, in order not to keep the information in memory all the time.

Could you explain a little further what you're doing with the 80
million words? Perhaps we could help you improve the design since, as
Christian Heimes has said, holding 80 million words in memory all at
once strains present-day computers: at 6 ASCII letters per word, the
raw text alone takes about 500 MB. If you're using Unicode, this
number may double or quadruple. A better solution may be to load only
the parts of the text you need and process them with generators, or
to index the words. It may be slower (or even faster, as the OS
wouldn't need to allocate as much memory), but that's a tradeoff you
should decide on.
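
A minimal sketch of the generator approach, in Python 3 syntax. The
helper names iter_words and count_words and the sample text are
made up for illustration; a real program would pass an open file
object instead of the StringIO stand-in.

```python
import io
from collections import Counter

def iter_words(lines):
    """Yield one lowercased word at a time from an iterable of text lines."""
    for line in lines:
        for word in line.split():
            yield word.lower()

def count_words(lines):
    """Count words while holding only one line of text in memory at a time."""
    counts = Counter()
    for word in iter_words(lines):
        counts[word] += 1
    return counts

# Hypothetical stand-in for a large corpus file opened with open(path).
sample = io.StringIO("the cat sat\non the mat\nThe end\n")
counts = count_words(sample)
print(counts["the"])
```

Only the counter grows with the vocabulary; the text itself is never
held in memory as a whole.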
--
http://mail.python.org/mailman/listinfo/python-list


Re: "Help needed - I don't understand how Python manages memory"

2008-04-21 Thread sturlamolden
On Apr 21, 4:09 am, "Gabriel Genellina" <[EMAIL PROTECTED]>
wrote:

> I'm not sure if this will help the OP at all - going into a world of dangling 
> pointers, keeping track of ownership, releasing memory by hand... One of the 
> good things about Python is automatic memory management. Ensuring that all 
> references to an object are released (the standard Python way) is FAR easier 
> than doing all that by hand.

The OP was complaining that he could not manually release memory using
del, as if it were Python's equivalent of the C++ delete[] operator. I
showed him how it could be done. I did not say manual memory
management is a good idea.








Re: "Help needed - I don't understand how Python manages memory"

2008-04-21 Thread Andrew MacIntyre
Hank @ITGroup wrote:

> In order to deal with 400 thousand texts consisting of 80 million 
> words, and huge sets of corpora, I have to be careful about memory. 
> I need to track every word's behavior, so there need to be as 
> many word-objects as words.
> I am really suffering from the memory problem; even 4 GB of memory is 
> not enough... Only 10,000 texts can kill it in 2 minutes.
> By the way, my program has been optimized to ``del`` the objects after 
> traversing, in order not to keep the information in memory all the time.

In addition to all the other advice you've been given, I've found it can
pay dividends in memory consumption to make every occurrence of a given
value (such as a string) refer to a single shared object.  This is often
referred to as "interning".  Automatic interning is only performed for a
small subset of possibilities.

For example:

 >>> z1 = 10
 >>> z2 = 10
 >>> z1 is z2
True
 >>> z1 = 1000
 >>> z2 = 1000
 >>> z1 is z2
False
 >>> z1 = 'test'
 >>> z2 = 'test'
 >>> z1 is z2
True
 >>> z1 = 'this is a test string pattern'
 >>> z2 = 'this is a test string pattern'
 >>> z1 is z2
False

Careful use of interning can get a double boost: cutting memory 
consumption and allowing comparisons to short circuit on identity.  It
does cost in maintaining the dictionary that interns the objects though,
and tracking reference counts can be much harder.

-- 
-
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: [EMAIL PROTECTED]  (pref) | Snail: PO Box 370
[EMAIL PROTECTED] (alt) |Belconnen ACT 2616
Web:http://www.andymac.org/   |Australia


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Gabriel Genellina
On Sun, 20 Apr 2008 17:19:43 -0300, sturlamolden <[EMAIL PROTECTED]> wrote:

> On Apr 20, 9:09 pm, "Hank @ITGroup" <[EMAIL PROTECTED]> wrote:
>
>> Could you please give us some clear clues on how to explicitly tell Python
>> to free memory? We want to control its gc operation directly, as we did
>> when using J**A.
>
> If you want to get rid of a Python object, the only way to do that is
> to get rid of every reference to the object. This is no different from
> Java.
>
> If you just want to deallocate and allocate memory to store text,
> Python lets you do that the same way as C:

I'm not sure if this will help the OP at all - going into a world of dangling 
pointers, keeping track of ownership, releasing memory by hand... One of the 
good things about Python is automatic memory management. Ensuring that all 
references to an object are released (the standard Python way) is FAR easier 
than doing all that by hand.

-- 
Gabriel Genellina



Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Gabriel Genellina
On Sun, 20 Apr 2008 17:23:32 -0300, Christian Heimes <[EMAIL PROTECTED]> 
wrote:

> Martin v. Löwis wrote:
>> Can you give an example, please?
>
> http://trac.edgewall.org/ contains at least one example of a reference
> leak. It's holding up the release of 0.11 for a while. *scnr*
>
> The problem is also covered by the docs at
> http://docs.python.org/dev/library/sys.html#sys.exc_info

Ah, you scared me for a while... :)
Holding the traceback from sys.exc_info is not a memory leak; it just prevents *a 
lot* of objects from reaching refcount 0 as long as the execution frames are 
referring to them. Don't store a traceback longer than needed and it should be 
fine...
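
A sketch of that effect, in Python 3 syntax. Big, fail and demo are
made-up names, and the weak reference is only there to observe the
object's lifetime. The immediate release after dropping the traceback
relies on CPython's reference counting.

```python
import gc
import sys
import weakref

class Big:
    """Stand-in for a large object held in a local variable."""

def fail():
    payload = Big()
    fail.probe = weakref.ref(payload)  # lets us watch payload's lifetime
    raise ValueError("boom")

def demo():
    try:
        fail()
    except ValueError:
        saved_tb = sys.exc_info()[2]   # traceback -> frame of fail() -> payload
    alive_while_stored = fail.probe() is not None
    saved_tb = None                    # drop the traceback as soon as possible
    gc.collect()                       # not needed on CPython; harmless elsewhere
    alive_after_drop = fail.probe() is not None
    return alive_while_stored, alive_after_drop

alive_while_stored, alive_after_drop = demo()
print(alive_while_stored, alive_after_drop)
```

While the traceback is stored, the local variable of the failed call
stays alive; once the traceback is dropped, it can be reclaimed.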

-- 
Gabriel Genellina



Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Steve Holden
Hank @ITGroup wrote:
> Steve Holden wrote:
>> You are suffering from a pathological condition yourself: the desire 
>> to optimize performance in an area where you do not have any problems. 
>> I would suggest you just enjoy using Python and then start to ask 
>> these questions again when you have a real issue that's stopping you 
>> from getting real work done.
>>
>> regards
>>  Steve
>>
> Hi, Steve,
> This is not simply a pathological condition. My people keep trying 
> many ways to get the job done, and the memory problem has become our 
> focus at this moment.
> Could you please give us some clear clues on how to explicitly tell Python 
> to free memory? We want to control its gc operation directly, as we did 
> when using J**A.

Well, now that you've told us a little more about your application I can 
understand that you need to be careful with memory allocation.

The best thing you can do is to ensure that your program is reasonably 
decomposed into functions. That way the local namespaces have limited 
lifetimes, and only the values that they return are injected into the 
environment. You also need to be careful in exception processing that 
you do not cause a reference to the stack frame to be retained, as that 
can be a fruitful source of references to objects, rendering them 
non-collectable.

You appear to be stressing the limits of a single program under 
present-day memory constraints. I am afraid that no matter how carefully 
you manage object references, any difference you can make is likely to 
be lost in the noise as far as memory utilization is concerned, and you 
may have to consider using less direct methods of processing your data sets.

The gc module does give you some control over the garbage collector, but 
generally speaking most programs don't even need that much control.
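
For reference, a short sketch of the kind of control the gc module
offers, in Python 3 syntax. The bulk-allocation scenario is only an
assumed example; all functions used are from the standard gc module.

```python
import gc

was_enabled = gc.isenabled()      # is the cyclic collector active?
thresholds = gc.get_threshold()   # allocation thresholds per generation

gc.disable()                      # pause cycle detection during bulk allocation
try:
    data = [[i] for i in range(100000)]
finally:
    if was_enabled:
        gc.enable()               # restore the previous state

unreachable = gc.collect()        # force a full collection, returns a count
print(thresholds, unreachable)
```

Disabling the collector only pauses cycle detection; reference
counting still frees objects as soon as their last reference goes.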

regards
  Steve
-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Martin v. Löwis
> http://trac.edgewall.org/ contains at least one example of a reference
> leak. It's holding up the release of 0.11 for a while. *scnr*

All my investigations on possible memory leaks in Trac have only
confirmed that Python does _not_, I repeat, it does *NOT* leak any
memory in Trac.

Instead, what appears as a leak is an unfortunate side effect of
the typical malloc implementation which prevents malloc from returning
memory to the system. The memory hasn't leaked, and is indeed available
for further allocations by Trac.

> The problem is also covered by the docs at
> http://docs.python.org/dev/library/sys.html#sys.exc_info

Ah, but that's not a *reference* leak. If Python (or an extension
module) contains a reference leak, that's a bug.

A reference leak is a leak where the reference counter is increased
without ever being decreased (i.e. without the application having
a chance to ever decrease it correctly). In this case, it's just a
cyclic reference, which will get released whenever the garbage
collector runs next, so it's not a memory leak.
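
A sketch of that distinction, in Python 3 syntax: a plain reference
cycle is not freed by reference counting, but the collector reclaims
it on its next run. Node and make_cycle are made-up names, and the
automatic collector is switched off briefly only so the demonstration
is deterministic.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

def make_cycle():
    """Create two objects that refer to each other and return a weak probe."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)

was_enabled = gc.isenabled()
gc.disable()                          # keep automatic runs out of the demo
probe = make_cycle()
alive_before = probe() is not None    # the cycle keeps both refcounts above zero
found = gc.collect()                  # the collector finds the unreachable cycle
alive_after = probe() is not None
if was_enabled:
    gc.enable()
print(alive_before, alive_after, found)
```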

Regards,
Martin


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Christian Heimes
Martin v. Löwis wrote:
> Can you give an example, please?

http://trac.edgewall.org/ contains at least one example of a reference
leak. It's holding up the release of 0.11 for a while. *scnr*

The problem is also covered by the docs at
http://docs.python.org/dev/library/sys.html#sys.exc_info

Christian



Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread sturlamolden
On Apr 20, 9:09 pm, "Hank @ITGroup" <[EMAIL PROTECTED]> wrote:

> Could you please give us some clear clues on how to explicitly tell Python
> to free memory? We want to control its gc operation directly, as we did
> when using J**A.

If you want to get rid of a Python object, the only way to do that is
to get rid of every reference to the object. This is no different from
Java.

If you just want to deallocate and allocate memory to store text,
Python lets you do that the same way as C:


from __future__ import with_statement
import os
from ctypes import CDLL, addressof, c_char, c_size_t, c_void_p, cdll, memset
from ctypes.util import find_library
from threading import Lock

# msvcrt on Windows; elsewhere let ctypes locate the C library.
_libc = cdll.msvcrt if os.name == 'nt' else CDLL(find_library('c'))
_lock = Lock()

# Declare the prototypes once; c_void_p keeps 64-bit pointers intact.
_libc.malloc.argtypes = [c_size_t]
_libc.malloc.restype = c_void_p
_libc.free.argtypes = [c_void_p]
_libc.free.restype = None

def string_heapalloc(n):
    ''' allocate a mutable string using malloc '''
    with _lock:
        ptr = _libc.malloc(n)
    if not ptr:
        raise MemoryError('malloc failed')
    memset(ptr, ord('0'), n)              # fill with the character '0'
    return (c_char * n).from_address(ptr) # view the block, no copy

def string_heapfree(s):
    ''' free an allocated string; s must not be used afterwards '''
    with _lock:
        _libc.free(addressof(s))          # the address malloc returned

if __name__ == '__main__':
    s = string_heapalloc(1000)
    s[:26] = b'abcdefghijklmnopqrstuvwxyz'
    print(s[:])
    string_heapfree(s)



Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Christian Heimes
Hank @ITGroup wrote:
> In order to deal with 400 thousand texts consisting of 80 million
> words, and huge sets of corpora, I have to be careful about memory.
> I need to track every word's behavior, so there need to be as
> many word-objects as words.
> I am really suffering from the memory problem; even 4 GB of memory is
> not enough... Only 10,000 texts can kill it in 2 minutes.
> By the way, my program has been optimized to ``del`` the objects after
> traversing, in order not to keep the information in memory all the time.

No ordinary system and programming language can hold that much data in
memory at once. Your design is broken; some may even call it insane.

I highly recommend ZODB for your problem. ZODB will allow you to work
with several GB of data in a transaction-oriented way without the need
for an external database server like Postgres or MySQL. ZODB even
supports clustering and mounting of additional databases from the same
file system or an external server.

Christian


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Hank @ITGroup
Steve Holden wrote:
>
> You are suffering from a pathological condition yourself: the desire 
> to optimize performance in an area where you do not have any problems. 
> I would suggest you just enjoy using Python and then start to ask 
> these questions again when you have a real issue that's stopping you 
> from getting real work done.
>
> regards
>  Steve
>
Hi, Steve,
This is not simply a pathological condition. My people keep trying 
many ways to get the job done, and the memory problem has become our 
focus at this moment.
Could you please give us some clear clues on how to explicitly tell Python 
to free memory? We want to control its gc operation directly, as we did 
when using J**A.


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Martin v. Löwis
> In order to deal with 400 thousand texts consisting of 80 million
> words, and huge sets of corpora, I have to be careful about memory.
> I need to track every word's behavior, so there need to be as
> many word-objects as words.
> I am really suffering from the memory problem; even 4 GB of memory is
> not enough... Only 10,000 texts can kill it in 2 minutes.
> By the way, my program has been optimized to ``del`` the objects after
> traversing, in order not to keep the information in memory all the time.

It may then well be that your application leaks memory, however, the
examples that you have given so far don't demonstrate that. Most likely,
you still keep references to objects at some point, causing the leak.

It's fairly difficult to determine the source of such a problem.
As a starting point, I recommend running

    import gc
    print(len(gc.get_objects()))

several times in the program, to see how the number of (gc-managed)
objects increases. This number should grow continually; if it doesn't,
you don't have a memory leak (or you have one in a C module, which
would be even harder to track down).

Then define

    import gc
    from collections import defaultdict

    def classify():
        # Count live gc-tracked objects per type.
        counters = defaultdict(int)
        for o in gc.get_objects():
            counters[type(o)] += 1
        # Sort by frequency only; type objects are not orderable.
        counters = sorted(counters.items(), key=lambda item: item[1])
        # Show the ten most frequent types, most frequent last.
        for t, freq in counters[-10:]:
            print("%s %d" % (t.__name__, freq))

and call classify() from time to time, to see what kind of objects
get allocated.

Then, for the most frequent kind of object, investigate whether
any of them "should" have been deleted. If any, try to find
out a) whether the code that should have released them was executed,
and b) why they are still referenced (use gc.get_referrers for that).
And so on.
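
A minimal sketch of the gc.get_referrers step, in Python 3 syntax.
The Word class and the cache dictionary are hypothetical stand-ins for
whatever container turns out to be holding on to the objects.

```python
import gc

class Word:
    def __init__(self, text):
        self.text = text

# A forgotten container that keeps the object alive.
cache = {"w": Word("hello")}
target = cache["w"]

# Ask the collector which tracked objects refer to the target.
referrers = gc.get_referrers(target)
holding = [r for r in referrers if r is cache]
print(len(referrers), len(holding))
```

The result also includes incidental referrers, such as the namespace
holding the name target, so expect to filter the list.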

Regards,
Martin


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Gabriel Genellina
On Sun, 20 Apr 2008 15:02:37 -0300, Torsten Bronger <[EMAIL PROTECTED]> 
wrote:
> Gabriel Genellina writes:
>> On Sun, 20 Apr 2008 14:43:17 -0300, Christian Heimes <[EMAIL PROTECTED]>
>> wrote:
>>> Gabriel Genellina wrote:
>>>
>>>> Apart from what everyone has already said, consider that
>>>> FreqDist may import other modules, store global state, create
>>>> other objects... whatever.  Pure python code should not have any
>>>> memory leaks (if there are, it's a bug in the Python
>>>> interpreter). Not-carefully-written C extensions may introduce
>>>> memory problems.
>>>
>>> Pure Python code can cause memory leaks. No, that's not a bug in
>>> the interpreter but the fault of the developer. For example code
>>> that messes around with stack frames and exception object can
>>> cause nasty reference leaks.
>>
>> Ouch!
>> May I assume that code that doesn't use stack frames nor stores
>> references to exception objects/tracebacks is safe?
>
> Circular referencing is not leaking at the C level, but in a way it is
> memory leaking, too.

The garbage collector will eventually dispose of the cycle, unless you use 
__del__. Of course it's better if the code itself breaks the cycle when it's 
done using the objects.
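
One way for the code itself to avoid the cycle is a weak back-pointer,
sketched here in Python 3 syntax. Parent and Child are made-up
classes; the immediate release after del relies on CPython's
reference counting.

```python
import weakref

class Parent:
    def __init__(self):
        self.children = []

class Child:
    def __init__(self, parent):
        self._parent = weakref.ref(parent)  # weak back-pointer: no cycle
        parent.children.append(self)

    @property
    def parent(self):
        """Return the parent, or None once it has been reclaimed."""
        return self._parent()

p = Parent()
c = Child(p)
probe = weakref.ref(p)
same = c.parent is p

del p                      # last strong reference to the parent
gone = probe() is None     # freed at once on CPython, no collector run needed
print(same, gone, c.parent)
```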
The above comment about stack frames and exceptions does not refer to this 
situation, I presume.

-- 
Gabriel Genellina



Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Martin v. Löwis
> Pure Python code can cause memory leaks. No, that's not a bug in the
> interpreter but the fault of the developer. For example code that messes
> around with stack frames and exception object can cause nasty reference
> leaks.

Can you give an example, please?

Regards,
Martin


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Hank @ITGroup
Christian Heimes wrote:
> Gabriel Genellina wrote:
>   
>> Apart from what everyone has already said, consider that FreqDist may import 
>> other modules, store global state, create other objects... whatever.
>> Pure python code should not have any memory leaks (if there are, it's a bug 
>> in the Python interpreter). Not-carefully-written C extensions may introduce 
>> memory problems.
>> 
>
> Pure Python code can cause memory leaks. No, that's not a bug in the
> interpreter but the fault of the developer. For example code that messes
> around with stack frames and exception object can cause nasty reference
> leaks.
>
> Christian
>
>   
In order to deal with 400 thousand texts consisting of 80 million 
words, and huge sets of corpora, I have to be careful about memory. 
I need to track every word's behavior, so there need to be as 
many word-objects as words.
I am really suffering from the memory problem; even 4 GB of memory is 
not enough... Only 10,000 texts can kill it in 2 minutes.
By the way, my program has been optimized to ``del`` the objects after 
traversing, in order not to keep the information in memory all the time.



Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Torsten Bronger
Hi there!

Gabriel Genellina writes:

> On Sun, 20 Apr 2008 14:43:17 -0300, Christian Heimes <[EMAIL PROTECTED]>
> wrote:
>
>> Gabriel Genellina wrote:
>>
>>> Apart from what everyone has already said, consider that
>>> FreqDist may import other modules, store global state, create
>>> other objects... whatever.  Pure python code should not have any
>>> memory leaks (if there are, it's a bug in the Python
>>> interpreter). Not-carefully-written C extensions may introduce
>>> memory problems.
>>
>> Pure Python code can cause memory leaks. No, that's not a bug in
>> the interpreter but the fault of the developer. For example code
>> that messes around with stack frames and exception object can
>> cause nasty reference leaks.
>
> Ouch!
> May I assume that code that doesn't use stack frames nor stores
> references to exception objects/tracebacks is safe?

Circular referencing is not leaking at the C level, but in a way it is
memory leaking, too.

Bye,
Torsten.

-- 
Torsten Bronger, aquisgrana, europa vetus
  Jabber ID: [EMAIL PROTECTED]
   (See http://ime.webhop.org for further contact info.)


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Gabriel Genellina
On Sun, 20 Apr 2008 14:43:17 -0300, Christian Heimes <[EMAIL PROTECTED]> 
wrote:

> Gabriel Genellina wrote:
>> Apart from what everyone has already said, consider that FreqDist may import 
>> other modules, store global state, create other objects... whatever.
>> Pure python code should not have any memory leaks (if there are, it's a bug 
>> in the Python interpreter). Not-carefully-written C extensions may introduce 
>> memory problems.
>
> Pure Python code can cause memory leaks. No, that's not a bug in the
> interpreter but the fault of the developer. For example code that messes
> around with stack frames and exception object can cause nasty reference
> leaks.

Ouch!
May I assume that code that doesn't use stack frames nor stores references to 
exception objects/tracebacks is safe?

-- 
Gabriel Genellina



Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Christian Heimes
Gabriel Genellina wrote:
> Apart from what everyone has already said, consider that FreqDist may import 
> other modules, store global state, create other objects... whatever.
> Pure python code should not have any memory leaks (if there are, it's a bug 
> in the Python interpreter). Not-carefully-written C extensions may introduce 
> memory problems.

Pure Python code can cause memory leaks. No, that's not a bug in the
interpreter but the fault of the developer. For example code that messes
around with stack frames and exception object can cause nasty reference
leaks.

Christian



Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Gabriel Genellina
On Sun, 20 Apr 2008 09:46:37 -0300, Hank @ITGroup <[EMAIL PROTECTED]> wrote:

> ``starting python``# == Windows Task Manager:
> Python.exe  *4,076 *K memory-usage ==
>  >>> st1='abcdefg'*99 # == 10,952 K ==
>  >>> del st1 # == *4,104*K ==
>  >>> st1='abcdefg'*99 # == 10,952 K ==
>  >>> del st1 # == 4,104 K ==
>
>  >>> li = ['abcde']*99  # == 8,024 K ==
>  >>> del li# == *4,108* K ==
>
>  >>> from nltk import FreqDist # == 17,596 ==
>  >>> fd = FreqDist()# == 17,596 ==
>  >>> for i in range(99):fd.inc(i)  # == 53,412 ==
>  >>> del fd   # == *28,780* ==
>  >>> fd2 = FreqDist()   # == 28,780 ==
>  >>> for i in range(99):fd2.inc(i)  # == 53,412 ==
>  >>> del fd2# == 28,780 K ==
>
>  >>> def foo():
> ...     fd3 = FreqDist()
> ...     for i in range(99):fd3.inc(i)
>
>  >>> foo() # == *28,788* K ==
>
>  >>> def bar():
> ...     fd4 = FreqDist()
> ...     for i in range(99):fd4.inc(i)
> ...     del fd4
>  # == 28,788 K ==
>  >>> bar() # == 28,788 K ==
>
>
> That is my question: after ``del``, sometimes the memory usage drops
> back as if nothing had happened, sometimes not... ...
> What exactly was happening???

Apart from what everyone has already said, consider that FreqDist may import 
other modules, store global state, create other objects... whatever.
Pure python code should not have any memory leaks (if there are, it's a bug in 
the Python interpreter). Not-carefully-written C extensions may introduce 
memory problems.

-- 
Gabriel Genellina



Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Steve Holden
Hank @ITGroup wrote:
> Apologies for the previous offensive title~~
> :)
> Thanks, Rintsch, Arnaud and Daniel, for replying so soon.
> 
> I redid the experiment. What follows is the record -
> 
> ``starting python``# == Windows Task Manager: 
> Python.exe  *4,076 *K memory-usage ==
>  >>> st1='abcdefg'*99 # == 10,952 K ==
>  >>> del st1 # == *4,104*K ==
>  >>> st1='abcdefg'*99 # == 10,952 K ==
>  >>> del st1 # == 4,104 K ==
> 
>  >>> li = ['abcde']*99  # == 8,024 K ==
>  >>> del li# == *4,108* K ==
> 
>  >>> from nltk import FreqDist # == 17,596 ==
>  >>> fd = FreqDist()# == 17,596 ==
>  >>> for i in range(99):fd.inc(i)  # == 53,412 ==
>  >>> del fd   # == *28,780* ==
>  >>> fd2 = FreqDist()   # == 28,780 ==
>  >>> for i in range(99):fd2.inc(i)  # == 53,412 ==
>  >>> del fd2# == 28,780 K ==
> 
>  >>> def foo():
> ...     fd3 = FreqDist()
> ...     for i in range(99):fd3.inc(i)
> 
>  >>> foo() # == *28,788* K ==
> 
>  >>> def bar():
> ...     fd4 = FreqDist()
> ...     for i in range(99):fd4.inc(i)
> ...     del fd4
>  # == 28,788 K ==
>  >>> bar() # == 28,788 K ==
> 
> 
> That is my question: after ``del``, sometimes the memory usage drops 
> back as if nothing had happened, sometimes not... ...
> What exactly was happening???
> 
> Best regards to all PYTHON people ~~
> !!! Python Team are great !!!
> 
It doesn't really make that much sense to watch memory usage as you have 
been doing. Your first test case appears to trigger a specific 
pathology, where the memory allocator actually returns the memory to the 
operating system when the garbage collector manages to free all of it.

Most often this doesn't happen - a chunk of memory might be 99.99% free 
but still have one small piece used, and so while there is a large 
amount of "free" memory for Python to allocate without requesting more 
process memory, this won't be reflected in any external measurement.

You are suffering from a pathological condition yourself: the desire to 
optimize performance in an area where you do not have any problems. I 
would suggest you just enjoy using Python (its memory management doesn't 
suck at all, so your title line was inflammatory and simply highlights 
your lack of knowledge) and then start to ask these questions again when 
you have a real issue that's stopping you from getting real work done.

regards
  Steve

-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread sturlamolden
On Apr 20, 2:46 pm, "Hank @ITGroup" <[EMAIL PROTECTED]> wrote:

> That is my question: after ``del``, sometimes the memory usage drops
> back as if nothing had happened, sometimes not... ...
> What exactly was happening???

Python has a garbage collector. Objects that cannot be reached from
any scope are reclaimed, sooner or later. This includes objects with
a reference count of zero, and objects that participate in unreachable
reference cycles. Since Python uses a reference counting scheme, it
does not tend to accumulate as much garbage as Java or .NET. When the
reference count for an object drops to zero, it is immediately freed.

You cannot control when Python's garbage collector frees an object
from memory.

What del does is delete the object reference from the current
scope. It does not delete the object from memory. That is, the del
statement removes the reference from the current scope and thereby
decrements the reference count by one. Whether the object is removed
completely depends on whether someone else is using it. The object is
not reclaimed unless the reference count has dropped all the way down
to zero. If there are still references to the object elsewhere in your
program, it is not reclaimed when you use del.
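
A tiny illustration of that point, in Python 3 syntax. Word is a
made-up class and the weak reference is only used to observe whether
the object still exists; the immediate reclamation after the last del
is CPython behavior.

```python
import weakref

class Word:
    pass

w = Word()
alias = w                   # a second reference to the same object
probe = weakref.ref(w)      # observes the object without keeping it alive

del w                       # removes the name w, not the object
still_alive = probe() is not None

del alias                   # last reference gone
reclaimed = probe() is None
print(still_alive, reclaimed)
```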







Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Marc 'BlackJack' Rintsch
On Sun, 20 Apr 2008 22:46:37 +1000, Hank @ITGroup wrote:

> Apologies for the previous offensive title~~
> :)
> Thanks, Rintsch, Arnaud and Daniel, for replying so soon.
> 
> I redid the experiment. What follows is the record -
> 
> ``starting python``# == Windows Task Manager: 
> Python.exe  *4,076 *K memory-usage ==
>  >>> st1='abcdefg'*99 # == 10,952 K ==
>  >>> del st1 # == *4,104*K ==
>  >>> st1='abcdefg'*99 # == 10,952 K ==
>  >>> del st1 # == 4,104 K ==
> 
>  >>> li = ['abcde']*99  # == 8,024 K ==
>  >>> del li# == *4,108* K ==
> 
>  >>> from nltk import FreqDist # == 17,596 ==
>  >>> fd = FreqDist()# == 17,596 ==
>  >>> for i in range(99):fd.inc(i)  # == 53,412 ==
>  >>> del fd   # == *28,780* ==
>  >>> fd2 = FreqDist()   # == 28,780 ==
>  >>> for i in range(99):fd2.inc(i)  # == 53,412 ==
>  >>> del fd2# == 28,780 K ==
> 
>  >>> def foo():
> ...     fd3 = FreqDist()
> ...     for i in range(99):fd3.inc(i)
> 
>  >>> foo() # == *28,788* K ==
> 
>  >>> def bar():
> ...     fd4 = FreqDist()
> ...     for i in range(99):fd4.inc(i)
> ...     del fd4
>  # == 28,788 K ==
>  >>> bar() # == 28,788 K ==
> 
> 
> That is my question: after ``del``, sometimes the memory usage drops 
> back as if nothing had happened, sometimes not... ...
> What exactly was happening???

Something.  Really it's a bit complex and implementation dependent.  Stop
worrying about it until it really becomes a problem.

First of all, there's no guarantee that memory will be reported as free by
the OS, because it is up to the C runtime library whether it "gives back"
freed memory to the OS or not.  Second, the memory management of Python
involves "arenas" of objects that only get freed when all objects in them
are freed.  Third, some types and ranges of objects get special treatment,
such as integers, which are allocated, some even preallocated, and never
freed again.  All this is done to speed things up, because allocating and
deallocating loads of small objects is an expensive operation.

Bottom line: let the Python runtime manage the memory and forget about the
``del`` keyword.  It is very seldom used in Python, and when it is used, it
is to delete a reference from a container and not a "bare" name.  In your
`bar()` function, for example, it is completely unnecessary because the
name `fd4` disappears right after that line anyway.

Ciao,
Marc 'BlackJack' Rintsch


Re: "Help needed - I don't understand how Python manages memory"

2008-04-20 Thread Hank @ITGroup
Apologies for the previous offensive title~~
:)
Thanks, Rintsch, Arnaud and Daniel, for replying so soon.

I redid the experiment. What follows is the record -

``starting python``# == Windows Task Manager: 
Python.exe  *4,076 *K memory-usage ==
 >>> st1='abcdefg'*99 # == 10,952 K ==
 >>> del st1 # == *4,104*K ==
 >>> st1='abcdefg'*99 # == 10,952 K ==
 >>> del st1 # == 4,104 K ==

 >>> li = ['abcde']*99  # == 8,024 K ==
 >>> del li# == *4,108* K ==

 >>> from nltk import FreqDist # == 17,596 ==
 >>> fd = FreqDist()# == 17,596 ==
 >>> for i in range(99):fd.inc(i)  # == 53,412 ==
 >>> del fd   # == *28,780* ==
 >>> fd2 = FreqDist()   # == 28,780 ==
 >>> for i in range(99):fd2.inc(i)  # == 53,412 ==
 >>> del fd2# == 28,780 K ==

 >>> def foo():
...     fd3 = FreqDist()
...     for i in range(99):fd3.inc(i)

 >>> foo() # == *28,788* K ==

 >>> def bar():
...     fd4 = FreqDist()
...     for i in range(99):fd4.inc(i)
...     del fd4
 # == 28,788 K ==
 >>> bar() # == 28,788 K ==


That is my question: after ``del``, sometimes the memory usage drops 
back as if nothing had happened, sometimes not... ...
What exactly was happening???

Best regards to all PYTHON people ~~
!!! Python Team are great !!!
