Re: Garbage collection problem with generators

2016-12-28 Thread Haochuan Guo
Sorry about breaking the rule.

I'm just curious about this problem, and I'm using this workaround to prevent
redundant resource creation.

https://gist.githubusercontent.com/wooparadog/16948ca6c8ffb22214bf491a280406da/raw/-


On Wed, Dec 28, 2016 at 9:12 PM Chris Angelico  wrote:

> On Wed, Dec 28, 2016 at 9:03 PM, Haochuan Guo 
> wrote:
> > Anyone? The script to reproduce this problem is in:
> >
> > https://gist.github.com/wooparadog/766f8007d4ef1227f283f1b040f102ef
> >
> > On Fri, Dec 23, 2016 at 8:39 PM Haochuan Guo 
> wrote:
> >
> >> This is reproducible with python2.7, but not in python3.5. I've also
> tried
> >> with `thread` instead of `gevent`, it still happens. I'm guessing it's
> >> related to garbage collection of generators.
> >>
> >> Did I bump into a python2 bug? Or am I simply wrong about the way to
> close
> >> generators...?
>
> (Please don't top-post.)
>
> Maybe the fix is to just use Python 3.5+? :) It probably is to do with
> the garbage collection of generators; so you may want to consider
> using something very explicit (eg a context manager) to ensure that
> you call gen.close().
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection problem with generators

2016-12-28 Thread Chris Angelico
On Wed, Dec 28, 2016 at 9:03 PM, Haochuan Guo  wrote:
> Anyone? The script to reproduce this problem is in:
>
> https://gist.github.com/wooparadog/766f8007d4ef1227f283f1b040f102ef
>
> On Fri, Dec 23, 2016 at 8:39 PM Haochuan Guo  wrote:
>
>> This is reproducible with python2.7, but not in python3.5. I've also tried
>> with `thread` instead of `gevent`, it still happens. I'm guessing it's
>> related to garbage collection of generators.
>>
>> Did I bump into a python2 bug? Or am I simply wrong about the way to close
>> generators...?

(Please don't top-post.)

Maybe the fix is to just use Python 3.5+? :) It probably is to do with
the garbage collection of generators; so you may want to consider
using something very explicit (eg a context manager) to ensure that
you call gen.close().
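Something along those lines with the standard library's contextlib.closing
(a sketch; make_gen here is a stand-in for whatever produces the generator):

```python
from contextlib import closing

def make_gen():  # hypothetical generator factory
    yield 1
    yield 2

# close() is guaranteed to run when the block exits, even if
# iteration raises or is abandoned early
with closing(make_gen()) as gen:
    for item in gen:
        print(item)
```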

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection problem with generators

2016-12-28 Thread Haochuan Guo
Anyone? The script to reproduce this problem is in:

https://gist.github.com/wooparadog/766f8007d4ef1227f283f1b040f102ef

On Fri, Dec 23, 2016 at 8:39 PM Haochuan Guo  wrote:

> Hi, everyone
>
> I'm building a http long polling client for our company's discovery
> service and something weird happened in the following code:
>
> ```python
> import time
> import requests
>
> while True:
>     try:
>         r = requests.get("url", stream=True, timeout=3)
>         for data in r.iter_lines():
>             process_data(data)  # application-specific handling
>     except requests.exceptions.Timeout:
>         time.sleep(10)
> ```
>
> When I deliberately time out the request and then check the connections
> with `lsof -p <pid>`, I discover that there are *two active connections*
> (ESTABLISHED) instead of one. After digging around, it turns out it might
> not be a problem with `requests` at all, but gc related to generators.
>
> So I write this script to demonstrate the problem:
>
> https://gist.github.com/wooparadog/766f8007d4ef1227f283f1b040f102ef
>
> Function `A.a` returns a generator which raises an exception. In
> function `b`, I build a new instance of `A` and iterate over the
> exception-raising generator. In the exception handler, I close the
> generator, delete it, delete the `A` instance, call `gc.collect()`, and
> do the whole process all over again.
>
> There's another greenlet checking the `A` instances by using
> `gc.get_objects()`. It turns out there are always two `A` instances.
>
> This is reproducible with python2.7, but not in python3.5. I've also tried
> with `thread` instead of `gevent`, it still happens. I'm guessing it's
> related to garbage collection of generators.
>
> Did I bump into a python2 bug? Or am I simply wrong about the way to close
> generators...?
>
> Thanks
>
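In outline, the reproduction pattern looks roughly like this (a sketch
using the names `A`, `a` and `b` from the description above; the real
gist also runs a watcher greenlet over `gc.get_objects()`):

```python
import gc

class A(object):
    def a(self):
        yield 1
        raise ValueError("simulated failure")  # exception-raising generator

def b():
    obj = A()
    gen = obj.a()
    try:
        for item in gen:
            pass
    except ValueError:
        gen.close()    # explicitly close the generator
        del gen, obj   # drop our references
        gc.collect()   # force a collection pass
```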
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / reference cycles (cont.)

2009-03-29 Thread Aaron Brady
On Mar 25, 12:12 am, Aaron Brady  wrote:
> On Mar 25, 12:11 am, Aaron Brady  wrote:
> > Hello,
>
> > I am posting the code I mentioned on Saturday that collects garbage
> > and cyclic garbage in a flattened two-step process.  The code takes
> > 122 lines incl. comments, with 100 in tests.  It should be in a reply
> > to this.
snip

Here is the output.  Someone suggested I add it.  It may or may not be
utterly unintelligible.  It's quite long, 367 lines.

>>> run( 'psu' ) # external references to 'p', 's', and 'u'

[three decref traces, for q, r and t: each prints a 'parents' map of
(parent, attribute) edges and a 'refct_copy' map; the object reprs in
these dumps were lost in archiving]
p: (q), q: (pqrt), r: (q), s: (p), t: (), u: (t)
>>>
>>> p.decref() # decref 'p'.  should not free any.
[decref trace: nonzero entries remain in refct_copy]
>>>
>>> assert_exist( p, q, r, s, t, u )
[all six objects exist]
>>>
>>> run( 'psu' ) # start over

[decref traces as above]
p: (q), q: (pqrt), r: (q), s: (p), t: (), u: (t)
>>>
>>> p.decref()
[decref trace]
>>>
>>> s.decref() # decref 'p' and 's'.  should decref 'q', 'r',
>>>            # and 't'.  should finalize 's', 'p', 'r', 'q'.
[decref traces; 'cycle of ... found' reported twice; 'finalizing'
lines for s, p, r, q]
>>>
>>> assert_exist( t, u )
[t and u exist]
>>> assert_destroyed( p, q, r, s )
[p, q, r, s destroyed]
>>>
>>> run( 'psu' )

[decref traces as above]
p: (q), q: (pqrt), r: (q), s: (p), t: (), u: (t)
>>>
>>> s.decref()
[decref traces; 'finalizing' line for s]
>>>
>>> p.decref() # same result, different order
[decref traces; 'cycle of ... found' reported twice; 'finalizing'
lines for p, r, q]
>>>
>>> assert_exist( t, u )
[t and u exist]
>>> assert_destroyed( p, q, r, s )
[p, q, r, s destroyed]
>>>
>>> run( 'psu' )

[decref traces as above]
p: (q), q: (pqrt), r: (q), s: (p), t: (), u: (t)
>>>
>>> s.decref() # should finalize 's'.
[decref traces; 'finalizing' line for s]
>>>
>>> assert_exist( p, q, r, t, u )
[p, q, r, t and u exist]
>>> assert_destroyed( s )
[s destroyed]
>>>
>>> run( 'qsu' ) # more.

[decref traces for p, r and t]
p: (q), q: (pqrt), r: (q), s: (p), t: (), u: (t)
>>>
>>> q.decref()
[decref trace: nothing collected yet]
>>>
>>> assert_exist( p, q, r, s, t, u )
[all six objects exist]
>>>
>>> run( 'qsu' )

[decref traces; the archived copy of the output breaks off here]

Re: garbage collection / reference cycles (cont.)

2009-03-29 Thread Aaron Brady
On Mar 25, 12:11 am, Aaron Brady  wrote:
> Hello,
>
> I am posting the code I mentioned on Saturday that collects garbage
> and cyclic garbage in a flattened two-step process.  The code takes
> 122 lines incl. comments, with 100 in tests.  It should be in a reply
> to this.
>
> My aim is a buffer-like object which can contain reference-counted
> objects.  This is a preliminary Python version of the cycle detector.

snip formality

Someone suggested that it wasn't clear to them what my goal was in
this post.  I created a garbage collector that supports an extra
method, 'final_attr', which user-defined objects don't have under
Python's.  It requests the object to drop its reference to the
specified attr.  After it returns, the object is moved to the back of
the collection queue.  This means the object knows which references of
its own it is losing; they are still valid at the time 'final_attr' is
called; and other objects' references to /it/ are still valid too.

I want a technical discussion of its strengths and weaknesses.

Aahz suggested to try python-ideas:
http://mail.python.org/pipermail/python-ideas/2009-March/003774.html
--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / reference cycles (cont.)

2009-03-24 Thread Aaron Brady
On Mar 25, 12:11 am, Aaron Brady  wrote:
> Hello,
>
> I am posting the code I mentioned on Saturday that collects garbage
> and cyclic garbage in a flattened two-step process.  The code takes
> 122 lines incl. comments, with 100 in tests.  It should be in a reply
> to this.
>
> My aim is a buffer-like object which can contain reference-counted
> objects.  This is a preliminary Python version of the cycle detector.
> I expect to port it to C++, but the buffer object as well as object
> proxies are Python objects.  The memory management strategy,
> synchronization, etc., are other modules.  It is similar in principle
> to Python's own 'gc'.  If it's sound, it may have some educational and
> explanatory value also.
>
> Anyway, since I received a little interest in it, I wanted to follow
> up.  It is free to play with.  If there's a better group to ask about
> this, or there are more scholarly, widely-used, or thorough treatments
> or implementations, I'm interested.

from collections import deque

class Globals:
    to_collect= deque( )    # FIFO of garbage that has been decref'ed;
                            # queue them instead of nested 'gc' calls
    to_collect_set= set( )  # hash lookup of the same information
    ser_gc_running= False   # bool flag if the GC is running

    def schedule_collect( ob ):
        ''' Add to FIFO- no gc call '''
        if ob in Globals.to_collect_set:
            return
        Globals.to_collect.append( ob )
        Globals.to_collect_set.add( ob )

    def serial_gc( ):
        ''' Visit objects which have been decref'ed.  If they
        have left reachability, enqueue the entire cycle they
        are in; this as opposed to nested 'final' calls. '''
        if Globals.ser_gc_running:
            return
        Globals.ser_gc_running= True

        while Globals.to_collect:
            ob= Globals.to_collect.popleft( )
            Globals.to_collect_set.remove( ob )
            if ob.ref_ct== 0:
                ob.final( )
            else:
                incycle= Globals.cycle_detect( ob )
                if incycle:
                    # Request object to drop its references;
                    # re-queue the object.  (Potential
                    # infinite loop, if objects do not comply.)
                    for k, v in list( ob.__dict__.items( ) ):
                        if not isinstance( v, ManagedOb ):
                            continue
                        ob.final_attr( k )
                    Globals.schedule_collect( ob )

        Globals.ser_gc_running= False

    def cycle_detect( ob ):
        ''' Detect an unreachable reference cycle in the
        descendants of 'ob'.  Return True if so, False if
        still reachable.  Only called when walking the
        'to_collect' queue. '''
        parents= { } # disjunction( ancestors, descendants )
        bfs= deque( [ ob ] )
        refct_copy= { ob: ob.ref_ct }
        # copy the ref_ct's to a map;
        # decrement the copies on visit (breadth-first)
        while bfs:
            x= bfs.popleft( )
            for k, v in x.__dict__.items( ):
                if not isinstance( v, ManagedOb ):
                    continue
                if v not in refct_copy:
                    refct_copy[ v ]= v.ref_ct
                    bfs.append( v )
                if v not in parents:
                    parents[ v ]= set( )
                refct_copy[ v ]-= 1
                parents[ v ].add( ( x, k ) )
        print( 'parents', parents )
        print( 'refct_copy', refct_copy )

        # any extra-cyclic references?
        if refct_copy[ ob ]:
            return False

        # (ancestors && descendants) all zero?
        # --(breadth-first)
        bfs= deque( [ ob ] )
        visited= set( [ ob ] )
        while bfs:
            x= bfs.popleft( )
            for n, _ in parents[ x ]:
                if n in visited:
                    continue
                if refct_copy[ n ]:
                    return False
                visited.add( n )
                bfs.append( n )
        print( 'cycle of', ob, 'found' )
        return True

class ManagedOb:
    def __init__( self, name ):
        self.ref_ct= 1
        self.name= name
    def assign( self, attr, other ):
        ''' setattr function (basically) '''
        if hasattr( self, attr ):
            getattr( self, attr ).decref( )
        other.incref( )
        setattr( self, attr, other )
    def incref( self ):
        self.ref_ct+= 1
    def decref( self ):
        print( self, 'decref' )
        self.ref_ct-= 1
        # check for cycles and poss. delete
        Globals.schedule_collect( self )
        Globals.serial_gc( ) # trip the collector
    def final_attr( self, attr ):
        ''' magic function.  your object has left
        reachability and is requested to drop its
        reference to 'attr'. '''
        ob= getattr( self, attr )
        delattr( self, attr )
        ob.decref( )
    def final( self ):
        for _, v in self.__dict__.items( ):
            if not isinstance( v, ManagedOb ):
                continue
            v.decref( )
        print( 'finalizing', self )
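The archived copy of the post breaks off above, and the test driver is
not shown.  A `run` consistent with the graph line printed in the output,
`p: (q), q: (pqrt), r: (q), s: (p), t: (), u: (t)`, and with the 'at*'
attribute names appearing in the traces might look like this (a
reconstruction, not the original):

```python
def run( names ):
    ''' Rebuild the sample graph, then drop the simulated
    program-visible reference of every object whose name
    is not in 'names'. '''
    global p, q, r, s, t, u
    p, q, r, s, t, u = [ ManagedOb( c ) for c in 'pqrstu' ]
    p.assign( 'at', q )                          # p: (q)
    q.assign( 'at', p ); q.assign( 'at1', q )    # q: (pqrt)
    q.assign( 'at2', r ); q.assign( 'at3', t )
    r.assign( 'at', q )                          # r: (q)
    s.assign( 'at', p )                          # s: (p)
    u.assign( 'at', t )                          # u: (t)
    for ob in ( p, q, r, s, t, u ):
        if ob.name not in names:
            ob.decref( )
    print( 'p: (q), q: (pqrt), r: (q), s: (p), t: (), u: (t)' )
```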

Re: garbage collection / cyclic references

2009-03-22 Thread Aaron Brady
On Mar 21, 11:59 am, "andrew cooke"  wrote:
> Aaron Brady wrote:
> > My point is, that garbage collection is able to detect when there are
> > no program-reachable references to an object.  Why not notify the
> > programmer (the programmer's objects) when that happens?  If the
> > object does still have other unreachable references, s/he should be
> > informed of that too.
>
> i think we're mixing python-specific and more general / java details, but,
> as far as i understand things, state of the art (and particularly
> generational) garbage collectors don't guarantee that objects will ever be
> reclaimed.  this is a trade for efficiency, and it's a trade that seems to
> be worthwhile and popular.

It's at best worthless, but so is politics.  I take it back; you can
reclaim memory in large numbers with a probabilistic finalizer.  The
expected value of reclaiming a KB with a 90% chance of call is .9 KB.

The allocation structure I am writing will have a very long up-time.
I can forcibly reclaim the memory of an object involved in a cycle,
but lingering references it has will never be detected.  Though, if I
can't guarantee 100% reclamation, I'll have to be anticipating a
buffer dump eventually anyway, which makes, does it not, 90% about the
same as 99%.

> furthermore, you're mixing responsibilities for two logically separate
> ideas just because a particular implementation happens to associate them,
> which is not a good idea from a design pov.

I think a silent omission of finalization is the only alternative.  If
so they're mixed, one way or the other.  I argue it is closer to
associating your class with a hash table: they are logically separate
ideas.  Perhaps implementation is logically separate from design
altogether.

> i can remember, way back in the mists of time

I understand they were having a fog problem there yesterday... not to
mention a sale on sand.  "Today: Scattered showers and thunderstorms
before 1pm, then a slight chance of showers."

> using java finalizers for
> doing this kind of thing.  and then learning that it was a bad idea.  once
> i got over the initial frustration, it really hasn't been a problem.  i
> haven't met a situation

I don't suppose I imagine one.  So, you could argue that it's a low
priority.  Washing your hands of the rare, though, disqualifies you
from the associate's in philosophy.  I bet you want to meet my
customers, too.

> where i needed to tie resource management and
> memory management together (except for interfacing with c code that does
> not use the host language's gc - and i can imagine that for python this is
> a very strong (perhaps *the*) argument for reference counting).

I'm using a specialized mapping type to implement the back end of user-
defined classes.  Since I know the implementation, which in particular
maps strings to objects, I can always just break cycles by hand; that
is, until someone wants a C extension.  Then they will want tp_clear
and tp_traverse methods.

> as a bonus example, consider object caching - a very common technique
> that immediately breaks anything that associates other resources with
> memory use.

I assume your other processes are notified of the cache state.  From
what I understand, Windows supports /named/ caching.  Collaborators
can check the named cache, in the process creating it if it doesn't
exist, and read and write at will there.

> just because, in one limited case, you can do something, doesn't mean it's
> a good idea.
>
> andrew

But you're right.  I haven't talked this over much on the outside, so
I might be missing something huge, and serialized two-step
finalization (tm) is the secret least of my worries.
--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread Aaron Brady
On Mar 21, 1:04 pm, John Nagle  wrote:
> Aaron Brady wrote:
> > Hello,
>
> > I was reading and Googling about garbage collection, reference
> > counting, and the problem of cyclic references.
>
> > Python's garbage collection module claims to be able to detect and
> > break cyclic garbage.  Some other languages merely prohibit it.  Is
> > this the place to ask about its technique?
>
> > I understand that the disadvantage is a non-deterministic order of
> > deletion/finalization.  
>
>      Garbage collection and destructors or "finalizers" don't play well
> together.  It's a fundamental problem.  Calling finalizers from the
> garbage collector is painful.  It introduces concurrency where the
> user may not have expected it.  Consider what happens if a finalizer
> tries to lock something.  What if GC runs while that lock is locked?
> This can create a deadlock situation.  Calling finalizers from the
> garbage collector can result in intermittent, very hard to find bugs.

As I understand it, 'obj.decref()' can call 'other.decref()', which
can try to access its reference to 'obj', which has already begun
cleanup.  At that point, 'obj' is in an inconsistent state.  Its own
methods are secretly called during its '__del__'.

One step would be to serialize this process, so that 'other.decref()'
gets deferred until 'obj.decref()' has completed.  Then attributes are
in an all-or-nothing state, which is at least consistent.  However,
that means every external reference in a '__del__' method has to be
wrapped in an exception handler, one at a time, because the object /
might/ already be in a reference cycle.  (Non-container types are
excepted.)  The remaining reference merely needs to change its class
to a ReclaimedObject type.  That's acceptable if documented.  I also
believe it solves the potential for deadlock.

> (Look up "re-animation"
> in Microsoft Managed C++ literature.  It's not pretty.)

Pass!

>      Python actually has reference counting backed up by garbage collection.
> Most objects are destroyed as soon as all references to them disappear.
> Garbage collection is only needed to deal with cycles.
>
>      Python has "weak references", which won't keep an object around
> once all the regular references are deleted.  These are useful in
> some situations.  In a tree, for example, pointers towards the leaves
> should be strong pointers, while back-pointers towards the root should
> be weak pointers.
snip
>
>     Personally, I'd argue that the right answer is to prohibit cycles of
> strong pointers.  That should be considered a programming error, and
> detected at run time, at least by debugging tools.  With weak pointers,
> you don't really need cycles of strong pointers.

Reference cycles can be detected anyway with debug tools, even prior
to destruction.  My objection is that it would complicate control flow
severely:

for x in y:
    z.append( x )

becomes:

for x in y:
    if cyclic_ref( x ):
        z.append( weakref.ref( x ) )
    else:
        z.append( x )

And worse, every attribute access has to be wrapped.

for x in z:
    if isinstance( x, weakref.ref ):
        if x() is not None:
            print( x() )
    else:
        print( x )

In other words, it interferes with uniform access to attributes and
container members.  However, in the case where you know a structure a
priori, it's a good technique, as your example showed.  I observe that
my proposal has the same weakness!

If you make the case that you usually do know the structure your data
have, I won't be able to disprove it.  The example would come from a
peer-to-peer representation of something, or storage of relational
data.

Regardless, the group has responded to most of my original post.  I
don't think I emphasized however that I'm designing an allocation
system that can contain reference cycles; and I was asking if such
special methods, '__gc_cycle__( self, attr )' or '__gc_clear__
( self )' would be right for me.  I'm also interested in feedback
about the serialization method of ref. counting earlier in this post.

>     The advantage of this is a clean order of destruction.  This is useful
> in window widget systems, where you have objects with pointers going in many
> directions, yet object destruction has substantial side effects.
>
>     Python originally had only reference counting, and didn't have weak 
> pointers.
> If weak pointers had gone in before the garbage collector, Python might have
> gone in this direction.
>
>                                 John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread John Nagle

Aaron Brady wrote:

Hello,

I was reading and Googling about garbage collection, reference
counting, and the problem of cyclic references.

Python's garbage collection module claims to be able to detect and
break cyclic garbage.  Some other languages merely prohibit it.  Is
this the place to ask about its technique?

I understand that the disadvantage is a non-deterministic order of
deletion/finalization.  


Garbage collection and destructors or "finalizers" don't play well
together.  It's a fundamental problem.  Calling finalizers from the
garbage collector is painful.  It introduces concurrency where the
user may not have expected it.  Consider what happens if a finalizer
tries to lock something.  What if GC runs while that lock is locked?
This can create a deadlock situation.  Calling finalizers from the
garbage collector can result in intermittent, very hard to find bugs.

C++ takes destructors seriously; objects are supposed to be destructed
exactly once, and if they're of "auto" scope (a local object in the
Python sense) they will reliably be cleaned up at block exit.
Microsoft's "Managed C++" broke those rules; in Managed C++,
destructors can be called more than once.  (Look up "re-animation"
in Microsoft Managed C++ literature.  It's not pretty.)

Python actually has reference counting backed up by garbage collection.
Most objects are destroyed as soon as all references to them disappear.
Garbage collection is only needed to deal with cycles.

Python has "weak references", which won't keep an object around
once all the regular references are deleted.  These are useful in
some situations.  In a tree, for example, pointers towards the leaves
should be strong pointers, while back-pointers towards the root should
be weak pointers.
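A minimal sketch of that shape (a hypothetical Node class, not code from
any particular parser):

```python
import weakref

class Node(object):
    def __init__(self, value, parent=None):
        self.value = value
        self.children = []   # strong pointers toward the leaves
        # back-pointer toward the root is weak, so no cycle is formed
        self._parent = None if parent is None else weakref.ref(parent)

    @property
    def parent(self):
        return None if self._parent is None else self._parent()

root = Node('root')
leaf = Node('leaf', parent=root)
root.children.append(leaf)
```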

   I once modified BeautifulSoup, the HTML parser, to use weak pointers
that way.  BeautifulSoup trees are big and don't go away immediately when
no longer needed, because they have backpointers.  They hang around until
the next GC cycle.  With the version that uses weak pointers, they go away
as soon as they're no longer needed.  We've found this useful in a web
crawler; the data space used drops and actual GC runs are no longer
necessary.

   Personally, I'd argue that the right answer is to prohibit cycles of
strong pointers.  That should be considered a programming error, and
detected at run time, at least by debugging tools.  With weak pointers,
you don't really need cycles of strong pointers.

   The advantage of this is a clean order of destruction.  This is useful
in window widget systems, where you have objects with pointers going in many
directions, yet object destruction has substantial side effects.

   Python originally had only reference counting, and didn't have weak pointers.
If weak pointers had gone in before the garbage collector, Python might have
gone in this direction.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread andrew cooke
Aaron Brady wrote:
> My point is, that garbage collection is able to detect when there are
> no program-reachable references to an object.  Why not notify the
> programmer (the programmer's objects) when that happens?  If the
> object does still have other unreachable references, s/he should be
> informed of that too.

i think we're mixing python-specific and more general / java details, but,
as far as i understand things, state of the art (and particularly
generational) garbage collectors don't guarantee that objects will ever be
reclaimed.  this is a trade for efficiency, and it's a trade that seems to
be worthwhile and popular.

furthermore, you're mixing responsibilities for two logically separate
ideas just because a particular implementation happens to associate them,
which is not a good idea from a design pov.

i can remember, way back in the mists of time, using java finalizers for
doing this kind of thing.  and then learning that it was a bad idea.  once
i got over the initial frustration, it really hasn't been a problem.  i
haven't met a situation where i needed to tie resource management and
memory management together (except for interfacing with c code that does
not use the host language's gc - and i can imagine that for python this is
a very strong (perhaps *the*) argument for reference counting).

as a bonus example, consider object caching - a very common technique
that immediately breaks anything that associates other resources with
memory use.

just because, in one limited case, you can do something, doesn't mean it's
a good idea.

andrew


--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread Aaron Brady
On Mar 21, 10:28 am, Aaron Brady  wrote:
> On Mar 21, 9:50 am, "andrew cooke"  wrote:
>
>
>
> > Aaron Brady wrote:
> > > On Mar 21, 7:54 am, "andrew cooke"  wrote:
> > >> they should not be used to do things like flushing and closing
> > >> files, for example.
> > > What is your basis for this claim, if it's not the mere unreliability
> > > of finalization?  IOW, are you not merely begging the question?
>
> > I'm not sure it's clear, but I was talking about Java.
>
> > As Paul implied, a consequence of completely automated garbage management
> > is that it is (from a programmer's POV) deterministic.  So it's a
> [indeterministic]
> > programming error to rely on the finalizer to free resources that don't
> > follow that model (ie any resource that's anything other that
> [than]
> > reasonable
> > amounts of memory).
>
> > That's pretty much an unavoidable consequence of fully automated garbage
> > collection.  You can pretend it's not, and try using finalizers for other
> > work if you want.  That's fine - it's your code, not mine.  I'm just
> > explaining how life is.
>
> > Andrew
>
> My point is, that garbage collection is able to detect when there are
> no program-reachable references to an object.  Why not notify the
> programmer (the programmer's objects) when that happens?  If the
> object does still have other unreachable references, s/he should be
> informed of that too.
snip

I took the liberty of composing a sample cyclic reference detector.  I
will post the class definition later on in the discussion (when and
if).

The 'run' method resets the globals to a sample graph, as
illustrated.  'p' and 's' start out with one simulated program-visible
reference each.  As you see, the details are already long and boring
(yum).  I added comments post-facto.

[the object reprs in this output were lost in archiving; shown as <...>]

>>> run() #only decref 'p'
p: (q), q: (pr), r: (q), s: (p)
>>>
>>> p.decref() #not safe to delete
{<...>: 1, <...>: 0, <...>: 0}
>>>
>>>
>>> run() #decref 'p' then 's'
p: (q), q: (pr), r: (q), s: (p)
>>>
>>> p.decref()
{<...>: 1, <...>: 0, <...>: 0}
>>>
>>> s.decref()
{<...>: 0, <...>: 0, <...>: 0, <...>: 0}
<...> ALL zero #'s' safe to delete
{<...>: 0, <...>: 0, <...>: 0}
<...> ALL zero #also deletes 'p', also safe
finalizing <...>
>>>
>>>
>>> run()
p: (q), q: (pr), r: (q), s: (p)
>>>
>>> s.decref()
{<...>: 0, <...>: 1, <...>: 0, <...>: 0}
{<...>: 1, <...>: 0, <...>: 0}
finalizing <...> #deletion
>>>
>>> p.decref()
{<...>: 0, <...>: 0, <...>: 0}
<...> ALL zero #'p' safe to delete
>>>
>>>
>>> run()
p: (q), q: (pr), r: (q), s: (p)
>>>
>>> s.decref()
{<...>: 0, <...>: 1, <...>: 0, <...>: 0}
{<...>: 1, <...>: 0, <...>: 0}
finalizing <...> #'p' not safe, reference still visible

We notice the duplicate 'all zero' indicator on run #2.  The cycle
detector ran on 's.decref', then 's' called 'p.decref', then the cycle
detector ran on that.  'q' and 'r' are safe to delete on runs 2 and 3.

Here is the implementation of 'final':

def final( self ):
    for _, v in self.__dict__.items( ):
        if not isinstance( v, G ):
            continue
        v.decref( )
    print( 'finalizing', self )

The object should be asked to finish its references (cyclic only?),
but remain alive.  The programmer should see that the state is
consistent.  Later, its __del__ will be called.

We can decide that '__leave_reachability__' will be called without
nesting; and/or that '__del__' will be called without nesting, by
breaking finalization in to two steps.

FTR, this makes __leave_reachability__ about the equivalent of
tp_clear, since tp_traverse is prior defined for user-defined types.
--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread Aaron Brady
On Mar 21, 9:50 am, "andrew cooke"  wrote:
> Aaron Brady wrote:
> > On Mar 21, 7:54 am, "andrew cooke"  wrote:
> >> they should not be used to do things like flushing and closing
> >> files, for example.
> > What is your basis for this claim, if it's not the mere unreliability
> > of finalization?  IOW, are you not merely begging the question?
>
> I'm not sure it's clear, but I was talking about Java.
>
> As Paul implied, a consequence of completely automated garbage management
> is that it is (from a programmer's POV) deterministic.  So it's a
[indeterministic]
> programming error to rely on the finalizer to free resources that don't
> follow that model (ie any resource that's anything other that
[than]
> reasonable
> amounts of memory).
>
> That's pretty much an unavoidable consequence of fully automated garbage
> collection.  You can pretend it's not, and try using finalizers for other
> work if you want.  That's fine - it's your code, not mine.  I'm just
> explaining how life is.
>
> Andrew

Hi, nice to talk to you this early.  Sorry you're in a bad mood.
You've sure come to the right place to find friends though.  

My point is, that garbage collection is able to detect when there are
no program-reachable references to an object.  Why not notify the
programmer (the programmer's objects) when that happens?  If the
object does still have other unreachable references, s/he should be
informed of that too.

I advanced an additional method to this end.  Do you argue that there
aren't any cases in which the class could make use of the information,
or that there aren't /enough/ such cases?

Perhaps it would help to handle a contrary case by hand.  Two objects
need to make write operations each to the other when they are closed.
Would it be sufficient in general, knowing nothing further about them,
to queue some information, and close?  Do they always know at design-
time their references will be cyclic?  Would a mere
'__leave_reachability__' method be more generally informative or
robust?  Would it constitute a two-step destruction, to notify objects
when they're unreachable, and then finalize?  The two objects' write
operations could execute in such a method, without risking prior
destruction.
--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread andrew cooke
andrew cooke wrote:
> Aaron Brady wrote:
>> On Mar 21, 7:54 am, "andrew cooke"  wrote:
>>> they should not be used to do things like flushing and closing
>>> files, for example.
>> What is your basis for this claim, if it's not the mere unreliability
>> of finalization?  IOW, are you not merely begging the question?
>
> I'm not sure it's clear, but I was talking about Java.

crap.  i meant to say INdeterministic.

sorry, i am in a foul mood (for completely unrelated reasons) and probably
shouldn't be making posts to a public newsgroup.

andrew

> As Paul implied, a consequence of completely automated garbage management
> is that it is (from a programmer's POV) deterministic.  So it's a
> programming error to rely on the finalizer to free resources that don't
> follow that model (ie any resource that's anything other that reasonable
> amounts of memory).
>
> That's pretty much an unavoidable consequence of fully automated garbage
> collection.  You can pretend it's not, and try using finalizers for other
> work if you want.  That's fine - it's your code, not mine.  I'm just
> explaining how life is.
>
> Andrew
>
>


--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread andrew cooke
Aaron Brady wrote:
> On Mar 21, 7:54 am, "andrew cooke"  wrote:
>> they should not be used to do things like flushing and closing
>> files, for example.
> What is your basis for this claim, if it's not the mere unreliability
> of finalization?  IOW, are you not merely begging the question?

I'm not sure it's clear, but I was talking about Java.

As Paul implied, a consequence of completely automated garbage management
is that it is (from a programmer's POV) deterministic.  So it's a
programming error to rely on the finalizer to free resources that don't
follow that model (ie any resource that's anything other that reasonable
amounts of memory).

That's pretty much an unavoidable consequence of fully automated garbage
collection.  You can pretend it's not, and try using finalizers for other
work if you want.  That's fine - it's your code, not mine.  I'm just
explaining how life is.

Andrew


--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread Aaron Brady
On Mar 21, 7:54 am, "andrew cooke"  wrote:
> Paul Rubin wrote:
> > "andrew cooke"  writes:
> >> the two dominant virtual machines - .net and the jvm both handle
> >> circular
> >> references with no problem whatever.
>
> > AFAIK, they also don't guarantee that finalizers ever run, much less
> > run in deterministic order.
>
> i think you're right, but i'm missing your point - perhaps there was some
> sub-context to the original post that i didn't understand?
>
> finalizers should not be considered part of a public resource management
> api - they should not be used to do things like flushing and closing
> files, for example.   i think this was a minor "issue" early in java's
> adoption (i guess because of incorrect assumptions made by c++
> programmers) (in python the with context is a much better mechanism for
> this kind of thing - the best java has is the finally statement).  but
> it's one of those things that (afaik) isn't an issue once you fully
> embrace the language (rather like, say, semantically meaningful
> indentation).
>
> but i'm sure you know all that, so i'm still wondering what i've missed.
>
> andrew

Theoretically, my object should be able to maintain an open resource
for its lifetime; and its clients shouldn't need to know what its
lifetime is.  Therefore, it needs a callback when that is over.

If finalization methods could be called in a structurally sound
manner, they could be relied on to handle flushing and closing files.

> they should not be used to do things like flushing and closing
> files, for example.

What is your basis for this claim, if it's not the mere unreliability
of finalization?  IOW, are you not merely begging the question?
--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread andrew cooke
Paul Rubin wrote:
> "andrew cooke"  writes:
>> the two dominant virtual machines - .net and the jvm both handle
>> circular
>> references with no problem whatever.
>
> AFAIK, they also don't guarantee that finalizers ever run, much less
> run in deterministic order.

i think you're right, but i'm missing your point - perhaps there was some
sub-context to the original post that i didn't understand?

finalizers should not be considered part of a public resource management
api - they should not be used to do things like flushing and closing
files, for example.   i think this was a minor "issue" early in java's
adoption (i guess because of incorrect assumptions made by c++
programmers) (in python the with context is a much better mechanism for
this kind of thing - the best java has is the finally statement).  but
it's one of those things that (afaik) isn't an issue once you fully
embrace the language (rather like, say, semantically meaningful
indentation).
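for example, the cleanup is tied to the block rather than to collection:

```python
# a sketch: f.close() runs when the block exits,
# even if the loop raises
with open('data.txt') as f:
    for line in f:
        pass
```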

but i'm sure you know all that, so i'm still wondering what i've missed.

andrew



--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread Martin v. Löwis
> The actual backend of CPython requires garbage-collected container
> types to implement tp_inquiry and tp_clear methods, but user-defined
> types apparently aren't required to conform.

tp_inquiry doesn't exist, you probably mean tp_traverse. tp_traverse
is completely irrelevant for python-defined types; the VM can traverse
a user-defined type just fine even without the help of tp_traverse.
If a C-defined type fails to implement tp_traverse when it should,
then garbage collection breaks entirely.

tp_clear isn't invoked for an object at all if the object is in a
cycle with finalizers, so it's not something that you can use to
detect that you are in a cycle with finalizers.

Cycles with finalizers are considered a bug; application programmers
should check gc.garbage at the end of the program to determine whether
they have this bug. There is an easy design pattern around it, so I'm
-1 on complicating the GC protocol.
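A sketch of that end-of-program check (the wording of the report is
illustrative):

```python
import atexit
import gc

def report_uncollectable():
    gc.collect()
    if gc.garbage:
        # objects in cycles with __del__ end up here; treat it as a bug
        print('uncollectable: %r' % (gc.garbage,))

atexit.register(report_uncollectable)
```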

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-21 Thread Aaron Brady
On Mar 20, 8:12 pm, "andrew cooke"  wrote:
> Aaron Brady wrote:
>
> [...]
>
> > caveats and fragilities?  If free software can do it, why isn't it all
> > over the industry?  What disqualifies it from solved-problem status?
>
> the two dominant virtual machines - .net and the jvm both handle circular
> references with no problem whatever.  this is standard in modern garbage
> collection - go read a book on the subject (personally i like grune et
> al's modern compiler design).  it *is* a solved problem.  if anything,
> python is behind the curve, not ahead of it, but this may change with the
> next generation of python implementations (pypy targets a variety of vms,
> i think).
>
> as for the extra methods you suggest - why do you want to expose
> implementation details in an api?  that is not the normal aim of good
> design.
>
> andrew

"Circular references ...can only be cleaned up if there are no Python-
level __del__() methods involved."  __del__ doc.

"Python doesn’t collect ... cycles automatically because, in general,
it isn’t possible for Python to guess a safe order in which to run the
__del__() methods."  gc.garbage doc.

"Errors should never pass silently." -The Zen of Python

I advance that cyclic objects should be notified when their external
references go to zero, but their usual '__del__' is inappropriate.  If
objects implement a __del__ method, they can choose to implement a
'__gc_cycle__' method, and then just drop the specified attribute.  It
needn't be called on every object in the cycle, either; once it's
called on one object, another object's normal __del__ may be safely
called.  Output, unproduced:

>>> del x
In X.__gc_cycle__, 'other' attribute.  Deleting...
In Y.__del__.
In X.__del__.
>>>

The actual backend of CPython requires garbage-collected container
types to implement tp_inquiry and tp_clear methods, but user-defined
types apparently aren't required to conform.

Supporting Cyclic Garbage Collection
http://docs.python.org/3.0/c-api/gcsupport.html
--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-20 Thread Paul Rubin
"andrew cooke"  writes:
> the two dominant virtual machines - .net and the jvm both handle circular
> references with no problem whatever.

AFAIK, they also don't guarantee that finalizers ever run, much less
run in deterministic order.
--
http://mail.python.org/mailman/listinfo/python-list


Re: garbage collection / cyclic references

2009-03-20 Thread andrew cooke
Aaron Brady wrote:
[...]
> caveats and fragilities?  If free software can do it, why isn't it all
> over the industry?  What disqualifies it from solved-problem status?

the two dominant virtual machines - .net and the jvm both handle circular
references with no problem whatever.  this is standard in modern garbage
collection - go read a book on the subject (personally i like grune et
al's modern compiler design).  it *is* a solved problem.  if anything,
python is behind the curve, not ahead of it, but this may change with the
next generation of python implementations (pypy targets a variety of vms,
i think).

as for the extra methods you suggest - why do you want to expose
implementation details in an api?  that is not the normal aim of good
design.

andrew


--
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection of recursive inner function

2008-08-05 Thread from . future . import
On Aug 5, 5:23 am, Terry Reedy <[EMAIL PROTECTED]> wrote:

> To understand this, it helps to realize that Python functions are not,
> in themselves, recursive.  Recursiveness at any time is a property of a
> function in an environment, which latter can change.  More specifically,
> a function call is recursive if the expression indicating the function
> to call happens to indicate the function containing the call at the time
> of evaluation just before the evaluation of the argument expressions.

I didn't realize that the function looks itself up in the enclosing
environment when it makes the recursive call, rather than binding to
itself at definition time.

> Adding 'inner = None' at the end of an outer function will break the
> cycle and with CPython, all will be collected when outer exits.

I think I'll use that for inner functions that do need to access the
outer environment, but do not need to live longer than the call to the
outer function.
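For example (a sketch of that fix):

```python
def outer(n):
    def inner(k):
        if k == 0:
            return 1
        return k * inner(k - 1)
    result = inner(n)
    inner = None  # break the cell->function cycle before returning
    return result
```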

> Not a bug, but an educational example and possibly useful to someone
> running on CPython with gc turned off and making lots of calls to
> functions with inner functions with recursive references.  I learned a
> bit answering this.

That describes our application: in some cases, we have several
gigabytes of small objects, in which case mark-and-sweep garbage
collection takes quite a long time, especially if some of the objects
have been pushed into the swap. I have broken all cycles in our own
data structures a while ago, but got an unexpected memory leak because
of these cyclic references from inner functions.

Thanks for your clear explanation!

Bye,
Maarten
--
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection of recursive inner function

2008-08-04 Thread Terry Reedy



[EMAIL PROTECTED] wrote:

I encountered garbage collection behaviour that I didn't expect when
using a recursive function inside another function:


To understand this, it helps to realize that Python functions are not, 
in themselves, recursive.  Recursiveness at any time is a property of a 
function in an environment, which latter can change.  More specifically, 
a function call is recursive if the expression indicating the function 
to call happens to indicate the function containing the call at the time 
of evaluation just before the evaluation of the argument expressions. 
See examples below.


>  the definition of

the inner function seems to contain a circular reference, which means
it is only collected by the mark-and-sweep collector, not by reference
counting. Here is some code that demonstrates it:


The inner function is part of a circular reference that is originally 
part of the outer function, but which may survive the call to outer



def outer():
    def inner(n):
        if n == 0:
            return 1
        else:
            return n * inner(n - 1)

    inner1 = inner
    def inner(n): return 1
    # original inner still exists but is no longer 'recursive'

def out2():
    def inner1(n): return 1
    def inner(n):
        if n: return n*inner1(n-1)
        else: return 1
    # inner is obviously not recursive
    inner1 = inner
    # but now it is


If the inner function is moved outside the scope of the outer
function, gc.garbage will be empty.


With 'inner' in the global namespace, no (circular) closure is needed to 
keep it alive past the outer lifetime.


> If the inner function is inside

but not recursive, gc.garbage will also be empty.


Not necessarily so.  What matters is that inner has a non-local 
reference to outer's local name 'inner'.  Try

  def inner(): return inner
which contains no calls, recursive or otherwise.

> If the outer  function is called twice,
>  there will be twice as many objects in gc.garbage.

And so on, until gc happens.


Is this expected behaviour?   Collecting an object when its refcount
reaches zero is preferable to collecting it with mark-and-sweep, but


Adding 'inner = None' at the end of an outer function will break the 
cycle and with CPython, all will be collected when outer exits.

Jython and IronPython do not, I believe, do reference counting.

Adding 'del inner' gives
SyntaxError: cannot delete variable 'inner' referenced in inner scope.


maybe there is a reason that a circular reference must exist in this
situation. I want to check that first so I don't report a bug for
something that is not a bug.


Not a bug, but an educational example and possibly useful to someone 
running on CPython with gc turned off and making lots of calls to 
functions with inner functions with recursive references.  I learned a 
bit answering this.


Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection of recursive inner function

2008-08-04 Thread Diez B. Roggisch

[EMAIL PROTECTED] schrieb:

Hi,

I encountered garbage collection behaviour that I didn't expect when
using a recursive function inside another function: the definition of
the inner function seems to contain a circular reference, which means
it is only collected by the mark-and-sweep collector, not by reference
counting. Here is some code that demonstrates it:

===
def outer():

    def inner(n):
        if n == 0:
            return 1
        else:
            return n * inner(n - 1)

    return 42

import gc
gc.set_debug(gc.DEBUG_SAVEALL)
print outer()
gc.collect()
print gc.garbage
===

Output when executed:
$ python internal_func_gc.py
42
[<cell at 0x...: function object at 0x...>, (<cell at 0x...: function object at 0x...>,), <function inner at 0x...>]

Note that the inner function is not called at all, it is only defined.
If the inner function is moved outside the scope of the outer
function, gc.garbage will be empty. If the inner function is inside
but not recursive, gc.garbage will also be empty. If the outer
function is called twice, there will be twice as many objects in
gc.garbage.

Is this expected behaviour? Collecting an object when its refcount
reaches zero is preferable to collecting it with mark-and-sweep, but
maybe there is a reason that a circular reference must exist in this
situation. I want to check that first so I don't report a bug for
something that is not a bug.


The reference comes from the closure of inner. And inner is part of the 
closure, so there is a circular reference.


I don't see a way to overcome this - consider the following code:

def outer():

    def inner():
        inner()

    if random.random() > .5:
        return inner


How is the GC/refcounting to know if it can create a reference or not?



Diez
--
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-19 Thread Duncan Booth
Nick Craig-Wood <[EMAIL PROTECTED]> wrote:

> [<__main__.Y object at 0xb7d9fc8c>, <__main__.Y object at 0xb7d9fcac>,
> <__main__.Y object at 0xb7d9fc2c>] [<__main__.Y object at 0xb7d9fc8c>]
> 
> (It behaves slightly differently in the interactive interpreter for
> reasons I don't understand - so save it to a file and try it!)

Any expression in the interactive interpreter is implicitly assigned to 
the variable '_', so after your first call to Y.list() you've saved 
references to the complete list in _. Assignments aren't expressions so 
after assigning to a and c you haven't changed _. If you throw in 
another unrelated expression you'll be fine:

>>> a = Y()
>>> b = Y()
>>> c = Y()
>>> Y.list()
[<__main__.Y object at 0x0117F230>, <__main__.Y object at 0x0117F2B0>, 
<__main__.Y object at 0x0117F210>, <__main__.Y object at 0x0117F670>, 
<__main__.Y object at 0x0117F690>, <__main__.Y object at 0x0117F6B0>, 
<__main__.Y object at 0x0117F310>]
>>> a = 1
>>> c = 1
>>> c
1
>>> Y.list()
[<__main__.Y object at 0x0117F6B0>]

> In fact I find most of the times I wanted __del__ can be fixed by
> using a weakref.WeakValueDictionary or weakref.WeakKeyDictionary for a
> much better result.

The WeakValueDictionary is especially good when you want a Python 
wrapper round some external non-python thing, just use the address of 
the external thing as the key for the dictionary and you can avoid 
having duplicate Python objects.
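For example (a sketch; 'addr' stands for the external thing's address):

```python
from weakref import WeakValueDictionary

class Wrapper(object):
    _cache = WeakValueDictionary()

    def __new__(cls, addr):
        obj = cls._cache.get(addr)
        if obj is None:               # no live wrapper for this address yet
            obj = object.__new__(cls)
            obj.addr = addr
            cls._cache[addr] = obj    # entry vanishes when obj is collected
        return obj
```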

The other option for classes involved in a cycle is to move the __del__ 
(and anything it needs) down to another class which isn't part of the 
cycle, so the original example becomes:

>>> class Monitor(object):
...     def __del__(self): print "gone"
...
>>> class X(object):
...     def __init__(self):
...         self._mon = Monitor()
...

>>> a = X()
>>> a = 1
gone
>>> b = X()
>>> b.someslot = b
>>> b = 1
>>> import gc
>>> gc.collect()
gone
8
>>> 

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-19 Thread Nick Craig-Wood
Hrvoje Niksic <[EMAIL PROTECTED]> wrote:
>  Simon Pickles <[EMAIL PROTECTED]> writes:
> 
> > Ken wrote:
> >> What is your __del__ method doing?
> >>   
> > Actually, nothing but printing a message when the object is deleted,
> > just morbid curiosity.
> >
> > I've yet to see one of the destructor messages, tho
> 
>  Do your objects participate in reference cycles?  In that case they
>  are deallocated by the cycle collector, and the cycle collector
>  doesn't invoke __del__.
> 
> >>> class X(object):
>  ...   def __del__(self): print "gone"
>  ...
> >>> a = X()
> >>> a = 1
>  gone
> >>> b = X()
> >>> b.someslot = b
> >>> b = 1
> >>> import gc
> >>> gc.collect()
>  0
> >>> 

If you want to avoid this particular problem, use a weakref.

  >>> c = X()
  >>> from weakref import proxy
  >>> c.weak_reference = proxy(c)
  >>> c.weak_reference.__del__
  <bound method X.__del__ of <weakproxy at 0x... to X at 0x...>>
  >>> c = 1
  >>> gc.collect()
  gone
  0
  >>>  

Or perhaps slightly more realistically, here is an example of using a
WeakKeyDictionary instead of __del__ methods for keeping an accurate
track of all classes of a given type.

from weakref import WeakKeyDictionary

class Y(object):
_registry = WeakKeyDictionary()
def __init__(self):
self._registry[self] = True
@classmethod
def list(cls):
return cls._registry.keys()

a = Y()
b = Y()
c = Y()
Y.list()
a = 1
c = 1
Y.list()

Which produces the output

[<__main__.Y object at 0xb7d9fc8c>, <__main__.Y object at 0xb7d9fcac>, 
<__main__.Y object at 0xb7d9fc2c>]
[<__main__.Y object at 0xb7d9fc8c>]

(It behaves slightly differently in the interactive interpreter for
reasons I don't understand - so save it to a file and try it!)

In fact I find most of the times I wanted __del__ can be fixed by
using a weakref.WeakValueDictionary or weakref.WeakKeyDictionary for a
much better result.

-- 
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-19 Thread Duncan Booth
Jarek Zgoda <[EMAIL PROTECTED]> wrote:

> Is that true assumption that __del__ has the same purpose (and same
> limitations, i.e. the are not guaranteed to be fired) as Java finalizer
> methods?

One other point I should have mentioned about __del__: if you are running 
under Windows and the user hits Ctrl+Break then unless you handle it Python 
will exit without doing any cleanup at all (as opposed to any other method 
of exiting such as Ctrl+C). If this matters to you then you can install a 
signal handler to catch the Ctrl+Break and exit cleanly.
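For example (a sketch; signal.SIGBREAK exists only on Windows builds of
Python):

```python
import signal
import sys

def handle_break(signum, frame):
    sys.exit(1)   # raise SystemExit so normal cleanup still runs

if hasattr(signal, 'SIGBREAK'):
    signal.signal(signal.SIGBREAK, handle_break)
```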
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-19 Thread Duncan Booth
Jarek Zgoda <[EMAIL PROTECTED]> wrote:

> Duncan Booth wrote:
> 
>> Pretty much. If you have a __del__ method on an object then in the
>> worst case the only thing that can be guaranteed is that it will be
>> called zero, one or more than one times. (Admittedly the last of
>> these only happens if you work at it).
>> 
>> If it is called then is may be either immediately the last reference
>> to the object is lost, or it may be later when the garbage collector
>> runs (and not necessarily the first next time the garbage collector
>> runs). 
> 
> Java finalizers are not called upon VM exit, only when object is swept
> by GC (for example when the object is destroyed upon program exit),
> the CPython docs read that this is the case for Python too. Is this
> behaviour standard for all VM implementations or is
> implementation-dependent (CPython, Jython, IronPython)?
> 
Yes, CPython does reference counting so it can call __del__ immediately 
an object is unreferenced. The GC only comes into play when there is a 
reference loop involved. If an object is directly involved in a 
reference loop then __del__ is not called for that object, but a loop 
could reference another object and its __del__ would be called when the 
loop was collected.

Other Python implementations may behave differently: presumably Jython 
works as for Java (but I don't know the details of that), and IronPython 
uses the CLR which has its own peculiarities: finalizers are all called 
on a single thread which is *not* the thread used to construct the 
object, so if you use finalizers in a CLR program your program is 
necessarily multi-threaded with all that implies. Also it takes at least 
two GC cycles to actually release memory on a CLR object with a 
finalizer, on the first cycle objects subject to finalization are simply 
added to a list (so are again referenceable), on the second cycle if the 
finalizer has completed and the object is unreferenced it can be 
collected. CLR finalizers also have the interesting quirk that before 
the finalizer is called any references the object has to other objects 
are cleared: that allows the system to call finalizers in any order.

Otherwise I think the behaviour on exit is pretty standard. If I 
remember correctly there is a final garbage collection to give 
finalizers a chance to run. Any objects which become newly unreferenced 
as a result of that garbage collection will have __del__ called as 
usual, but any which merely become unreachable and therefore would be 
caught in a subsequent garbage collection won't.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-19 Thread Hrvoje Niksic
Simon Pickles <[EMAIL PROTECTED]> writes:

> Ken wrote:
>> What is your __del__ method doing?
>>   
> Actually, nothing but printing a message when the object is deleted,
> just morbid curiosity.
>
> I've yet to see one of the destructor messages, tho

Do your objects participate in reference cycles?  In that case they
are deallocated by the cycle collector, and the cycle collector
doesn't invoke __del__.

>>> class X(object):
...   def __del__(self): print "gone"
...
>>> a = X()
>>> a = 1
gone
>>> b = X()
>>> b.someslot = b
>>> b = 1
>>> import gc
>>> gc.collect()
0
>>> 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-19 Thread Jarek Zgoda
Duncan Booth napisał(a):

> Pretty much. If you have a __del__ method on an object then in the worst 
> case the only thing that can be guaranteed is that it will be called zero, 
> one or more than one times. (Admittedly the last of these only happens if 
> you work at it).
> 
> If it is called, then it may be called either immediately after the last
> reference to the object is lost, or later when the garbage collector runs
> (and not necessarily the next time the garbage collector runs).

Java finalizers are not called upon VM exit, only when an object is swept
by GC (for example when the object is destroyed upon program exit); the
CPython docs read as if this is the case for Python too. Is this
behaviour standard for all VM implementations or is it
implementation-dependent (CPython, Jython, IronPython)?

-- 
Jarek Zgoda
Skype: jzgoda | GTalk: [EMAIL PROTECTED] | voice: +48228430101

"We read Knuth so you don't have to." (Tim Peters)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-19 Thread Duncan Booth
Jarek Zgoda <[EMAIL PROTECTED]> wrote:

> Ken napisał(a):
> 
>> The good news is that you almost never have to do anything to clean up. 
>> My guess is that you might not even need to overload __del__ at all. 
>> People from a C++ background often mistakenly think that they have to
>> write destructors when in fact they do not.
> 
> Is it a true assumption that __del__ has the same purpose (and the same
> limitations, i.e. they are not guaranteed to be fired) as Java finalizer
> methods?
> 

Pretty much. If you have a __del__ method on an object then in the worst 
case the only thing that can be guaranteed is that it will be called zero, 
one or more than one times. (Admittedly the last of these only happens if 
you work at it).

If it is called, then it may be called either immediately after the last
reference to the object is lost, or later when the garbage collector runs
(and not necessarily the next time the garbage collector runs).

The nasty case to watch for is when __del__ is called while the program is 
exiting: any global variables in the module may have already been cleared 
so you cannot be sure that you can reference anything other than attributes 
on the object being destroyed (and if you call methods on the same or other 
objects they may also find they cannot reference all their globals).
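
One common defensive idiom, sketched here, is to capture whatever __del__
needs as a default argument, so it no longer depends on module globals
(the temp-file name is just a placeholder):

    import os

    class Tidy(object):
        # os.remove is bound at class-definition time, so __del__ keeps
        # working even after the module's globals are cleared on exit
        def __del__(self, _remove=os.remove):
            try:
                _remove("some.tmp")
            except OSError:
                pass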

Fortunately cases when you actually need to use __del__ are very rare.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-19 Thread Jarek Zgoda
Ken napisał(a):

> The good news is that you almost never have to do anything to clean up. 
> My guess is that you might not even need to overload __del__ at all. 
> People from a C++ background often mistakenly think that they have to
> write destructors when in fact they do not.

Is it a true assumption that __del__ has the same purpose (and the same
limitations, i.e. they are not guaranteed to be fired) as Java finalizer
methods?

-- 
Jarek Zgoda
Skype: jzgoda | GTalk: [EMAIL PROTECTED] | voice: +48228430101

"We read Knuth so you don't have to." (Tim Peters)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-19 Thread Simon Pickles
Ken wrote:
> What is your __del__ method doing?
>   
Actually, nothing but printing a message when the object is deleted, 
just morbid curiosity.

I've yet to see one of the destructor messages, tho

>
>   from sys import getrefcount
>   print getrefcount(x)
>
>   
Perfect, thanks

Simon
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-18 Thread Aahz
In article <[EMAIL PROTECTED]>,
Ken  <[EMAIL PROTECTED]> wrote:
>Simon Pickles wrote:
>>
>> For instance, I have a manager looking after many objects in a dict. 
>> When those objects are no longer needed, I use del manager[objectid], 
>> hoping to force the garbage collector to perform the delete.
>>
>> However, this doesn't trigger my overloaded __del__ destructor. Can I 
>> simply rely on the python garbage collector to take it from here?
>   
>Objects are deleted at some undefined time after there are no references 
>to the object.

Assuming we're talking about CPython, objects are deleted immediately
when there are no references to the object.  The problem is that it's
not always obvious when the refcount goes to zero.
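
A quick demonstration of both halves of that, from a CPython 2.x prompt:

>>> class X(object):
...     def __del__(self): print "gone"
...
>>> x = X()
>>> y = x       # an easy-to-miss second reference
>>> del x       # nothing printed: the refcount is still 1
>>> del y
gone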
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"All problems in computer science can be solved by another level of 
indirection."  --Butler Lampson
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2008-02-18 Thread Ken
Simon Pickles wrote:
> Hi,
>
> I'm building a server with python, but coming from a c++ background, 
> garbage collection seems strange.
>
> For instance, I have a manager looking after many objects in a dict. 
> When those objects are no longer needed, I use del manager[objectid], 
> hoping to force the garbage collector to perform the delete.
>
> However, this doesn't trigger my overloaded __del__ destructor. Can I 
> simply rely on the python garbage collector to take it from here?
>   
Objects are deleted at some undefined time after there are no references 
to the object.

You will need to change your thinking about how destructors work.  It is 
very different from C++.

The good news is that you almost never have to do anything to clean up.  
My guess is that you might not even need to overload __del__ at all.  
People from a C++ background often mistakenly think that they have to 
write destructors when in fact they do not.  What is your __del__ method 
doing?
> Is there a way to find how many references exist for an object?
>   
yes:

  from sys import getrefcount
  print getrefcount(x)


> Thanks
>
> Simon
>
>   

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-22 Thread Alex Martelli
Steve Holden <[EMAIL PROTECTED]> wrote:
   ...
> > a. fork
> > b. do the memory-hogging work in the child process
> > c. meanwhile the parent just waits
> > d. the child sends back to the parent the small results
> > e. the child terminates
> > f. the parent proceeds merrily
> > 
> > I learned this architectural-pattern a long, long time ago, around the
> > time when fork first got implemented via copy-on-write pages... 
> > 
> Yup, it's easier to be pragmatic and find the real solution to your 
> problem than it is to try and mould reality to your idea of what the 
> solution should be ...

"That's why all progress is due to the unreasonable man", hm?-)


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-22 Thread Alex Martelli
Tom Wright <[EMAIL PROTECTED]> wrote:

> real programs.  I can't help thinking that there are some situations where
> you need a lot of memory for a short time though, and it would be nice to
> be able to use it briefly and then hand most of it back.  Still, I see the
> practical difficulties with doing this.

What I do in those cases:
a. fork
b. do the memory-hogging work in the child process
c. meanwhile the parent just waits
d. the child sends back to the parent the small results
e. the child terminates
f. the parent proceeds merrily

I learned this architectural-pattern a long, long time ago, around the
time when fork first got implemented via copy-on-write pages... 
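
A minimal sketch of steps a-f, assuming a POSIX system and a picklable
result (the function name is just for illustration):

    import os
    import pickle

    def run_in_child(work):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                      # child: do the hogging work
            os.close(r)
            os.write(w, pickle.dumps(work()))
            os.close(w)
            os._exit(0)                   # memory dies with the child
        os.close(w)                       # parent: just collect the result
        chunks = []
        while True:
            chunk = os.read(r, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        os.close(r)
        os.waitpid(pid, 0)
        return pickle.loads("".join(chunks))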


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-22 Thread Steve Holden
Alex Martelli wrote:
> Tom Wright <[EMAIL PROTECTED]> wrote:
> 
>> real programs.  I can't help thinking that there are some situations where
>> you need a lot of memory for a short time though, and it would be nice to
>> be able to use it briefly and then hand most of it back.  Still, I see the
>> practical difficulties with doing this.
> 
> What I do in those cases:
> a. fork
> b. do the memory-hogging work in the child process
> c. meanwhile the parent just waits
> d. the child sends back to the parent the small results
> e. the child terminates
> f. the parent proceeds merrily
> 
> I learned this architectural-pattern a long, long time ago, around the
> time when fork first got implemented via copy-on-write pages... 
> 
Yup, it's easier to be pragmatic and find the real solution to your 
problem than it is to try and mould reality to your idea of what the 
solution should be ...

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings   http://holdenweb.blogspot.com

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Cameron Laird
In article <[EMAIL PROTECTED]>,
Dennis Lee Bieber  <[EMAIL PROTECTED]> wrote:
>On Wed, 21 Mar 2007 15:32:17 +, Tom Wright <[EMAIL PROTECTED]>
>declaimed the following in comp.lang.python:
>
>> 
>> True, but why does Python hang on to the memory at all?  As I understand it,
>> it's keeping a big lump of memory on the int free list in order to make
>> future allocations of large numbers of integers faster.  If that memory is
>> about to be paged out, then surely future allocations of integers will be
>> *slower*, as the system will have to:
>> 
>   It may not just be that free list -- which on a machine with lots of
>RAM may never be paged out anyway [mine (XP) currently shows: physical
>memory total/available/system: 2095196/1355296/156900K, commit charge
>total/limit/peak: 514940/3509272/697996K (limit includes page/swap file
>of 1.5GB)] -- it could easily just be that the OS or runtime just
>doesn't return memory to the OS until a process/executable image exits.
.
.
.
... and there *are* (or at least have been) situations where
it was profitable for an application which knew it had finished
its memory-intensive work to branch to a new instance of itself
in a smaller memory space.  That's the only, and necessarily
"tricky", answer to the question about how to make sure all that
free stuff gets back to the OS.
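
A hedged sketch of that trick: re-exec the interpreter so the follow-on
phase starts in a fresh, small process (both phase functions are
hypothetical):

    import os
    import sys

    if "--phase-two" not in sys.argv:
        heavy_phase()    # hypothetical memory-intensive work
        # replace this bloated process with a fresh interpreter
        os.execv(sys.executable,
                 [sys.executable, sys.argv[0], "--phase-two"])
    else:
        light_phase()    # hypothetical small follow-on work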
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Aahz
In article <[EMAIL PROTECTED]>,
Nick Craig-Wood  <[EMAIL PROTECTED]> wrote:
>Steven D'Aprano <[EMAIL PROTECTED]> wrote:
>>
>>  Or you could just have an "object leak" somewhere. Do you have any
>>  complicated circular references that the garbage collector can't resolve?
>>  Lists-of-lists? Trees? Anything where objects aren't being freed when you
>>  think they are? Are you holding on to references to lists? It's more
>>  likely that your code simply isn't freeing lists you think are being freed
>>  than it is that Python is holding on to tens of megabytes of random
>>  text.
>
>This is surely just the fragmented heap problem.

Possibly.  I believe PyMalloc doesn't have as much a problem in this
area, but off-hand I don't remember the extent to which strings use
PyMalloc.  Nevertheless, my bet is on holding references as the problem
with doubled memory use.
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"Typing is cheap.  Thinking is expensive."  --Roy Smith
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Nick Craig-Wood
Steven D'Aprano <[EMAIL PROTECTED]> wrote:
>  Or you could just have an "object leak" somewhere. Do you have any
>  complicated circular references that the garbage collector can't resolve?
>  Lists-of-lists? Trees? Anything where objects aren't being freed when you
>  think they are? Are you holding on to references to lists? It's more
>  likely that your code simply isn't freeing lists you think are being freed
>  than it is that Python is holding on to tens of megabytes of random
>  text.

This is surely just the fragmented heap problem.

Returning unused memory to the OS is a hard problem, since memory
usually comes in page-sized (4k) chunks and you can only return pages
at the end of your memory (the sbrk() interface).

The glibc allocator uses mmap() for large allocations which *can* be
returned to the OS without any fragmentation worries.

However if you have lots of small allocations then the heap will be
fragmented and you'll never be able to return the memory to the OS.

However that is why we have virtual memory systems.

-- 
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Steven D'Aprano
On Wed, 21 Mar 2007 17:19:23 +, Tom Wright wrote:

>> So what's your actual problem that you are trying to solve?
> 
> I have a program which reads a few thousand text files, converts each to a
> list (with readlines()), creates a short summary of the contents of each (a
> few floating point numbers) and stores this summary in a master list.  From
> the amount of memory it's using, I think that the lists containing the
> contents of each file are kept in memory, even after there are no
> references to them.  Also, if I tell it to discard the master list and
> re-read all the files, the memory use nearly doubles so I presume it's
> keeping the lot in memory.

Ah, now we're getting somewhere!

Python's caching behaviour with strings is almost certainly going to be
different to its caching behaviour with ints. (For example, Python caches
short strings that look like identifiers, but I don't believe it caches
great blocks of text or short strings which include whitespace.)
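
For instance, from a CPython 2.x prompt (implementation-dependent
behaviour, so treat this purely as illustration):

>>> a = "hello"
>>> b = "hello"
>>> a is b          # identifier-like literals are interned
True
>>> c = "hello world"
>>> d = "hello world"
>>> c is d          # strings containing spaces generally are not
False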

But again, you haven't really described a problem, just a set of
circumstances. Yes, the memory usage doubles. *Is* that a problem in
practice? A few thousand 1KB files is one thing; a few thousand 1MB files
is an entirely different story.

Is the most cost-effective solution to the problem to buy another 512MB of
RAM? I don't say that it is. I just point out that you haven't given us
any reason to think it isn't.


> The program may run through several collections of files, but it only keeps
> a reference to the master list of the most recent collection it's looked
> at.  Obviously, it's not ideal if all the old collections hang around too,
> taking up space and causing the machine to swap.

Without knowing exactly what your doing with the data, it's hard to tell
where the memory is going. I suppose if you are storing huge lists of
millions of short strings (words?), they might all be cached. Is there a
way you can avoid storing the hypothetical word-lists in RAM, perhaps by
writing them straight out to a disk file? That *might* make a
difference to the caching algorithm used.

Or you could just have an "object leak" somewhere. Do you have any
complicated circular references that the garbage collector can't resolve?
Lists-of-lists? Trees? Anything where objects aren't being freed when you
think they are? Are you holding on to references to lists? It's more
likely that your code simply isn't freeing lists you think are being freed
than it is that Python is holding on to tens of megabytes of random text.



-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Steve Holden
Tom Wright wrote:
> Steven D'Aprano wrote:
>> You've described an extremely artificial set of circumstances: you create
>> 40,000,000 distinct integers, then immediately destroy them. The obvious
>> solution to that "problem" of Python caching millions of integers you
>> don't need is not to create them in the first place.
> 
> I know it's a very artificial setup - I was trying to make the situation
> simple to demonstrate in a few lines.  The point was that it's not caching
> the values of those integers, as they can never be read again through the
> Python interface.  It's just holding onto the space they occupy in case
> it's needed again.
> 
>> So what's your actual problem that you are trying to solve?
> 
> I have a program which reads a few thousand text files, converts each to a
> list (with readlines()), creates a short summary of the contents of each (a
> few floating point numbers) and stores this summary in a master list.  From
> the amount of memory it's using, I think that the lists containing the
> contents of each file are kept in memory, even after there are no
> references to them.  Also, if I tell it to discard the master list and
> re-read all the files, the memory use nearly doubles so I presume it's
> keeping the lot in memory.
> 
I'd like to bet you are keeping references to them without realizing it. 
The interpreter won't generally allocate memory that it can get by 
garbage collection, and reference counting pretty much eliminates the 
need for garbage collection anyway except when you create cyclic data 
structures.

> The program may run through several collections of files, but it only keeps
> a reference to the master list of the most recent collection it's looked
> at.  Obviously, it's not ideal if all the old collections hang around too,
> taking up space and causing the machine to swap.
> 
We may need to see code here for you to convince us of the correctness 
of your hypothesis. It sounds pretty screwy to me.

>>> but is there anything I can do to get that memory back without closing
>>> Python?
>> Why do you want to manage memory yourself anyway? It seems like a
>> horrible, horrible waste to use a language designed to manage memory for
>> you, then insist on over-riding it's memory management.
> 
> I agree.  I don't want to manage it myself.  I just want it to re-use memory
> or hand it back to the OS if it's got an awful lot that it's not using. 
> Wouldn't you say it was wasteful if (say) an image editor kept an
> uncompressed copy of an image around in memory after the image had been
> closed?
> 
Yes, but I'd say it was the programmer's fault if it turned out that the 
interpreter wasn't doing anything wrong ;-) It could be something inside 
an exception handler that is keeping a reference to a stack frame or 
something silly like that.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings   http://holdenweb.blogspot.com

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Tom Wright
[EMAIL PROTECTED] wrote:
> If your program's behavior is:
> 
> * allocate a list of 1e7 ints
> * delete that list
> 
> how does the Python interpreter know your next bit of execution won't be
> to repeat the allocation?

It doesn't know, but if the program runs for a while without repeating it,
it's a fair bet that it won't mind waiting the next time it does a big
allocation.  How long 'a while' is would obviously be open to debate.

> In addition, checking to see that an arena in 
> the free list can be freed is itself not a free operation.
> (snip thorough explanation)

Yes, that's a good point.  It looks like the list is designed for speedy
re-use of the memory it points to, which seems like a good choice.  I quite
agree that it should hang on to *some* memory, and perhaps my artificial
situation has shown this as a problem when it wouldn't cause any issues for
real programs.  I can't help thinking that there are some situations where
you need a lot of memory for a short time though, and it would be nice to
be able to use it briefly and then hand most of it back.  Still, I see the
practical difficulties with doing this.

-- 
I'm at CAMbridge, not SPAMbridge
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Tom Wright
Steve Holden wrote:
> Easy to say. How do you know the memory that's not in use is in a
> contiguous block suitable for return to the operating system? I can
> pretty much guarantee it won't be. CPython doesn't use a relocating
> garbage collection scheme

Fair point.  That is difficult and I don't see a practical solution to it
(besides substituting a relocating garbage collector, which seems like a
major undertaking).

> Right. So all we have to do is identify those portions of memory that
> will never be read again and return them to the OS. That should be easy.
> Not.

Well, you have this nice int free list which points to all the bits which
will never be read again (they might be written to, but if you're writing
without reading then it doesn't really matter where you do it).  The point
about contiguous chunks still applies though.


-- 
I'm at CAMbridge, not SPAMbridge
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Tom Wright
Steven D'Aprano wrote:
> You've described an extremely artificial set of circumstances: you create
> 40,000,000 distinct integers, then immediately destroy them. The obvious
> solution to that "problem" of Python caching millions of integers you
> don't need is not to create them in the first place.

I know it's a very artificial setup - I was trying to make the situation
simple to demonstrate in a few lines.  The point was that it's not caching
the values of those integers, as they can never be read again through the
Python interface.  It's just holding onto the space they occupy in case
it's needed again.

> So what's your actual problem that you are trying to solve?

I have a program which reads a few thousand text files, converts each to a
list (with readlines()), creates a short summary of the contents of each (a
few floating point numbers) and stores this summary in a master list.  From
the amount of memory it's using, I think that the lists containing the
contents of each file are kept in memory, even after there are no
references to them.  Also, if I tell it to discard the master list and
re-read all the files, the memory use nearly doubles so I presume it's
keeping the lot in memory.
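
(As an aside, here is a sketch of the streaming approach I could switch
to, assuming one float per line and a pre-built list of paths:)

    def summarise(path):
        # stream the file so no list of lines is ever built or kept
        total = 0.0
        count = 0
        f = open(path)
        try:
            for line in f:              # lazy, line-at-a-time iteration
                total += float(line)    # assumes one float per line
                count += 1
        finally:
            f.close()
        if count:
            return total, total / count
        return 0.0, 0.0

    master = [summarise(p) for p in paths]   # 'paths' assumed given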

The program may run through several collections of files, but it only keeps
a reference to the master list of the most recent collection it's looked
at.  Obviously, it's not ideal if all the old collections hang around too,
taking up space and causing the machine to swap.

>> but is there anything I can do to get that memory back without closing
>> Python?
> 
> Why do you want to manage memory yourself anyway? It seems like a
> horrible, horrible waste to use a language designed to manage memory for
> you, then insist on over-riding it's memory management.

I agree.  I don't want to manage it myself.  I just want it to re-use memory
or hand it back to the OS if it's got an awful lot that it's not using. 
Wouldn't you say it was wasteful if (say) an image editor kept an
uncompressed copy of an image around in memory after the image had been
closed?

-- 
I'm at CAMbridge, not SPAMbridge
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Steven D'Aprano
On Wed, 21 Mar 2007 15:32:17 +, Tom Wright wrote:

>> Memory contention would be a problem if your Python process wanted to keep
>> that memory active at the same time as you were running GIMP.
> 
> True, but why does Python hang on to the memory at all?  As I understand it,
> it's keeping a big lump of memory on the int free list in order to make
> future allocations of large numbers of integers faster.  If that memory is
> about to be paged out, then surely future allocations of integers will be
> *slower*, as the system will have to:
> 
> 1) page out something to make room for the new integers
> 2) page in the relevant chunk of the int free list
> 3) zero all of this memory and do any other formatting required by Python
> 
> If Python freed (most of) the memory when it had finished with it, then all
> the system would have to do is:
> 
> 1) page out something to make room for the new integers
> 2) zero all of this memory and do any other formatting required by Python
> 
> Surely Python should free the memory if it's not been used for a certain
> amount of time (say a few seconds), as allocation times are not going to be
> the limiting factor if it's gone unused for that long.  Alternatively, it
> could mark the memory as some sort of cache, so that if it needed to be
> paged out, it would instead be de-allocated (thus saving the time taken to
> page it back in again when it's next needed)

And increasing the time it takes to re-create the objects in the cache
subsequently.

Maybe this extra effort is worthwhile when the free int list holds 10**7
ints, but is it worthwhile when it holds 10**6 ints? How about 10**5 ints?
10**3 ints?

How many free ints is "typical" or even "common" in practice?

The lesson I get from this is, instead of creating such an enormous list
of integers in the first place with range(), use xrange() instead.

Fresh running instance of Python 2.5:

$ ps up 9579
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
steve 9579  0.0  0.2   6500  2752 pts/7S+   03:42   0:00 python2.5


Run from within Python:

>>> n = 0
>>> for i in xrange(int(1e7)):
... # create lots of ints, one at a time
... # instead of all at once
... n += i # make sure the int is used
...
>>> n
499500L


And the output of ps again:

$ ps up 9579
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
steve 9579  4.2  0.2   6500  2852 pts/7S+   03:42   0:11 python2.5

Barely moved a smidgen.

For comparison, here's what ps reports after I create a single list with
range(int(1e7)), and again after I delete the list:

$ ps up 9579 # after creating list with range(int(1e7))
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
steve 9579  1.9 15.4 163708 160056 pts/7   S+   03:42   0:11 python2.5

$ ps up 9579 # after deleting list
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
steve 9579  1.7 11.6 124632 120992 pts/7   S+   03:42   0:12 python2.5


So there is another clear advantage to using xrange instead of range,
unless you specifically need all ten million ints all at once.



-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Steven D'Aprano
On Wed, 21 Mar 2007 15:03:17 +, Tom Wright wrote:

[snip]

> Ah, thanks for explaining that.  I'm a little wiser about memory allocation
> now, but am still having problems reclaiming memory from unused objects
> within Python.  If I do the following:
> 

> (memory use: 15 MB)
> >>> a = range(int(4e7))
> (memory use: 1256 MB)
> >>> a = None
> (memory use: 953 MB)
> 
> ...and then I allocate a lot of memory in another process (eg. open a load
> of files in the GIMP), then the computer swaps the Python process out to
> disk to free up the necessary space.  Python's memory use is still reported
> as 953 MB, even though nothing like that amount of space is needed.

Who says it isn't needed? Just because *you* have only one object
existing, doesn't mean the Python environment has only one object existing.


> From what you said above, the problem is in the underlying C libraries, 

What problem? 

Nothing you've described seems like a problem to me. It sounds like a
modern, 21st century operating system and programming language working
like they should. Why do you think this is a problem?

You've described an extremely artificial set of circumstances: you create
40,000,000 distinct integers, then immediately destroy them. The obvious
solution to that "problem" of Python caching millions of integers you
don't need is not to create them in the first place.

In real code, the chances are that if you created 4e7 distinct integers
you'll probably need them again -- hence the cache. So what's your actual
problem that you are trying to solve?


> but is there anything I can do to get that memory back without closing
> Python?

Why do you want to manage memory yourself anyway? It seems like a
horrible, horrible waste to use a language designed to manage memory for
you, then insist on over-riding it's memory management.

I'm not saying that there is never any good reason for fine control of the
Python environment, but this doesn't look like one to me.


-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread skip

Tom> True, but why does Python hang on to the memory at all?  As I
Tom> understand it, it's keeping a big lump of memory on the int free
Tom> list in order to make future allocations of large numbers of
Tom> integers faster.  If that memory is about to be paged out, then
Tom> surely future allocations of integers will be *slower*, as the
Tom> system will have to:

Tom> 1) page out something to make room for the new integers
Tom> 2) page in the relevant chunk of the int free list
Tom> 3) zero all of this memory and do any other formatting required by
Tom>Python 

If your program's behavior is:

* allocate a list of 1e7 ints
* delete that list

how does the Python interpreter know your next bit of execution won't be to
repeat the allocation?  In addition, checking to see that an arena in the
free list can be freed is itself not a free operation.  From the comments at
the top of intobject.c:

   free_list is a singly-linked list of available PyIntObjects, linked
   via abuse of their ob_type members.

Each time an int is allocated, the free list is checked to see if it's got a
spare object lying about.  If so, it is plucked from the list
and reinitialized appropriately.  If not, a new block of memory sufficient
to hold about 250 ints is grabbed via a call to malloc, which *might* have
to grab more memory from the OS.  Once that block is allocated, it's strung
together into a free list via the above ob_type slot abuse.  Then the 250 or
so items are handed out one-by-one as needed and stitched back into the free
list as they are freed.
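
A toy Python model of that scheme (nothing like CPython's actual C code,
just the shape of the idea):

    class ToyFreeList(object):
        BLOCK = 250                   # roughly one malloc()ed block

        def __init__(self):
            self.free = []

        def alloc(self):
            if not self.free:         # no spare slots: grab a whole block
                self.free = [object() for _ in range(self.BLOCK)]
            return self.free.pop()    # hand slots out one by one

        def dealloc(self, obj):
            self.free.append(obj)     # slots are stitched back for reuse;
                                      # nothing is handed back to the OS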

Now consider how difficult it is to decide if that block of 250 or so
objects is all unused so that we can free() it.  We have to walk through the
list and check to see if that chunk is in the free list.  That's complicated
by the fact that the ref count fields aren't initialized to zero until a
particular chunk is first used as an allocated int object and would have to
be to support this block free operation (=> more cost up front).  Still,
assume we can semi-efficiently determine that a particular block is composed
of all freed int-object-sized chunks.  We will then unstitch it from the
chain of blocks and call free() to free it.  Still, we are left with the
behavior of the operating system's malloc/free implementation.  It probably
won't sbrk() the block back to the OS, so after all that work your process
still holds the memory.

Okay, so malloc/free won't work.  We could boost the block size up to the
size of a page and use mmap() to map a page into memory.  I suspect that
would become still more complicated to implement, and the block size being
probably about eight times larger than the current block size would incur
even more cost to determine if it was full of nothing but freed objects.

Tom> If Python freed (most of) the memory when it had finished with it,
Tom> then all the system would have to do is:

That's the rub.  Figuring out when it is truly "finished" with the memory.

Tom> Surely Python should free the memory if it's not been used for a
Tom> certain amount of time (say a few seconds), as allocation times are
Tom> not going to be the limiting factor if it's gone unused for that
Tom> long.

This is generally the point in such discussions where I respond with
something like, "patches cheerfully accepted". ;-) If you're interested in
digging into this, have a look at the free list implementation in
Objects/intobject.c.  It might make for a good Google Summer of Code
project:

http://code.google.com/soc/psf/open.html
http://code.google.com/soc/psf/about.html

but I'm not the guy you want mentoring such a project.  There are a lot of
people who understand the ins and outs of Python's memory allocation code
much better than I do.

Tom> I've also tested similar situations on Python under Windows XP, and
Tom> it shows the same behaviour, so I think this is a Python and/or
Tom> GCC/libc issue, rather than an OS issue (assuming Python for linux
Tom> and Python for windows are both compiled with GCC).

Sure, my apologies.  The malloc/free implementation is strictly speaking not
part of the operating system.  I tend to mentally lump them together because
it's uncommon for people to use a malloc/free implementation different than
the one delivered with their computer.

Skip
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Steve Holden
Tom Wright wrote:
> [EMAIL PROTECTED] wrote:
>> Tom> ...and then I allocate a lot of memory in another process (eg.
>> Tom> open a load of files in the GIMP), then the computer swaps the
>> Tom> Python
>> Tom> process out to disk to free up the necessary space.  Python's
>> Tom> memory use is still reported as 953 MB, even though nothing like
>> Tom> that amount of space is needed.  From what you said above, the
>> Tom> problem is in the underlying C libraries, but is there anything I
>> Tom> can do to get that memory back without closing Python?
>>
>> Not really.  I suspect the unused pages of your Python process are paged
>> out, but that Python has just what it needs to keep going.
> 
> Yes, that's what's happening.
> 
>> Memory contention would be a problem if your Python process wanted to keep
>> that memory active at the same time as you were running GIMP.
> 
> True, but why does Python hang on to the memory at all?  As I understand it,
> it's keeping a big lump of memory on the int free list in order to make
> future allocations of large numbers of integers faster.  If that memory is
> about to be paged out, then surely future allocations of integers will be
> *slower*, as the system will have to:
> 
> 1) page out something to make room for the new integers
> 2) page in the relevant chunk of the int free list
> 3) zero all of this memory and do any other formatting required by Python
> 
> If Python freed (most of) the memory when it had finished with it, then all
> the system would have to do is:
> 
> 1) page out something to make room for the new integers
> 2) zero all of this memory and do any other formatting required by Python
> 
> Surely Python should free the memory if it's not been used for a certain
> amount of time (say a few seconds), as allocation times are not going to be
> the limiting factor if it's gone unused for that long.  Alternatively, it
> could mark the memory as some sort of cache, so that if it needed to be
> paged out, it would instead be de-allocated (thus saving the time taken to
> page it back in again when it's next needed)
> 
Easy to say. How do you know the memory that's not in use is in a 
contiguous block suitable for return to the operating system? I can 
pretty much guarantee it won't be. CPython doesn't use a relocating 
garbage collection scheme, so objects always stay at the same place in 
the process's virtual memory unless they have to be grown to accommodate 
additional data.
> 
>> I think the process's resident size is more important here than virtual
>> memory size (as long as you don't exhaust swap space). 
> 
> True in theory, but the computer does tend to go rather sluggish when paging
> large amounts out to disk and back.  Surely the use of virtual memory
> should be avoided where possible, as it is so slow?  This is especially
> true when the contents of the blocks paged out to disk will never be read
> again.
> 
Right. So all we have to do is identify those portions of memory that 
will never be read again and return them to the OS. That should be easy. 
Not.
> 
> I've also tested similar situations on Python under Windows XP, and it shows
> the same behaviour, so I think this is a Python and/or GCC/libc issue,
> rather than an OS issue (assuming Python for linux and Python for windows
> are both compiled with GCC).
> 
It's probably a dynamic memory issue. Of course if you'd like to provide 
a patch to switch it over to a relocating garbage collection scheme 
we'll all await it with bated breath :)

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings   http://holdenweb.blogspot.com

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Tom Wright
[EMAIL PROTECTED] wrote:
> Tom> ...and then I allocate a lot of memory in another process (eg.
> Tom> open a load of files in the GIMP), then the computer swaps the
> Tom> Python
> Tom> process out to disk to free up the necessary space.  Python's
> Tom> memory use is still reported as 953 MB, even though nothing like
> Tom> that amount of space is needed.  From what you said above, the
> Tom> problem is in the underlying C libraries, but is there anything I
> Tom> can do to get that memory back without closing Python?
> 
> Not really.  I suspect the unused pages of your Python process are paged
> out, but that Python has just what it needs to keep going.

Yes, that's what's happening.

> Memory contention would be a problem if your Python process wanted to keep
> that memory active at the same time as you were running GIMP.

True, but why does Python hang on to the memory at all?  As I understand it,
it's keeping a big lump of memory on the int free list in order to make
future allocations of large numbers of integers faster.  If that memory is
about to be paged out, then surely future allocations of integers will be
*slower*, as the system will have to:

1) page out something to make room for the new integers
2) page in the relevant chunk of the int free list
3) zero all of this memory and do any other formatting required by Python

If Python freed (most of) the memory when it had finished with it, then all
the system would have to do is:

1) page out something to make room for the new integers
2) zero all of this memory and do any other formatting required by Python

Surely Python should free the memory if it's not been used for a certain
amount of time (say a few seconds), as allocation times are not going to be
the limiting factor if it's gone unused for that long.  Alternatively, it
could mark the memory as some sort of cache, so that if it needed to be
paged out, it would instead be de-allocated (thus saving the time taken to
page it back in again when it's next needed)


> I think the process's resident size is more important here than virtual
> memory size (as long as you don't exhaust swap space). 

True in theory, but the computer does tend to go rather sluggish when paging
large amounts out to disk and back.  Surely the use of virtual memory
should be avoided where possible, as it is so slow?  This is especially
true when the contents of the blocks paged out to disk will never be read
again.


I've also tested similar situations on Python under Windows XP, and it shows
the same behaviour, so I think this is a Python and/or GCC/libc issue,
rather than an OS issue (assuming Python for linux and Python for windows
are both compiled with GCC).

-- 
I'm at CAMbridge, not SPAMbridge
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread skip

Tom> ...and then I allocate a lot of memory in another process (eg. open
Tom> a load of files in the GIMP), then the computer swaps the Python
Tom> process out to disk to free up the necessary space.  Python's
Tom> memory use is still reported as 953 MB, even though nothing like
Tom> that amount of space is needed.  From what you said above, the
Tom> problem is in the underlying C libraries, but is there anything I
Tom> can do to get that memory back without closing Python?

Not really.  I suspect the unused pages of your Python process are paged
out, but that Python has just what it needs to keep going.  Memory
contention would be a problem if your Python process wanted to keep that
memory active at the same time as you were running GIMP.  I think the
process's resident size is more important here than virtual memory size (as
long as you don't exhaust swap space).

Skip
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Tom Wright
[EMAIL PROTECTED] wrote:
> You haven't forgotten to do anything.  Your attempts at freeing memory are
> being thwarted (in part, at least) by Python's int free list.  I believe
> the int free list remains after the 10M individual ints' refcounts drop to
> zero. The large storage for the list is grabbed in one gulp and thus
> mmap()d I believe, so it is reclaimed by being munmap()d, hence the drop
> from 320+MB to 250+MB.
> 
> I haven't looked at the int free list or obmalloc implementations in
> awhile, but if the free list does return any of its memory to the system
> it probably just calls the free() library function.  Whether or not the
> system actually reclaims any memory from your process is dependent on the
> details of themalloc/free implementation's details.  That is, the behavior
> is outside Python's control.

Ah, thanks for explaining that.  I'm a little wiser about memory allocation
now, but am still having problems reclaiming memory from unused objects
within Python.  If I do the following:

>>>
(memory use: 15 MB)
>>> a = range(int(4e7))
(memory use: 1256 MB)
>>> a = None
(memory use: 953 MB)

...and then I allocate a lot of memory in another process (eg. open a load
of files in the GIMP), then the computer swaps the Python process out to
disk to free up the necessary space.  Python's memory use is still reported
as 953 MB, even though nothing like that amount of space is needed.  From
what you said above, the problem is in the underlying C libraries, but is
there anything I can do to get that memory back without closing Python?

-- 
I'm at CAMbridge, not SPAMbridge
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Thinker

Tom Wright wrote:
> Thinker wrote:
>> How do you know the amount of memory used by Python? ps, top, or
>> something?
>
> $ ps up `pidof python2.5`
> USER   PID %CPU %MEM    VSZ    RSS TTY  STAT START TIME COMMAND
> tew24  26275 0.0 11.9 257592 243988 pts/6 S+  13:10 0:00 python2.5
>
> "VSZ" is "Virtual Memory Size" (ie. total memory used by the
> application) "RSS" is "Resident Set Size" (ie. non-swapped physical
> memory)
>
>
This is the amount of memory allocated by the process, not the amount in
use by the Python interpreter.  It is managed by the C library's malloc().
When you free a block of memory with free(), it is only returned to the C
library for later use; the C library does not always return the memory to
the kernel.

Since modern OSes have virtual memory, inactive memory will be paged out
when more physical memory is needed.  It doesn't hurt much if you have
enough swap space.

What you get from the ps command is the memory allocated by the process,
which doesn't mean it is all in use by the Python interpreter.

--
Thinker Li - [EMAIL PROTECTED] [EMAIL PROTECTED]
http://heaven.branda.to/~thinker/GinGin_CGI.py

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Garbage collection

2007-03-21 Thread Tom Wright
Thinker wrote:
> How do you know the amount of memory used by Python?
> ps, top, or something?

$ ps up `pidof python2.5`
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
tew2426275  0.0 11.9 257592 243988 pts/6   S+   13:10   0:00 python2.5

"VSZ" is "Virtual Memory Size" (ie. total memory used by the application)
"RSS" is "Resident Set Size" (ie. non-swapped physical memory)


-- 
I'm at CAMbridge, not SPAMbridge
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread skip
Tom> I suspect I may be missing something vital here, but Python's
Tom> garbage collection doesn't seem to work as I expect it to.  Here's
Tom> a small test program which shows the problem on python 2.4 and 2.5:

Tom> (at this point, Python is using 15MB)

>>> a = range(int(1e7))
>>> a = None
>>> import gc
>>> gc.collect()
0

Tom> (at this point, Python is using 252MB)

Tom> Is there something I've forgotten to do? Why is Python still using
Tom> such a lot of memory?

You haven't forgotten to do anything.  Your attempts at freeing memory are
being thwarted (in part, at least) by Python's int free list.  I believe the
int free list remains after the 10M individual ints' refcounts drop to zero.
The large storage for the list is grabbed in one gulp and thus mmap()d I
believe, so it is reclaimed by being munmap()d, hence the drop from 320+MB
to 250+MB.

I haven't looked at the int free list or obmalloc implementations in awhile,
but if the free list does return any of its memory to the system it probably
just calls the free() library function.  Whether or not the system actually
reclaims any memory from your process is dependent on the details of the
malloc/free implementation's details.  That is, the behavior is outside
Python's control.

Skip
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection

2007-03-21 Thread Thinker

Tom Wright wrote:
> Hi all
>
> I suspect I may be missing something vital here, but Python's garbage
> collection doesn't seem to work as I expect it to. Here's a small test
> program which shows the problem on python 2.4 and 2.5:
... skip .
> (at this point, Python is using 252MB)
>
>
> Is there something I've forgotten to do? Why is Python still using such a
> lot of memory?
>
>
> Thanks!
>
How do you know the amount of memory used by Python?
ps, top, or something?

--
Thinker Li - [EMAIL PROTECTED] [EMAIL PROTECTED]
http://heaven.branda.to/~thinker/GinGin_CGI.py

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Garbage collection with QT

2005-06-08 Thread Ken Godee
> Is there a way to find out all references to the QMainWindow or its
> hosted QTable, so as to have a mechanism to destroy them?
> 
Yes, of course, the docs are your friend :)

QObject::children()
QObject::removeChild()
QObject::parent()

To find all the children for an instance you
can create a loop.
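
A sketch of such a loop (old Qt3-style PyQt, where children() may return
None if there are no children):

    def walk(obj, depth=0):
        # recursively print an object's child tree
        print "  " * depth + str(obj)
        for child in obj.children() or []:
            walk(child, depth + 1)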

An example of a dialog window function
that cleans itself up:

def xdialog(self, vparent, info):
    vlogin = dialogwindow(parent=vparent, modal=1)
    while 1:
        vlogin.exec_loop()
        if vlogin.result() == 0:
            # detach the dialog from its parent and drop the
            # last reference so it can actually be collected
            vparent.removeChild(vlogin)
            del vlogin
            break




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Garbage collection with QT

2005-06-08 Thread Mike Tammerman
Not all leakage problems are caused by Qt or Python. There is a wrapping
layer between Qt and Python provided by SIP, so SIP may also cause
leaks. PyQt also had a paintCell memory leak problem several months
ago. If you're using an old snapshot of PyQt or SIP, that could be the
problem. Try using the latest snapshots, and mention your versions and
problems on the PyKDE mailing list, which could be more helpful.

If you want to delete C++ objects in Qt, consider using the
QObject.deleteLater() method. IMHO, though, this won't help here.

Mike

-- 
http://mail.python.org/mailman/listinfo/python-list