Re: [Python-ideas] Automatic context managers
On Sat, Apr 27, 2013 at 1:54 AM, MRAB pyt...@mrabarnett.plus.com wrote: On 26/04/2013 14:02, anatoly techtonik wrote: This circular reference problem is interesting. In object space it probably looks like a stellar detached from the visible (attached) universe. Is the main problem in detecting it? The problem is in knowing in which order the objects should be collected. For example, if A refers to B and B refers to A, should you collect A then B, or B then A? If you collect A first, then, for a time, B will be referring to a non-existent object. That's not good if the objects have destructors which need to be run. Spin-off thread from python-ideas to discuss a more general question of garbage collection of cyclic structures. Once it's been proven that there's an unreferenced cycle, why not simply dispose of one of the objects, and replace all references to it (probably only one - preferably pick an object with the fewest references) with a special temporary object? In fact, that could probably be done in CPython by wiping out the object in memory and replacing it with a special marker of some sort, which would then automatically take over all references to the old object. Any attempt to manipulate this object could simply pop back with a DestructedObject exception or something. Is this a plausible (never mind viable yet, just conceptually plausible) alternative to sticking them into gc.garbage and ignoring them? It'd allow a double-linked list/tree to function cleanly - imagine, for instance, something like the DOM facilities available to web browser scripts: class DOMObject: def __init__(self,parent): self.parent=parent self.firstchild=self.sibling=None if not parent: return if not parent.firstchild: parent.firstchild=self else: child=parent.firstchild while child.sibling: child=child.sibling child.sibling=self def __del__(self): print(Disposing of id #%d%id(self)) document=DOMObject(None) body=DOMObject(document) p=DOMObject(body) p=DOMObject(body) p=DOMObject(body) del document,body,p gc.collect() The __del__ method would need to clean up the external resources used by this object, but wouldn't have to walk the tree. Yet, just because there is a reference loop and there are __del__ methods, the garbage collector gives up and leaves it to the program author to deal with. I can understand if this is considered too complicated and too unusual a circumstance to be worth bothering to support, but I'm curious as to whether it's at least conceptually reasonable to do something like this. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
[[ Resending under a more appropriate subject line... sorry about that, ignore the other one as it'll only confuse matters ]] On Sat, Apr 27, 2013 at 1:54 AM, MRAB pyt...@mrabarnett.plus.com wrote: On 26/04/2013 14:02, anatoly techtonik wrote: This circular reference problem is interesting. In object space it probably looks like a stellar detached from the visible (attached) universe. Is the main problem in detecting it? The problem is in knowing in which order the objects should be collected. For example, if A refers to B and B refers to A, should you collect A then B, or B then A? If you collect A first, then, for a time, B will be referring to a non-existent object. That's not good if the objects have destructors which need to be run. Spin-off thread from python-ideas to discuss a more general question of garbage collection of cyclic structures. Once it's been proven that there's an unreferenced cycle, why not simply dispose of one of the objects, and replace all references to it (probably only one - preferably pick an object with the fewest references) with a special temporary object? In fact, that could probably be done in CPython by wiping out the object in memory and replacing it with a special marker of some sort, which would then automatically take over all references to the old object. Any attempt to manipulate this object could simply pop back with a DestructedObject exception or something. Is this a plausible (never mind viable yet, just conceptually plausible) alternative to sticking them into gc.garbage and ignoring them? It'd allow a double-linked list/tree to function cleanly - imagine, for instance, something like the DOM facilities available to web browser scripts: class DOMObject: def __init__(self,parent): self.parent=parent self.firstchild=self.sibling=None if not parent: return if not parent.firstchild: parent.firstchild=self else: child=parent.firstchild while child.sibling: child=child.sibling child.sibling=self def __del__(self): print(Disposing of id #%d%id(self)) document=DOMObject(None) body=DOMObject(document) p=DOMObject(body) p=DOMObject(body) p=DOMObject(body) del document,body,p gc.collect() The __del__ method would need to clean up the external resources used by this object, but wouldn't have to walk the tree. Yet, just because there is a reference loop and there are __del__ methods, the garbage collector gives up and leaves it to the program author to deal with. I can understand if this is considered too complicated and too unusual a circumstance to be worth bothering to support, but I'm curious as to whether it's at least conceptually reasonable to do something like this. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On 04/26/2013 12:54 PM, Chris Angelico wrote: [[ Resending under a more appropriate subject line... sorry about that, ignore the other one as it'll only confuse matters ]] On Sat, Apr 27, 2013 at 1:54 AM, MRAB pyt...@mrabarnett.plus.com wrote: On 26/04/2013 14:02, anatoly techtonik wrote: This circular reference problem is interesting. In object space it probably looks like a stellar detached from the visible (attached) universe. Is the main problem in detecting it? The problem is in knowing in which order the objects should be collected. For example, if A refers to B and B refers to A, should you collect A then B, or B then A? If you collect A first, then, for a time, B will be referring to a non-existent object. That's not good if the objects have destructors which need to be run. Spin-off thread from python-ideas to discuss a more general question of garbage collection of cyclic structures. Once it's been proven that there's an unreferenced cycle, why not simply dispose of one of the objects, and replace all references to it (probably only one - preferably pick an object with the fewest references) with a special temporary object? In fact, that could probably be done in CPython by wiping out the object in memory and replacing it with a special marker of some sort, which would then automatically take over all references to the old object. Any attempt to manipulate this object could simply pop back with a DestructedObject exception or something. Is this a plausible (never mind viable yet, just conceptually plausible) alternative to sticking them into gc.garbage and ignoring them? It'd allow a double-linked list/tree to function cleanly - imagine, for instance, something like the DOM facilities available to web browser scripts: class DOMObject: def __init__(self,parent): self.parent=parent self.firstchild=self.sibling=None if not parent: return if not parent.firstchild: parent.firstchild=self else: child=parent.firstchild while child.sibling: child=child.sibling child.sibling=self def __del__(self): print(Disposing of id #%d%id(self)) document=DOMObject(None) body=DOMObject(document) p=DOMObject(body) p=DOMObject(body) p=DOMObject(body) del document,body,p gc.collect() The __del__ method would need to clean up the external resources used by this object, but wouldn't have to walk the tree. Yet, just because there is a reference loop and there are __del__ methods, the garbage collector gives up and leaves it to the program author to deal with. I can understand if this is considered too complicated and too unusual a circumstance to be worth bothering to support, but I'm curious as to whether it's at least conceptually reasonable to do something like this. ChrisA I don't see what your special temporary object actually accomplishes. Seems to me you need to declare that your __del__() methods promise not to reference each other, and the gc would then check all objects in the cycle, and do its present behavior if any of the destructors is not specially declared. I'm not sure how often you'd have a non-trivial destructor that wouldn't reference any objects. And doing a static analysis of what will happen during the destructor would be pretty messy. So the best I and come up with is to keep the declaration, but require a try/catch to cleanly terminate each destructor if it ever references anything in the tree. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On Sat, Apr 27, 2013 at 3:42 AM, Dave Angel da...@davea.name wrote: I don't see what your special temporary object actually accomplishes. Seems to me you need to declare that your __del__() methods promise not to reference each other, and the gc would then check all objects in the cycle, and do its present behavior if any of the destructors is not specially declared. It wouldn't be declared; it'd simply throw an exception if anything different happened. I'm not sure how often you'd have a non-trivial destructor that wouldn't reference any objects. And doing a static analysis of what will happen during the destructor would be pretty messy. So the best I and come up with is to keep the declaration, but require a try/catch to cleanly terminate each destructor if it ever references anything in the tree. And yeah. If you catch the exception inside __del__, you can cope with the destructed object yourself (or LBLY, if you wish). Alternatively, you just proceed as normal, and when your __del__ throws an exception, the gc then copes (not sure *how* it should cope - log it to stderr and carry on?). Same as normal exception handling. The advantage of this style is that the code to deal with the cycle is kept right in the cyclic object's destructor - right where the problem is. Doing it through gc.garbage requires that some other operation periodically check for garbage - after the GC has done its own periodic check. Seems simpler/cleaner to do it as part of the gc run itself. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On 04/26/2013 01:57 PM, Chris Angelico wrote: On Sat, Apr 27, 2013 at 3:42 AM, Dave Angel da...@davea.name wrote: I don't see what your special temporary object actually accomplishes. Seems to me you need to declare that your __del__() methods promise not to reference each other, and the gc would then check all objects in the cycle, and do its present behavior if any of the destructors is not specially declared. It wouldn't be declared; it'd simply throw an exception if anything different happened. I'm not sure how often you'd have a non-trivial destructor that wouldn't reference any objects. And doing a static analysis of what will happen during the destructor would be pretty messy. So the best I and come up with is to keep the declaration, but require a try/catch to cleanly terminate each destructor if it ever references anything in the tree. And yeah. If you catch the exception inside __del__, you can cope with the destructed object yourself (or LBLY, if you wish). Alternatively, you just proceed as normal, and when your __del__ throws an exception, the gc then copes (not sure *how* it should cope - log it to stderr and carry on?). Same as normal exception handling. The advantage of this style is that the code to deal with the cycle is kept right in the cyclic object's destructor - right where the problem is. Doing it through gc.garbage requires that some other operation periodically check for garbage - after the GC has done its own periodic check. Seems simpler/cleaner to do it as part of the gc run itself. You must think me dense by now. But I don't understand what the two different garbage collection operations are that you're positing. As far as I know, there's ref counting, which is quick, and frees something as soon as the count goes to zero. Then there's gc, which has to scan through all the objects from a known starting set, and identify those things which aren't accessible, and free any that don't have a __del__() method. And it's only in the gc step that cycles and such are identifiable. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list
Re: [Python-ideas] Automatic context managers
Chris Angelico於 2013年4月27日星期六UTC+8上午12時52分38秒寫道: On Sat, Apr 27, 2013 at 1:54 AM, MRAB pyt...@mrabarnett.plus.com wrote: On 26/04/2013 14:02, anatoly techtonik wrote: This circular reference problem is interesting. In object space it probably looks like a stellar detached from the visible (attached) universe. Is the main problem in detecting it? The problem is in knowing in which order the objects should be collected. For example, if A refers to B and B refers to A, should you collect A then B, or B then A? If you collect A first, then, for a time, B will be referring to a non-existent object. That's not good if the objects have destructors which need to be run. Spin-off thread from python-ideas to discuss a more general question of garbage collection of cyclic structures. Once it's been proven that there's an unreferenced cycle, why not simply dispose of one of the objects, and replace all references to it (probably only one - preferably pick an object with the fewest references) with a special temporary object? In fact, that could probably be done in CPython by wiping out the object in memory and replacing it with a special marker of some sort, which would then automatically take over all references to the old object. Any attempt to manipulate this object could simply pop back with a DestructedObject exception or something. Is this a plausible (never mind viable yet, just conceptually plausible) alternative to sticking them into gc.garbage and ignoring them? It'd allow a double-linked list/tree to function cleanly - imagine, for instance, something like the DOM facilities available to web browser scripts: class DOMObject: def __init__(self,parent): self.parent=parent self.firstchild=self.sibling=None if not parent: return if not parent.firstchild: parent.firstchild=self else: child=parent.firstchild while child.sibling: child=child.sibling child.sibling=self def __del__(self): print(Disposing of id #%d%id(self)) document=DOMObject(None) body=DOMObject(document) p=DOMObject(body) p=DOMObject(body) p=DOMObject(body) del document,body,p gc.collect() The __del__ method would need to clean up the external resources used by this object, but wouldn't have to walk the tree. Yet, just because there is a reference loop and there are __del__ methods, the garbage collector gives up and leaves it to the program author to deal with. I can understand if this is considered too complicated and too unusual a circumstance to be worth bothering to support, but I'm curious as to whether it's at least conceptually reasonable to do something like this. ChrisA Please use the deep-copy methods from time to time to disentangle referenced objects in python. The cyclic reference cycle has to be broken by some mean first in python to proceed for further actions in the gc. -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On Fri, Apr 26, 2013 at 10:54 AM, Chris Angelico ros...@gmail.com wrote: Once it's been proven that there's an unreferenced cycle, why not simply dispose of one of the objects, and replace all references to it (probably only one - preferably pick an object with the fewest references) with a special temporary object? In fact, that could probably be done in CPython by wiping out the object in memory and replacing it with a special marker of some sort, which would then automatically take over all references to the old object. Any attempt to manipulate this object could simply pop back with a DestructedObject exception or something. I think it still boils down to the same problem -- how should Python *predictably* choose which object will be disposed of in order to break the cycle? I don't see that this question is any different than asking how should Python choose which __del__ method should be called first, since the object so disposed of would still need its __del__ method called. -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On Fri, Apr 26, 2013 at 1:31 PM, Dave Angel da...@davea.name wrote: On 04/26/2013 01:57 PM, Chris Angelico wrote: And yeah. If you catch the exception inside __del__, you can cope with the destructed object yourself (or LBLY, if you wish). Alternatively, you just proceed as normal, and when your __del__ throws an exception, the gc then copes (not sure *how* it should cope - log it to stderr and carry on?). Same as normal exception handling. The advantage of this style is that the code to deal with the cycle is kept right in the cyclic object's destructor - right where the problem is. Doing it through gc.garbage requires that some other operation periodically check for garbage - after the GC has done its own periodic check. Seems simpler/cleaner to do it as part of the gc run itself. You must think me dense by now. But I don't understand what the two different garbage collection operations are that you're positing. As far as I know, there's ref counting, which is quick, and frees something as soon as the count goes to zero. Then there's gc, which has to scan through all the objects from a known starting set, and identify those things which aren't accessible, and free any that don't have a __del__() method. And it's only in the gc step that cycles and such are identifiable. Whenever the GC finds a cycle that is unreferenced but uncollectable, it stores those objects in the list gc.garbage. At that point, if the user wishes to clean up those cycles, it is up to them to delve into gc.garbage, untangle the objects contained within, break the cycles, and remove them from the list so that they can be freed by the ref counter. This user-supplied step is what Chris is referring to as some other periodic check. If the user does not do this, then those objects simply remain in the gc.garbage list until the program terminates. -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On 04/26/2013 06:43 PM, Ian Kelly wrote: SNIP Whenever the GC finds a cycle that is unreferenced but uncollectable, it stores those objects in the list gc.garbage. At that point, if the user wishes to clean up those cycles, it is up to them to delve into gc.garbage, untangle the objects contained within, break the cycles, and remove them from the list so that they can be freed by the ref counter. This user-supplied step is what Chris is referring to as some other periodic check. If the user does not do this, then those objects simply remain in the gc.garbage list until the program terminates. I didn't know there was a callback that a user could hook into. That's very interesting. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On 04/26/2013 06:50 PM, Ian Kelly wrote: On Fri, Apr 26, 2013 at 10:54 AM, Chris Angelico ros...@gmail.com wrote: Once it's been proven that there's an unreferenced cycle, why not simply dispose of one of the objects, and replace all references to it (probably only one - preferably pick an object with the fewest references) with a special temporary object? In fact, that could probably be done in CPython by wiping out the object in memory and replacing it with a special marker of some sort, which would then automatically take over all references to the old object. Any attempt to manipulate this object could simply pop back with a DestructedObject exception or something. I think it still boils down to the same problem -- how should Python *predictably* choose which object will be disposed of in order to break the cycle? I don't see that this question is any different than asking how should Python choose which __del__ method should be called first, since the object so disposed of would still need its __del__ method called. Perhaps if the __del__ methods disposed of the non-cyclic parts first, an exception doing the cyclic cleanups wouldn't hurt. Picture a doubly-linked list of file-type objects. Each one could release its file before trying to do anything with the links. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
Whenever the GC finds a cycle that is unreferenced but uncollectable, it stores those objects in the list gc.garbage. At that point, if the user wishes to clean up those cycles, it is up to them to delve into gc.garbage, untangle the objects contained within, break the cycles, and remove them from the list so that they can be freed by the ref counter. I wonder if it would be useful to provide a gc.garbagehook analogous to sys.excepthook? Users could assign a function of their choice to much the cyclic garbage periodically. Just a thought, flying out of my fingers before my brain could stop it... Skip -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On Sat, Apr 27, 2013 at 8:50 AM, Ian Kelly ian.g.ke...@gmail.com wrote: On Fri, Apr 26, 2013 at 10:54 AM, Chris Angelico ros...@gmail.com wrote: Once it's been proven that there's an unreferenced cycle, why not simply dispose of one of the objects, and replace all references to it (probably only one - preferably pick an object with the fewest references) with a special temporary object? In fact, that could probably be done in CPython by wiping out the object in memory and replacing it with a special marker of some sort, which would then automatically take over all references to the old object. Any attempt to manipulate this object could simply pop back with a DestructedObject exception or something. I think it still boils down to the same problem -- how should Python *predictably* choose which object will be disposed of in order to break the cycle? I don't see that this question is any different than asking how should Python choose which __del__ method should be called first, since the object so disposed of would still need its __del__ method called. It wouldn't need to. It just chooses arbitrarily. Maybe it picks the one with the lowest id(), maybe it picks the object that takes up the most space, maybe the one that would have been put into gc.garbage first. But it's got to die tomorrow, so it really doesn't matter[1], and there's no way for the GC to know which one won't be referenced by the other's __del__ method. [1] I'm involved in a Gilbert and Sullivan Society performance, curtain up in less than three hours, so I'm morally obligated to quote GS somewhere. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On Sat, Apr 27, 2013 at 9:45 AM, Dave Angel da...@davea.name wrote: I didn't know there was a callback that a user could hook into. That's very interesting. On Sat, Apr 27, 2013 at 10:22 AM, Skip Montanaro s...@pobox.com wrote: Whenever the GC finds a cycle that is unreferenced but uncollectable, it stores those objects in the list gc.garbage. At that point, if the user wishes to clean up those cycles, it is up to them to delve into gc.garbage, untangle the objects contained within, break the cycles, and remove them from the list so that they can be freed by the ref counter. I wonder if it would be useful to provide a gc.garbagehook analogous to sys.excepthook? Users could assign a function of their choice to much the cyclic garbage periodically. Just a thought, flying out of my fingers before my brain could stop it... As far as I know, Dave, there isn't currently one; Skip, that's close to what I'm talking about - it saves on the periodic check. But burying it in gc.garbagehook implies having a separate piece of code that knows how to break the reference cycles, whereas the __del__ method puts the code right there in the code that has the problem. Actually, *ANY* solution to this problem implies having __del__ able to cope with the cycle being broken. Here's an example, perhaps a silly one, but not far different in nature from some things I've done in C++. (Granted, all the Python implementations of those same algorithms have involved built-in types rather than linked lists, but still.) class DLCircList: def __init__(self,payload): self.payload=payload self.next=self.prev=self print(Creating node: %s%self.payload) def __del__(self): print(Deleting node %s from cycle %s%(self.payload,self.enum())) self.prev.next=self.next self.next.prev=self.prev def attach(self,other): assert(self.next==self) # Don't attach twice self.prev=other self.next=other.next other.next=self self.next.prev=self print(Adding node %s to cycle %s%(self.payload,self.enum())) def enum(self): Return a list of all node payloads in this cycle. ptr=self.next nodes=[self.payload] while ptr!=self: nodes.append(ptr.payload) ptr=ptr.next return nodes lst=DLCircList(foo) DLCircList(bar).attach(lst) DLCircList(quux).attach(lst) DLCircList(asdf).attach(lst) DLCircList(qwer).attach(lst) DLCircList(zxcv).attach(lst) print(Enumerating list: %s%lst.enum()) del lst import gc gbg=gc.collect() print(And we have garbage: %s%gbg) print(gc.garbage) Supposing you did this many many times, and you wanted decent garbage collection. How would you write a __del__ method, how would you write something to clean up gc.garbage? One way or another, something will have to deal with the possibility that the invariants have been broken, so my theory is that that possibility should be entirely within __del__. (Since __del__ calls enum(), it's possible for enum() to throw DestructedObject or whatever, but standard exception handling will deal with that.) ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
From the Zen of Python: In the face of ambiguity, refuse the temptation to guess. I believe the reason something isn't already done to break cycles is that the authors of the cyclic garbage collector considered the above aphorism. They rely on the author of the code with the cycles to figure out how to break them. All I was suggesting was that Python could provide a hook where the programmer could codify his algorithm for breaking cycles. Skip -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython's cyclic garbage collector (was [Python-ideas] Automatic context managers)
On Sat, Apr 27, 2013 at 12:12 PM, Skip Montanaro s...@pobox.com wrote: From the Zen of Python: In the face of ambiguity, refuse the temptation to guess. I believe the reason something isn't already done to break cycles is that the authors of the cyclic garbage collector considered the above aphorism. They rely on the author of the code with the cycles to figure out how to break them. All I was suggesting was that Python could provide a hook where the programmer could codify his algorithm for breaking cycles. Sure, and that makes good sense. The hook would be an improvement (maybe it gets passed a list of garbage that's about to be added to gc.garbage, but will be re-checked for cycles after the hook, so all you have to do is break the cycle in the hook and it'll work), but the implication is that an external piece of code knows about the possible cycles and how to deal with them. I'd really rather have something right there in the class. Effectively, something like this: whenever gc.garbage has content: turn_into(gc.garbage[0],None) # or a special I am destroyed object del gc.garbage[:] It doesn't matter which one gets taken out. Yes, I suppose that's a form of guessing, but whatever rules get put in place - whether by the language or by your own script - it'll always be possible to conjure a scenario where tossing a coin is the only way to pick which one goes. ChrisA -- http://mail.python.org/mailman/listinfo/python-list