Re: Question about garbage collection
So, here's some info about how to see what's going on with Python's memory allocation: https://docs.python.org/3/library/tracemalloc.html . I haven't looked into this in a long time, but it used to be the case that you needed to compile native modules (and probably Python itself?) so that instrumentation is possible (I think the incref / decref macros should give you a hint, because they would naturally have to report some of that info).

Anyways. The problem of tracing memory allocation / deallocation in Python can be roughly split into these categories:

1. Memory legitimately claimed by objects created through the Python runtime, but not reclaimed due to programmer error. I.e. the programmer wrote a program that keeps references to objects which it will never use again.

2. Memory claimed through native objects obtained by means of interacting with Python's allocator. When working with the Python C API it's best to interface with Python's allocator to deal with dynamic memory allocation and release. However, it's somewhat cumbersome, and some module authors simply might not know about it, or wouldn't want to use it because they prefer a different allocator. Sometimes library authors don't implement memory deallocation well. Which brings us to:

3. Memory claimed by any user-space code that is associated with the Python process. This can be, for example, shared libraries loaded by means of Python bindings, on top of the situation described above.

4. System memory associated with the process. Some system calls need to allocate memory on the system side. Typical examples are opening files, creating sockets etc. Typically, the system will limit the number of such objects, and the user program will hit the numerical limit before it hits the memory limit, but it can also happen that this will manifest as a memory problem (one example I ran into was trying to run conda-build and it would fail due to enormous amounts of memory it requested, but the specifics of the failure were due to it trying to create new sub-processes -- another system resource that requires memory allocation).

There isn't a universal strategy to cover all these cases. But, if you have reasons to suspect (4), for example, you'd probably start by using the strace utility (on Linux) to see what system calls are executed. For something like (3), you could try to use Valgrind (but it's a lot of work to set it up). It's also possible to use jemalloc to profile a program, but you would have to build Python with its allocator modified to use jemalloc (I've seen an issue in the Python bug tracker where someone wrote a script to do that, so it should be possible). Both of these are quite labor intensive and not trivial to set up.

(2) can often be diagnosed with the tracemalloc Python module, and (1) is something that can be helped with Python's gc module.

It's always better though to have an actual error and work from there. Or, at least, have some monitoring data that suggests that your application's memory use increases over time. Otherwise you could be spending a lot of time chasing problems you don't have.

--
https://mail.python.org/mailman/listinfo/python-list
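
As a rough illustration of the tracemalloc approach mentioned in the message above, the sketch below compares two snapshots to see which source lines account for memory growth. The `leaky` function and the `store` list are hypothetical stand-ins for application code that keeps references it no longer needs; only the `tracemalloc` calls are standard library.

```python
import tracemalloc


def leaky(store, n):
    # Hypothetical leak: every call parks n more 1 KiB buffers in a
    # long-lived list and never removes them.
    store.extend(bytearray(1024) for _ in range(n))


tracemalloc.start()
before = tracemalloc.take_snapshot()

store = []
leaky(store, 1000)

after = tracemalloc.take_snapshot()

# Differences between the two snapshots, grouped by source line and
# sorted so the biggest changes come first.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)

# Net growth in bytes across all lines.
growth = sum(stat.size_diff for stat in top)
tracemalloc.stop()
```

In a long-running server the two snapshots would more likely be taken some minutes apart, and a line whose `size_diff` keeps growing from sample to sample is the one worth investigating.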
Re: Question about garbage collection
On 2024-01-17 3:01 AM, Greg Ewing via Python-list wrote:
> On 17/01/24 1:01 am, Frank Millman wrote:
>> I sometimes need to keep a reference from a transient object to a more
>> permanent structure in my app. To save myself the extra step of removing
>> all these references when the transient object is deleted, I make them
>> weak references.
>
> I don't see how weak references help here at all. If the transient
> object goes away, all references from it to the permanent objects also
> go away.
>
> A weak reference would only be of use if the reference went the other
> way, i.e. from the permanent object to the transient object.

You are right. I got my description above back-to-front. It is a pub/sub scenario. A transient object makes a request to the permanent object to be notified of any changes. The permanent object stores a reference to the transient object and executes a callback on each change. When the transient object goes away, the reference must be removed.

Frank
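
The pub/sub arrangement described above could be sketched roughly as follows, with the permanent object holding its subscribers in a `weakref.WeakSet`. The class names `Permanent` and `Transient` and the method `on_change` are made up for illustration and are not taken from Frank's application. The prompt cleanup after `del` relies on CPython's reference counting.

```python
import weakref


class Permanent:
    """Long-lived publisher that must not keep its subscribers alive."""

    def __init__(self):
        # WeakSet entries vanish automatically once the subscriber
        # has no strong references left.
        self._subscribers = weakref.WeakSet()

    def subscribe(self, obj):
        self._subscribers.add(obj)

    def notify(self, change):
        # Copy to a list so the set may shrink safely while we iterate.
        for sub in list(self._subscribers):
            sub.on_change(change)


class Transient:
    """Short-lived subscriber that asks to be notified of changes."""

    def __init__(self, publisher):
        self.seen = []
        publisher.subscribe(self)

    def on_change(self, change):
        self.seen.append(change)


pub = Permanent()
t = Transient(pub)
pub.notify("first")
seen = t.seen

del t                     # last strong reference goes away
pub.notify("second")      # nobody left to notify
remaining = len(pub._subscribers)
```

If the callback is a bound method stored on its own rather than the subscriber object, `weakref.WeakMethod` serves the same purpose.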
Re: Question about garbage collection
On 17/01/24 1:01 am, Frank Millman wrote:
> I sometimes need to keep a reference from a transient object to a more
> permanent structure in my app. To save myself the extra step of removing
> all these references when the transient object is deleted, I make them
> weak references.

I don't see how weak references help here at all. If the transient object goes away, all references from it to the permanent objects also go away.

A weak reference would only be of use if the reference went the other way, i.e. from the permanent object to the transient object.

--
Greg
Re: Question about garbage collection
On 17/01/24 4:00 am, Chris Angelico wrote:
> class Form:
>     def __init__(self):
>         self.elements = []
>
> class Element:
>     def __init__(self, form):
>         self.form = form
>         form.elements.append(self)

If you make the reference from Element to Form a weak reference, it won't keep the Form alive after it's been closed.

--
Greg
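
A minimal sketch of that suggestion, assuming the `Form` and `Element` classes quoted above; the `form` property and the `_form` attribute are additions for illustration. The immediate cleanup shown relies on CPython's reference counting.

```python
import weakref


class Form:
    def __init__(self):
        self.elements = []


class Element:
    def __init__(self, form):
        # Weak back-reference: the element no longer keeps the form alive.
        self._form = weakref.ref(form)
        form.elements.append(self)

    @property
    def form(self):
        # Returns the Form, or None once the form has been destroyed.
        return self._form()


frm = Form()
el = Element(frm)
alive_before = el.form is frm

del frm                               # drop the only strong reference
alive_after = el.form is not None     # the weak reference is now dead
```

Code that uses `el.form` then has to cope with `None`, which is the price of not having to break the loop by hand.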
Re: Question about garbage collection
> On 16 Jan 2024, at 12:10, Frank Millman via Python-list wrote:
>
> My problem is that my app is quite complex, and it is easy to leave a
> reference dangling somewhere which prevents an object from being gc'd.

What I do to track these problems down is use gc.get_objects() and then summarize the number of objects of each type. Part 2 is to take a second summary after an interval and print the delta. Leaks of objects show up as the count of a type increasing every time you sample.

Barry
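
Barry's technique might look something like the sketch below. The `Session` class and the `leaked` list are hypothetical and simulate a leak; `summarize` is an illustrative helper, not an existing library function.

```python
import gc
from collections import Counter


def summarize():
    """Count the objects currently tracked by the collector, per type name."""
    gc.collect()  # drop collectable garbage first so only live objects count
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Session:
    """Hypothetical per-client object that should die at logoff."""

    def __init__(self):
        self.data = {}


leaked = []                      # simulated dangling references

before = summarize()
for _ in range(50):
    leaked.append(Session())     # sessions that are never released
after = summarize()

# Counter subtraction keeps only the types whose count went up.
delta = after - before
for name, count in delta.most_common(5):
    print(f"{name:20} +{count}")
```

In a real server the two summaries would be taken some time apart, and a type whose count rises at every sample is the leak candidate.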
Re: Question about garbage collection
> On 16 Jan 2024, at 13:17, Thomas Passin via Python-list > wrote: > > The usual advice is to call deleteLater() on objects derived from PyQt > classes. I don't know enough about PyQt to know if this takes care of all > dangling reference problems, though. It works well and robustly. Barry -- https://mail.python.org/mailman/listinfo/python-list
Re: Question about garbage collection
On Wed, 17 Jan 2024 at 01:45, Frank Millman via Python-list wrote:
>
> On 2024-01-16 2:15 PM, Chris Angelico via Python-list wrote:
> >
> > Where do you tend to "leave a reference dangling somewhere"? How is
> > this occurring? Is it a result of an incomplete transaction (like an
> > HTTP request that never finishes), or a regular part of the operation
> > of the server?
> >
>
> I have a class that represents a database table, and another class that
> represents a database column. There is a one-to-many relationship and
> they maintain references to each other.
>
> In another part of the app, there is a class that represents a form, and
> another class that represents the gui elements on the form. Again there
> is a one-to-many relationship.

I don't know when you'd be "done" with the table, so I won't try to give an example, but I'll try this one and maybe it'll give some ideas that could apply to both.

When you open the form, you initialize it, display it, etc, etc. This presumably includes something broadly like this:

    class Form:
        def __init__(self):
            self.elements = []

    class Element:
        def __init__(self, form):
            self.form = form
            form.elements.append(self)

    frm = Form(...)
    Element(frm, ...) # as many as needed
    frm.show() # present it to the user

This is a pretty classic refloop. I don't know exactly what your setup is, but most likely it's going to look something like this. Feel free to correct me if it doesn't.

The solution here would be to trap the "form is no longer being displayed" moment. That'll be some sort of GUI event like a "close" or "delete" signal. When that comes through (and maybe after doing other processing), you no longer need the form, and can dispose of it. The simplest solution here is: Empty out frm.elements. That immediately leaves the form itself as a leaf (no references to anything relevant), and the elements still refer back to it, but once nothing ELSE refers to the form, everything can be disposed of.

> A gui element that represents a piece of data has to maintain a link to
> its database column object. There can be a many-to-one relationship, as
> there could be more than one gui element referring to the same column.

Okay, so the Element also refers to the corresponding Column. If the Form and Element aren't in a refloop, this shouldn't be a problem. However, if this is the same Table and Column that you referred to above, that might be the answer to my question. Are you "done" with the Table at the same time that the form is no longer visible? If so, you would probably have something similar where the Form refers to the Table, and the Table and Columns refer to each other... so the same solution hopefully should work: wipe out the Table's list of columns.

> There are added complications which I won't go into here. The bottom
> line is that on some occasions a form which has been closed does not get
> gc'd.
>
> I have been trying to reproduce the problem in my toy app, but I cannot
> get it to fail. There is a clue there! I think I have just
> over-complicated things.

Definitely possible.

> I will start with a fresh approach tomorrow. If you don't hear from me
> again, you will know that I have solved it!
>
> Thanks for the input, it definitely helped.

Cool cool, happy to help.

ChrisA
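
The "empty out frm.elements" idea could be sketched as below. The `close` method is a hypothetical stand-in for whatever handler the GUI toolkit calls when the form is dismissed. The cyclic collector is switched off only to show that, in CPython, plain reference counting is enough once the loop is broken.

```python
import gc
import weakref


class Form:
    def __init__(self):
        self.elements = []

    def close(self):
        # Hypothetical close handler: break the Form <-> Element loop.
        self.elements.clear()


class Element:
    def __init__(self, form):
        self.form = form
        form.elements.append(self)


gc.disable()                 # rely on reference counting alone

frm = Form()
Element(frm)
Element(frm)
probe = weakref.ref(frm)     # lets us observe when the form is destroyed

frm.close()                  # elements are released, the form becomes a leaf
del frm                      # last reference to the form
freed_without_gc = probe() is None

gc.enable()
```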
Re: Question about garbage collection
On 2024-01-16 2:15 PM, Chris Angelico via Python-list wrote:
> Where do you tend to "leave a reference dangling somewhere"? How is
> this occurring? Is it a result of an incomplete transaction (like an
> HTTP request that never finishes), or a regular part of the operation
> of the server?

I have a class that represents a database table, and another class that represents a database column. There is a one-to-many relationship and they maintain references to each other.

In another part of the app, there is a class that represents a form, and another class that represents the gui elements on the form. Again there is a one-to-many relationship.

A gui element that represents a piece of data has to maintain a link to its database column object. There can be a many-to-one relationship, as there could be more than one gui element referring to the same column.

There are added complications which I won't go into here. The bottom line is that on some occasions a form which has been closed does not get gc'd.

I have been trying to reproduce the problem in my toy app, but I cannot get it to fail. There is a clue there! I think I have just over-complicated things.

I will start with a fresh approach tomorrow. If you don't hear from me again, you will know that I have solved it!

Thanks for the input, it definitely helped.

Frank
Re: Question about garbage collection
On 1/16/2024 4:17 AM, Barry wrote:
> On 16 Jan 2024, at 03:49, Thomas Passin via Python-list wrote:
>> This kind of thing can happen with PyQt, also. There are ways to
>> minimize it but I don't know if you can ever be sure all Qt C++ objects
>> will get deleted. It depends on the type of object and the circumstances.
>
> When this has been seen in the past it has been promptly fixed by the
> maintainer.

The usual advice is to call deleteLater() on objects derived from PyQt classes. I don't know enough about PyQt to know if this takes care of all dangling reference problems, though.
Re: Question about garbage collection
On Tue, 16 Jan 2024 at 23:08, Frank Millman via Python-list wrote: > > On 2024-01-15 3:51 PM, Frank Millman via Python-list wrote: > > Hi all > > > > I have read that one should not have to worry about garbage collection > > in modern versions of Python - it 'just works'. > > > > I don't want to rely on that. My app is a long-running server, with > > multiple clients logging on, doing stuff, and logging off. They can > > create many objects, some of them long-lasting. I want to be sure that > > all objects created are gc'd when the session ends. > > > > I did not explain myself very well. Sorry about that. > > My problem is that my app is quite complex, and it is easy to leave a > reference dangling somewhere which prevents an object from being gc'd. > > This can create (at least) two problems. The obvious one is a memory > leak. The second is that I sometimes need to keep a reference from a > transient object to a more permanent structure in my app. To save myself > the extra step of removing all these references when the transient > object is deleted, I make them weak references. This works, unless the > transient object is kept alive by mistake and the weak ref is never removed. > > I feel it is important to find these dangling references and fix them, > rather than wait for problems to appear in production. The only method I > can come up with is to use the 'delwatcher' class that I used in my toy > program in my original post. > > I am surprised that this issue does not crop up more often. Does nobody > else have these problems? > It really depends on how big those dangling objects are. My personal habit is to not worry about a few loose objects, by virtue of ensuring that everything either has its reference loops deliberately broken at some point in time, or by keeping things small. An example of deliberately breaking a refloop would be when I track websockets. 
Usually I'll tag the socket object itself with some kind of back-reference to my own state, but I also need to be able to iterate over all of my own state objects (let's say they're dictionaries for simplicity) and send a message to each socket. So there'll be a reference loop between the socket and the state. But at some point, I will be notified that the socket has been disconnected, and that's when I go to its state object and wipe out its back-reference. It can then be disposed of promptly, since there's no loop. It takes a bit of care, but in general, large state objects won't have these kinds of loops, and dangling references haven't caused me any sort of major issues in production. Where do you tend to "leave a reference dangling somewhere"? How is this occurring? Is it a result of an incomplete transaction (like an HTTP request that never finishes), or a regular part of the operation of the server? ChrisA -- https://mail.python.org/mailman/listinfo/python-list
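
The websocket pattern described above might be sketched like this. `FakeSocket` is a hypothetical stand-in for a real websocket object, and the handler names `on_connect` and `on_disconnect` are assumptions rather than any particular library's API.

```python
import weakref


class FakeSocket:
    """Hypothetical stand-in for a websocket connection object."""

    def __init__(self):
        self.state = None     # back-reference to our own state
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)


states = []                   # every live per-connection state dict


def on_connect(sock):
    state = {"sock": sock, "user": None}
    sock.state = state        # socket -> state, state -> socket: a loop
    states.append(state)


def broadcast(msg):
    for state in states:
        state["sock"].send(msg)


def on_disconnect(sock):
    state = sock.state
    states.remove(state)
    state["sock"] = None      # wipe the back-reference: the loop is gone
    sock.state = None


sock = FakeSocket()
on_connect(sock)
broadcast("hello")
received = list(sock.sent)

probe = weakref.ref(sock)
on_disconnect(sock)
del sock
freed = probe() is None       # released by reference counting in CPython
```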
Re: Question about garbage collection
On 2024-01-15 3:51 PM, Frank Millman via Python-list wrote:
> Hi all
>
> I have read that one should not have to worry about garbage collection
> in modern versions of Python - it 'just works'.
>
> I don't want to rely on that. My app is a long-running server, with
> multiple clients logging on, doing stuff, and logging off. They can
> create many objects, some of them long-lasting. I want to be sure that
> all objects created are gc'd when the session ends.

I did not explain myself very well. Sorry about that.

My problem is that my app is quite complex, and it is easy to leave a reference dangling somewhere which prevents an object from being gc'd.

This can create (at least) two problems. The obvious one is a memory leak. The second is that I sometimes need to keep a reference from a transient object to a more permanent structure in my app. To save myself the extra step of removing all these references when the transient object is deleted, I make them weak references. This works, unless the transient object is kept alive by mistake and the weak ref is never removed.

I feel it is important to find these dangling references and fix them, rather than wait for problems to appear in production. The only method I can come up with is to use the 'delwatcher' class that I used in my toy program in my original post.

I am surprised that this issue does not crop up more often. Does nobody else have these problems?

Frank
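
Besides a delwatcher-style class, one further way to hunt for a dangling reference (a suggestion that does not come from the thread itself) is to ask the collector who still refers to an object, using the standard `gc.get_referrers()`. The `Session` class and the `registry` dict below are hypothetical.

```python
import gc


class Session:
    """Hypothetical object that should have been released at logoff."""


registry = {}                       # hypothetical long-lived structure

s = Session()
registry["forgotten"] = s           # the reference somebody forgot to remove

# Every gc-tracked container that directly refers to the session.
# The module namespace shows up too, because the name `s` still exists.
referrers = gc.get_referrers(s)
for ref in referrers:
    print(type(ref).__name__, "with", len(ref), "entries")

holders = [ref for ref in referrers if ref is registry]
```

The output is noisy in a real program (frames, namespaces and temporary containers all appear), so it tends to be most useful once a suspect object has already been identified.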
Re: Question about garbage collection
> On 16 Jan 2024, at 03:49, Thomas Passin via Python-list > wrote: > > This kind of thing can happen with PyQt, also. There are ways to minimize it > but I don't know if you can ever be sure all Qt C++ objects will get deleted. > It depends on the type of object and the circumstances. When this has been seen in the past it has been promptly fixed by the maintainer. Barry -- https://mail.python.org/mailman/listinfo/python-list
Re: Question about garbage collection
On 1/15/2024 9:47 PM, Akkana Peck via Python-list wrote:
> I wrote:
>>> Also be warned that some modules (particularly if they're based on
>>> libraries not written in Python) might not garbage collect, so you may
>>> need to use other methods of cleaning up after those objects.
>
> Chris Angelico writes:
>> Got any examples of that?
>
> The big one for me was gdk-pixbuf, part of GTK. When you do something
> like gtk.gdk.pixbuf_new_from_file(), there's a Python object that gets
> created, but there's also the underlying C code that allocates memory for
> the pixbuf. When the object went out of scope, the Python object was
> automatically garbage collected, but the pixbuf data leaked.

This kind of thing can happen with PyQt, also. There are ways to minimize it but I don't know if you can ever be sure all Qt C++ objects will get deleted. It depends on the type of object and the circumstances.

> Calling gc.collect() caused the pixbuf data to be garbage collected too.
>
> There used to be a post explaining this on the pygtk mailing list: the
> link was
> http://www.daa.com.au/pipermail/pygtk/2003-December/006499.html
> but that page is gone now and I can't seem to find any other archives of
> that list (it's not on archive.org either). And this was from GTK2; I
> never checked whether the extra gc.collect() is still necessary in GTK3,
> but I figure leaving it in doesn't hurt anything. I use pixbufs in a
> tiled map application, so there are a lot of small pixbufs being
> repeatedly read and then deallocated.
>
> ...Akkana
Re: Question about garbage collection
On Tue, 16 Jan 2024 at 13:49, Akkana Peck via Python-list wrote: > > I wrote: > > > Also be warned that some modules (particularly if they're based on > > > libraries not written in Python) might not garbage collect, so you may > > > need to use other methods of cleaning up after those objects. > > Chris Angelico writes: > > Got any examples of that? > > The big one for me was gdk-pixbuf, part of GTK. When you do something like > gtk.gdk.pixbuf_new_from_file(), there's a Python object that gets created, > but there's also the underlying C code that allocates memory for the pixbuf. > When the object went out of scope, the Python object was automatically > garbage collected, but the pixbuf data leaked. Calling gc.collect() caused > the pixbuf data to be garbage collected too. > > There used to be a post explaining this on the pygtk mailing list: the link > was > http://www.daa.com.au/pipermail/pygtk/2003-December/006499.html > but that page is gone now and I can't seem to find any other archives of that > list (it's not on archive.org either). And this was from GTK2; I never > checked whether the extra gc.collect() is still necessary in GTK3, but I > figure leaving it in doesn't hurt anything. I use pixbufs in a tiled map > application, so there are a lot of small pixbufs being repeatedly read and > then deallocated. > Okay, so to clarify: the Python object will always be garbage collected correctly, but a buggy third-party module might have *external* resources (in that case, the pixbuf) that aren't properly released. Either that, or there is a reference loop, which doesn't necessarily mean you NEED to call gc.collect(), but it can help if you want to get rid of them more promptly. (Python will detect such loops at some point, but not always immediately.) But these are bugs in the module, particularly the first case, and should be considered as such. 
2003 is fully two decades ago now, and I would not expect that a serious bug like that has been copied into PyGObject (the newer way of using GTK from Python). So, Python's garbage collection CAN be assumed to "just work", unless you find evidence to the contrary. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Question about garbage collection
I wrote: > > Also be warned that some modules (particularly if they're based on > > libraries not written in Python) might not garbage collect, so you may need > > to use other methods of cleaning up after those objects. Chris Angelico writes: > Got any examples of that? The big one for me was gdk-pixbuf, part of GTK. When you do something like gtk.gdk.pixbuf_new_from_file(), there's a Python object that gets created, but there's also the underlying C code that allocates memory for the pixbuf. When the object went out of scope, the Python object was automatically garbage collected, but the pixbuf data leaked. Calling gc.collect() caused the pixbuf data to be garbage collected too. There used to be a post explaining this on the pygtk mailing list: the link was http://www.daa.com.au/pipermail/pygtk/2003-December/006499.html but that page is gone now and I can't seem to find any other archives of that list (it's not on archive.org either). And this was from GTK2; I never checked whether the extra gc.collect() is still necessary in GTK3, but I figure leaving it in doesn't hurt anything. I use pixbufs in a tiled map application, so there are a lot of small pixbufs being repeatedly read and then deallocated. ...Akkana -- https://mail.python.org/mailman/listinfo/python-list
Re: Question about garbage collection
On Tue, 16 Jan 2024 at 06:32, Akkana Peck via Python-list wrote:
>
> > Frank Millman wrote at 2024-1-15 15:51 +0200:
> > >I have read that one should not have to worry about garbage collection
> > >in modern versions of Python - it 'just works'.
>
> Dieter Maurer via Python-list writes:
> > There are still some isolated cases when not all objects
> > in an unreachable cycle are destroyed
> > (see e.g. step 2 of
> > "https://devguide.python.org/internals/garbage-collector/index.html#destroying-unreachable-objects").
>
> Also be warned that some modules (particularly if they're based on libraries
> not written in Python) might not garbage collect, so you may need to use
> other methods of cleaning up after those objects.
>

Got any examples of that?

ChrisA
Re: Question about garbage collection
> Frank Millman wrote at 2024-1-15 15:51 +0200:
> >I have read that one should not have to worry about garbage collection
> >in modern versions of Python - it 'just works'.

Dieter Maurer via Python-list writes:
> There are still some isolated cases when not all objects
> in an unreachable cycle are destroyed
> (see e.g. step 2 of
> "https://devguide.python.org/internals/garbage-collector/index.html#destroying-unreachable-objects").

Also be warned that some modules (particularly if they're based on libraries not written in Python) might not garbage collect, so you may need to use other methods of cleaning up after those objects.

...Akkana
Re: Question about garbage collection
Frank Millman wrote at 2024-1-15 15:51 +0200:
>I have read that one should not have to worry about garbage collection
>in modern versions of Python - it 'just works'.

There are still some isolated cases when not all objects in an unreachable cycle are destroyed (see e.g. step 2 of "https://devguide.python.org/internals/garbage-collector/index.html#destroying-unreachable-objects"). But Python's own objects (e.g. traceback cycles) or instances of classes implemented in Python should no longer be affected. Thus, unless you use extensions implemented in C (with "legacy finalizer"s), garbage collection should not cause problems.

On the other hand, your application, too, must avoid memory leaks. Caches of various forms (with data for several sessions) might introduce them.
Re: Question about garbage collection
> I do have several circular references. My experience is that if I do not
> take some action to break the references when closing the session, the
> objects remain alive. Below is a very simple program to illustrate this.
>
> Am I missing something? All comments appreciated.

Python has normal reference counting, but also has a cyclic garbage collector. Here's plenty of detail about how it works:

https://devguide.python.org/internals/garbage-collector/index.html

Skip
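
A small sketch of the two mechanisms Skip mentions, assuming CPython. The `Node` class is hypothetical, and automatic collection is disabled only so that the moment the cycle is reclaimed is easy to see.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()                      # no automatic cycle collection for now

# 1. No cycle: reference counting frees the object immediately.
a = Node()
probe_a = weakref.ref(a)
del a
plain_freed = probe_a() is None

# 2. A cycle: the two nodes keep each other's reference count above zero.
x, y = Node(), Node()
x.other, y.other = y, x
probe_x = weakref.ref(x)
del x, y
cycle_alive = probe_x() is not None

# 3. The cyclic collector finds the unreachable pair and frees it.
gc.collect()
cycle_freed = probe_x() is None

gc.enable()
```

With the collector left enabled, step 3 happens by itself at some later point, which is why unreachable cycles are normally only a matter of timing rather than a leak.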
Question about garbage collection
Hi all

I have read that one should not have to worry about garbage collection in modern versions of Python - it 'just works'.

I don't want to rely on that. My app is a long-running server, with multiple clients logging on, doing stuff, and logging off. They can create many objects, some of them long-lasting. I want to be sure that all objects created are gc'd when the session ends.

I do have several circular references. My experience is that if I do not take some action to break the references when closing the session, the objects remain alive. Below is a very simple program to illustrate this.

Am I missing something? All comments appreciated.

Frank Millman

==

import gc

class delwatcher:
    # This stores enough information to identify the object being watched.
    # It does not store a reference to the object itself.
    def __init__(self, obj):
        self.id = (obj.type, obj.name, id(obj))
        print('***', *self.id, 'created ***')
    def __del__(self):
        print('***', *self.id, 'deleted ***')

class Parent:
    def __init__(self, name):
        self.type = 'parent'
        self.name = name
        self.children = []
        self._del = delwatcher(self)

class Child:
    def __init__(self, parent, name):
        self.type = 'child'
        self.parent = parent
        self.name = name
        parent.children.append(self)
        self._del = delwatcher(self)

p1 = Parent('P1')
p2 = Parent('P2')

c1_1 = Child(p1, 'C1_1')
c1_2 = Child(p1, 'C1_2')
c2_1 = Child(p2, 'C2_1')
c2_2 = Child(p2, 'C2_2')

input('waiting ...')

# if next 2 lines are included, parent and child can be gc'd
# for ch in p1.children:
#     ch.parent = None

# if next line is included, child can be gc'd, but not parent
# p1.children = None

del c1_1
del p1
gc.collect()

input('wait some more ...')