Bug#748922: python-apt: TagFile doesnt close file
On Thu, May 22, 2014 at 11:57:12AM +0200, Johannes Schauer wrote: Package: python-apt Version: 0.9.3.5 Severity: normal Thanks for your bugreport. Consider the following snippet: --%--- import gc import os import sys import apt_pkg print os.listdir(/proc/self/fd/) f = apt_pkg.TagFile(sys.argv[1]) print os.listdir(/proc/self/fd/) del f print os.listdir(/proc/self/fd/) gc.collect print os.listdir(/proc/self/fd/) --%--- There is a small typo in the above script. gc.collect should be gc.collect(). I verified that the following works and does not leak fds: class LeakTestCase(unittest.TestCase): def test_leak(self): # clenaup gc first import gc gc.collect() # see what fds we have fds = os.listdir(/proc/self/fd) testfile = __file__ tagf = apt_pkg.TagFile(testfile) tagf.step() del tagf import gc gc.collect() # ensure fd is closed self.assertEqual(fds, os.listdir(/proc/self/fd)) Unfortunately just doing a del tagf is not enough, the gc call is needed afterwards. The reason that the del is not enough is that there is there is a cyclic reference from the tagf to tagf.section. The garbage collector breaks it, but a simple del sees a refcount 0. This particular case could maybe fixed by copying the data from the pkgTagFile to a pkgTagSection instead of letting it operator on the Buffer of pkgTagFile. But that requires somework (plus additional memory for the copied data). Cheers, Michael -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#748922: python-apt: TagFile doesnt close file
Hi Quoting Michael Vogt (2014-06-06 11:15:29) There is a small typo in the above script. gc.collect should be gc.collect(). right. I noticed this too but it was too late and I already sent in my bugreport. See my other submission to it for my updated results. I verified that the following works and does not leak fds: class LeakTestCase(unittest.TestCase): def test_leak(self): # clenaup gc first import gc gc.collect() # see what fds we have fds = os.listdir(/proc/self/fd) testfile = __file__ tagf = apt_pkg.TagFile(testfile) tagf.step() del tagf import gc gc.collect() # ensure fd is closed self.assertEqual(fds, os.listdir(/proc/self/fd)) Unfortunately just doing a del tagf is not enough, the gc call is needed afterwards. The reason that the del is not enough is that there is there is a cyclic reference from the tagf to tagf.section. The garbage collector breaks it, but a simple del sees a refcount 0. This particular case could maybe fixed by copying the data from the pkgTagFile to a pkgTagSection instead of letting it operator on the Buffer of pkgTagFile. But that requires somework (plus additional memory for the copied data). The problem is, as you also identified above, that as long as the Python object for apt_pkg.TagFile is around, the file stays open. I switched from apt_pkg to debian.deb822 because in my use case I want to read from file A, modify the data and want to write back to A again. For that I obviously should not have the original fd from reading A open when I write back to A. One possible workaround would be to copy all of A into a StringIO and then pass that to apt_pkg.TagFile. This would probably work but I think the expectation is that after doing: mypkgs = list(apt_pkg.TagFile(Packages)) or: mypkgs = [] for pkg in apt_pkg.TagFile(Packages): mypkgs.append(pkgs) that there are no files left open. Currently in both cases, the fd is still around. So after the last pkgTagSection is retrieved, the file should be closed. Maybe I find some time at some point to implement this but unfortunately the debian.deb822 module seems to work quite well (even though it's orders of magnitude slower). cheers, josch -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#748922: python-apt: TagFile doesnt close file
On Fri, Jun 06, 2014 at 11:25:52AM +0200, Johannes Schauer wrote: Hi Quoting Michael Vogt (2014-06-06 11:15:29) There is a small typo in the above script. gc.collect should be gc.collect(). right. I noticed this too but it was too late and I already sent in my bugreport. See my other submission to it for my updated results. I verified that the following works and does not leak fds: class LeakTestCase(unittest.TestCase): def test_leak(self): # clenaup gc first import gc gc.collect() # see what fds we have fds = os.listdir(/proc/self/fd) testfile = __file__ tagf = apt_pkg.TagFile(testfile) tagf.step() del tagf import gc gc.collect() # ensure fd is closed self.assertEqual(fds, os.listdir(/proc/self/fd)) Unfortunately just doing a del tagf is not enough, the gc call is needed afterwards. The reason that the del is not enough is that there is there is a cyclic reference from the tagf to tagf.section. The garbage collector breaks it, but a simple del sees a refcount 0. This particular case could maybe fixed by copying the data from the pkgTagFile to a pkgTagSection instead of letting it operator on the Buffer of pkgTagFile. But that requires somework (plus additional memory for the copied data). The problem is, as you also identified above, that as long as the Python object for apt_pkg.TagFile is around, the file stays open. I switched from apt_pkg to debian.deb822 because in my use case I want to read from file A, modify the data and want to write back to A again. For that I obviously should not have the original fd from reading A open when I write back to A. One possible workaround would be to copy all of A into a StringIO and then pass that to apt_pkg.TagFile. This would probably work but I think the expectation is that after doing: mypkgs = list(apt_pkg.TagFile(Packages)) or: mypkgs = [] for pkg in apt_pkg.TagFile(Packages): mypkgs.append(pkgs) that there are no files left open. Currently in both cases, the fd is still around. So after the last pkgTagSection is retrieved, the file should be closed. The easiest way to make this work would be to make the reference from TagFile to TagSection weak, but this creates a small performance issue, because we'd need to re-create the TagSection for each section, unless it is referenced somewhere outside. So I'm not sure that's a good idea (same for mvo's idea). What we really should do is add (1) a close() method, and (2) context manager support to TagFile. This way you can just explicitly close the file when you're done with it. As a workaround, open the file yourself and pass the open file to the tag file. The best way would be: with open(Packages) as fobj: for pkg in apt_pkg.TagFile(fobj): mypkgs.append(pkgs) You need to keep the file object open as long as the tag file exists, so just TagFile(open(Packages)) would fail. -- Julian Andres Klode - Debian Developer, Ubuntu Member See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/. Please do not top-post if possible. -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#748922: python-apt: TagFile doesnt close file
Package: python-apt Version: 0.9.3.5 Severity: normal Consider the following snippet: --%--- import gc import os import sys import apt_pkg print os.listdir(/proc/self/fd/) f = apt_pkg.TagFile(sys.argv[1]) print os.listdir(/proc/self/fd/) del f print os.listdir(/proc/self/fd/) gc.collect print os.listdir(/proc/self/fd/) --%--- calling apt_pkg.TagFile() opens a file descriptor but that descriptor is not closed when the object gets deleted or even garbage collection is explicitly run. Here the output: ['0', '1', '2', '3'] ['0', '1', '2', '3', '4'] ['0', '1', '2', '3', '4'] ['0', '1', '2', '3', '4'] The opened fd should be closed when the python object is destroyed. cheers, josch -- System Information: Debian Release: jessie/sid APT prefers testing APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental') Architecture: amd64 (x86_64) Foreign Architectures: i386 Kernel: Linux 3.11-2-amd64 (SMP w/4 CPU cores) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Versions of packages python-apt depends on: ii libapt-inst1.5 1.0.1 ii libapt-pkg4.12 1.0.1 ii libc6 2.18-4 ii libgcc11:4.8.2-16 ii libstdc++6 4.8.2-16 ii python 2.7.5-5 ii python-apt-common 0.9.3.5 Versions of packages python-apt recommends: ii iso-codes3.52-1 ii lsb-release 4.1+Debian12 ii xz-utils 5.1.1alpha+20120614-2 Versions of packages python-apt suggests: pn python-apt-dbg none pn python-apt-doc none ii python-gtk2 2.24.0-3+b1 pn python-vte none -- debconf-show failed -- debsums errors found: sh: 1: /usr/sbin/dpkg-divert: not found -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org