Bug#748922: python-apt: TagFile doesnt close file

2014-06-06 Thread Michael Vogt
On Thu, May 22, 2014 at 11:57:12AM +0200, Johannes Schauer wrote:
 Package: python-apt
 Version: 0.9.3.5
 Severity: normal

Thanks for your bugreport.
 
 Consider the following snippet:
 
 --%---
 import gc
 import os
 import sys
 import apt_pkg
 
 print os.listdir(/proc/self/fd/)
 f = apt_pkg.TagFile(sys.argv[1])
 print os.listdir(/proc/self/fd/)
 del f
 print os.listdir(/proc/self/fd/)
 gc.collect
 print os.listdir(/proc/self/fd/)
 --%---

There is a small typo in the above script. gc.collect should be 
gc.collect().

I verified that the following works and does not leak fds:

class LeakTestCase(unittest.TestCase):
def test_leak(self):
# clenaup gc first
import gc
gc.collect()
# see what fds we have
fds = os.listdir(/proc/self/fd)
testfile = __file__
tagf = apt_pkg.TagFile(testfile)
tagf.step()
del tagf
import gc
gc.collect()
# ensure fd is closed
self.assertEqual(fds, os.listdir(/proc/self/fd))


Unfortunately just doing a del tagf is not enough, the gc call is
needed afterwards. The reason that the del is not enough is that there
is there is a cyclic reference from the tagf to tagf.section. The
garbage collector breaks it, but a simple del sees a refcount 
0. This particular case could maybe fixed by copying the data from the
pkgTagFile to a pkgTagSection instead of letting it operator on the
Buffer of pkgTagFile. But that requires somework (plus additional
memory for the copied data).

Cheers,
 Michael


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#748922: python-apt: TagFile doesnt close file

2014-06-06 Thread Johannes Schauer
Hi

Quoting Michael Vogt (2014-06-06 11:15:29)
 There is a small typo in the above script. gc.collect should be gc.collect().

right. I noticed this too but it was too late and I already sent in my
bugreport. See my other submission to it for my updated results.

 I verified that the following works and does not leak fds:
 
 class LeakTestCase(unittest.TestCase):
 def test_leak(self):
 # clenaup gc first
 import gc
 gc.collect()
 # see what fds we have
 fds = os.listdir(/proc/self/fd)
 testfile = __file__
 tagf = apt_pkg.TagFile(testfile)
 tagf.step()
 del tagf
 import gc
 gc.collect()
 # ensure fd is closed
 self.assertEqual(fds, os.listdir(/proc/self/fd))
 
 
 Unfortunately just doing a del tagf is not enough, the gc call is
 needed afterwards. The reason that the del is not enough is that there
 is there is a cyclic reference from the tagf to tagf.section. The
 garbage collector breaks it, but a simple del sees a refcount 
 0. This particular case could maybe fixed by copying the data from the
 pkgTagFile to a pkgTagSection instead of letting it operator on the
 Buffer of pkgTagFile. But that requires somework (plus additional
 memory for the copied data).

The problem is, as you also identified above, that as long as the Python object
for apt_pkg.TagFile is around, the file stays open.

I switched from apt_pkg to debian.deb822 because in my use case I want to read
from file A, modify the data and want to write back to A again. For that I
obviously should not have the original fd from reading A open when I write back
to A. One possible workaround would be to copy all of A into a StringIO and
then pass that to apt_pkg.TagFile. This would probably work but I think the
expectation is that after doing:

mypkgs = list(apt_pkg.TagFile(Packages))

or:

mypkgs = []
for pkg in apt_pkg.TagFile(Packages):
mypkgs.append(pkgs)

that there are no files left open. Currently in both cases, the fd is still
around. So after the last pkgTagSection is retrieved, the file should be
closed.

Maybe I find some time at some point to implement this but unfortunately the
debian.deb822 module seems to work quite well (even though it's orders of
magnitude slower).

cheers, josch


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#748922: python-apt: TagFile doesnt close file

2014-06-06 Thread Julian Andres Klode
On Fri, Jun 06, 2014 at 11:25:52AM +0200, Johannes Schauer wrote:
 Hi
 
 Quoting Michael Vogt (2014-06-06 11:15:29)
  There is a small typo in the above script. gc.collect should be 
  gc.collect().
 
 right. I noticed this too but it was too late and I already sent in my
 bugreport. See my other submission to it for my updated results.
 
  I verified that the following works and does not leak fds:
  
  class LeakTestCase(unittest.TestCase):
  def test_leak(self):
  # clenaup gc first
  import gc
  gc.collect()
  # see what fds we have
  fds = os.listdir(/proc/self/fd)
  testfile = __file__
  tagf = apt_pkg.TagFile(testfile)
  tagf.step()
  del tagf
  import gc
  gc.collect()
  # ensure fd is closed
  self.assertEqual(fds, os.listdir(/proc/self/fd))
  
  
  Unfortunately just doing a del tagf is not enough, the gc call is
  needed afterwards. The reason that the del is not enough is that there
  is there is a cyclic reference from the tagf to tagf.section. The
  garbage collector breaks it, but a simple del sees a refcount 
  0. This particular case could maybe fixed by copying the data from the
  pkgTagFile to a pkgTagSection instead of letting it operator on the
  Buffer of pkgTagFile. But that requires somework (plus additional
  memory for the copied data).
 
 The problem is, as you also identified above, that as long as the Python 
 object
 for apt_pkg.TagFile is around, the file stays open.
 
 I switched from apt_pkg to debian.deb822 because in my use case I want to read
 from file A, modify the data and want to write back to A again. For that I
 obviously should not have the original fd from reading A open when I write 
 back
 to A. One possible workaround would be to copy all of A into a StringIO and
 then pass that to apt_pkg.TagFile. This would probably work but I think the
 expectation is that after doing:
 
   mypkgs = list(apt_pkg.TagFile(Packages))
 
 or:
 
   mypkgs = []
   for pkg in apt_pkg.TagFile(Packages):
   mypkgs.append(pkgs)
 
 that there are no files left open. Currently in both cases, the fd is still
 around. So after the last pkgTagSection is retrieved, the file should be
 closed.
 

The easiest way to make this work would be to make the reference from TagFile 
to 
TagSection weak, but this creates a small performance issue, because we'd need 
to 
re-create the TagSection for each section, unless it is referenced somewhere 
outside. So I'm not sure that's a good idea (same for mvo's idea).

What we really should do is add
(1) a close() method, and
(2) context manager support
to TagFile. This way you can just explicitly close the
file when you're done with it.

As a workaround, open the file yourself and pass the open file to the
tag file. The best way would be:

with open(Packages) as fobj:
for pkg in apt_pkg.TagFile(fobj):
mypkgs.append(pkgs)

You need to keep the file object open as long as the tag file
exists, so just TagFile(open(Packages)) would fail.


-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.

Please do not top-post if possible.


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#748922: python-apt: TagFile doesnt close file

2014-05-22 Thread Johannes Schauer
Package: python-apt
Version: 0.9.3.5
Severity: normal

Consider the following snippet:

--%---
import gc
import os
import sys
import apt_pkg

print os.listdir(/proc/self/fd/)
f = apt_pkg.TagFile(sys.argv[1])
print os.listdir(/proc/self/fd/)
del f
print os.listdir(/proc/self/fd/)
gc.collect
print os.listdir(/proc/self/fd/)
--%---

calling apt_pkg.TagFile() opens a file descriptor but that descriptor is
not closed when the object gets deleted or even garbage collection is
explicitly run. Here the output:

['0', '1', '2', '3']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']
['0', '1', '2', '3', '4']

The opened fd should be closed when the python object is destroyed.

cheers, josch


-- System Information:
Debian Release: jessie/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.11-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-apt depends on:
ii  libapt-inst1.5 1.0.1
ii  libapt-pkg4.12 1.0.1
ii  libc6  2.18-4
ii  libgcc11:4.8.2-16
ii  libstdc++6 4.8.2-16
ii  python 2.7.5-5
ii  python-apt-common  0.9.3.5

Versions of packages python-apt recommends:
ii  iso-codes3.52-1
ii  lsb-release  4.1+Debian12
ii  xz-utils 5.1.1alpha+20120614-2

Versions of packages python-apt suggests:
pn  python-apt-dbg  none
pn  python-apt-doc  none
ii  python-gtk2 2.24.0-3+b1
pn  python-vte  none

-- debconf-show failed

-- debsums errors found:
sh: 1: /usr/sbin/dpkg-divert: not found


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org