Bug#548987: xapian-omega: omindex allows too little memory for the programs handling the file formats

2009-11-03 Thread Olly Betts
tags 548987 + upstream pending
thanks

On Wed, Sep 30, 2009 at 01:31:06PM +0200, Rune Kock wrote:
 On Wed, Sep 30, 2009 at 11:10, Olly Betts o...@survex.com wrote:
  I think this is probably the cause of upstream #358:
 
  http://trac.xapian.org/ticket/358
 
 Yes, #358 is probably two bugs, this one and a memory leak.
 evoisard has omindex using 360 MB, which seems excessive, but
 shouldn't be a problem on his 1 GB machine.

Indexing large documents is fairly memory hungry, so I don't think this is
a leak, just the C++ STL hoarding memory, probably plus some heap fragmentation.

The reporter never responded to my request for further investigation so it's
hard to be totally sure, but there's only one place in omindex which explicitly
allocates memory dynamically, and that is only called once.

  Debian bug 404528 describes a similar bug in another package, and
  suggests using _SC_PHYS_PAGES instead of _SC_AVPHYS_PAGES.
 
  _SC_PHYS_PAGES isn't ideal as other processes might be using that
  memory, but _SC_AVPHYS_PAGES clearly isn't suitable, and since this is
  mostly a catch for filters spiralling out of control, it looks like
  _SC_PHYS_PAGES is probably the best option, perhaps with a lower ratio
  than 7/8.
 
 I agree.  Or maybe even just a fixed limit of, say, 50 MB.

Some document filters will need more than 50 MB to extract a large document.
We really just want to ensure that an out-of-control filter doesn't cause
problems.
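
As a rough illustration of that kind of safety net (not the change committed
upstream), a filter can be run in a child process whose address space is
capped with setrlimit(RLIMIT_AS) before exec. The helper names
limit_address_space and run_with_limit are made up for this sketch, and how
the cap is chosen is left to the caller.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: cap the address space of the calling process.
// Meant to be called in the child between fork() and exec().
bool limit_address_space(rlim_t bytes) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) != 0) return false;
    // Don't try to set the soft limit above an existing hard limit.
    if (lim.rlim_max != RLIM_INFINITY && bytes > lim.rlim_max)
        bytes = lim.rlim_max;
    lim.rlim_cur = bytes;
    return setrlimit(RLIMIT_AS, &lim) == 0;
}

// Hypothetical helper: fork, apply the cap in the child, exec the command,
// and return the raw wait status (or -1 if fork() or waitpid() failed).
int run_with_limit(const char* const argv[], rlim_t bytes) {
    assert(argv != nullptr && argv[0] != nullptr);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (!limit_address_space(bytes)) _exit(126);
        execvp(argv[0], const_cast<char* const*>(argv));
        _exit(127);  // only reached if exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return status;
}
```

A filter that stays under the cap runs normally; one that spirals out of
control has its allocations fail instead of pushing the whole machine into
swap.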

I've committed a fix to upstream SVN for this.

Cheers,
Olly



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#548987: xapian-omega: omindex allows too little memory for the programs handling the file formats

2009-09-30 Thread Olly Betts
On Wed, Sep 30, 2009 at 05:54:45AM +0200, Rune Kock wrote:
 As far as I can tell, the problem is that runfilter.cc sets an rlimit
 of 7/8 of the free memory, and that freemem.cc calculates the free
 memory using sysconf(_SC_AVPHYS_PAGES), which doesn't include the
 memory that the kernel is using for temporary caching, even though
 that memory really is available.

I think this is probably the cause of upstream #358:

http://trac.xapian.org/ticket/358

 Debian bug 404528 describes a similar bug in another package, and
 suggests using _SC_PHYS_PAGES instead of _SC_AVPHYS_PAGES.

_SC_PHYS_PAGES isn't ideal as other processes might be using that
memory, but _SC_AVPHYS_PAGES clearly isn't suitable, and since this is
mostly a catch for filters spiralling out of control, it looks like
_SC_PHYS_PAGES is probably the best option, perhaps with a lower ratio
than 7/8.

Thanks for the report and especially the detective work.

Cheers,
Olly






Bug#548987: xapian-omega: omindex allows too little memory for the programs handling the file formats

2009-09-30 Thread Rune Kock
On Wed, Sep 30, 2009 at 11:10, Olly Betts o...@survex.com wrote:
 I think this is probably the cause of upstream #358:

 http://trac.xapian.org/ticket/358

Yes, #358 is probably two bugs, this one and a memory leak.
evoisard has omindex using 360 MB, which seems excessive, but
shouldn't be a problem on his 1 GB machine.

 Debian bug 404528 describes a similar bug in another package, and
 suggests using _SC_PHYS_PAGES instead of _SC_AVPHYS_PAGES.

 _SC_PHYS_PAGES isn't ideal as other processes might be using that
 memory, but _SC_AVPHYS_PAGES clearly isn't suitable, and since this is
 mostly a catch for filters spiralling out of control, it looks like
 _SC_PHYS_PAGES is probably the best option, perhaps with a lower ratio
 than 7/8.

I agree.  Or maybe even just a fixed limit of, say, 50 MB.






Bug#548987: xapian-omega: omindex allows too little memory for the programs handling the file formats

2009-09-29 Thread Rune Kock
Package: xapian-omega
Version: 1.0.16-1
Severity: important

I get the following error when trying to index a PDF file:

Indexing /x.pdf as application/pdf ... pdftotext: error while
loading shared libraries: libc.so.6: failed to map segment from shared
object: Cannot allocate memory
Filter for application/pdf not installed - ignoring extension pdf

As far as I can tell, the problem is that runfilter.cc sets an rlimit
of 7/8 of the free memory, and that freemem.cc calculates the free
memory using sysconf(_SC_AVPHYS_PAGES), which doesn't include the
memory that the kernel is using for temporary caching, even though
that memory really is available.
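
To see the difference on a given machine, the two sysconf() values can be
compared directly. This is a minimal sketch, not the code in freemem.cc; the
helper names are hypothetical, and it assumes a glibc/Linux system where both
_SC_AVPHYS_PAGES and _SC_PHYS_PAGES are available.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical helper: turn a sysconf() page count into bytes.
// Returns 0 if either query failed (sysconf() returns -1 on error).
unsigned long long pages_to_bytes(long pages, long page_size) {
    if (pages < 0 || page_size < 0) return 0;
    return static_cast<unsigned long long>(pages) *
           static_cast<unsigned long long>(page_size);
}

// Pages that are currently unused; excludes memory used for caching.
unsigned long long free_physical_bytes() {
    return pages_to_bytes(sysconf(_SC_AVPHYS_PAGES), sysconf(_SC_PAGESIZE));
}

// All physical pages, whether or not something is using them.
unsigned long long total_physical_bytes() {
    return pages_to_bytes(sysconf(_SC_PHYS_PAGES), sysconf(_SC_PAGESIZE));
}
```

On a machine that has been up for a while, free_physical_bytes() is typically
a small fraction of total_physical_bytes(), because the kernel uses otherwise
idle memory for caching.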

Debian bug 404528 describes a similar bug in another package, and
suggests using _SC_PHYS_PAGES instead of _SC_AVPHYS_PAGES.

Here is the output of 'free' on my system:
             total       used       free     shared    buffers     cached
Mem:        249644     246080       3564          0          0     182332
-/+ buffers/cache:      63748     185896
Swap:       979956      11720     968236
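
Working through those numbers, assuming the limit is simply 7/8 of the chosen
figure in kB (ratio_of_kb is a hypothetical helper for the arithmetic, not
omindex code):

```cpp
#include <cassert>

// Hypothetical helper: num/den of a size in kB, rounded down.
// The quotient and remainder are scaled separately so that the
// intermediate products stay small.
unsigned long ratio_of_kb(unsigned long kb, unsigned long num,
                          unsigned long den) {
    assert(den != 0);
    return kb / den * num + (kb % den) * num / den;
}

// Figures taken from the 'free' output in this report, in kB.
const unsigned long kFreeKb = 3564;               // "free" column
const unsigned long kFreePlusCacheKb = 185896;    // "-/+ buffers/cache" free
const unsigned long kTotalKb = 249644;            // "total" column
```

With the "free" figure, ratio_of_kb(kFreeKb, 7, 8) is 3118 kB, about 3 MB.
Counting cached memory as available gives 162659 kB, and using total memory
gives 218438 kB.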


-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i586)

Kernel: Linux 2.6.30-1-486
Locale: LANG=en_DK.UTF-8, LC_CTYPE=en_DK.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages xapian-omega depends on:
ii  libc6        2.9-25     GNU C Library: Shared libraries
ii  libgcc1      1:4.4.1-1  GCC support library
ii  libstdc++6   4.4.1-1    The GNU Standard C++ Library v3
ii  libxapian15  1.0.16-3   Search engine library

Versions of packages xapian-omega recommends:
ii  apache2   2.2.13-2   Apache HTTP Server metapackage
ii  apache2-mpm-worker [httpd-cgi 2.2.13-2   Apache HTTP Server - high speed th

Versions of packages xapian-omega suggests:
ii  antiword       0.37-6           Converts MS Word files to text, PS
ii  catdoc         0.94.2-1         MS-Word to TeX or plain text conve
pn  catdvi         <none>           (no description available)
pn  djvulibre-bin  <none>           (no description available)
pn  ghostscript    <none>           (no description available)
pn  libwpd-tools   <none>           (no description available)
pn  libwps-tools   <none>           (no description available)
ii  perl           5.10.0-25        Larry Wall's Practical Extraction
ii  unrtf          0.19.3-1.1       RTF to other formats converter
ii  unzip          6.0-1            De-archiver for .zip files
ii  xpdf-utils     3.02-1.4+lenny1  Portable Document Format (PDF) sui

-- debconf-show failed


