Package: poppler-utils
Version: 0.12.4-1.2
Severity: important

I am trying to convert the following two-column PDF document to HTML:

$ wget http://www.hpca.ual.es/~vruiz/papers/ORTIZ04b.pdf

pdftohtml completely messes up the output because it reads the page in 'layout' mode. It takes line 1 from column 1, then line 1 from column 2, and so on:

$ pdftohtml -noframes ORTIZ04b.pdf 1.html

pdftotext does implement a -raw mode and properly dumps the text in reading order (not in layout order):

$ pdftotext -raw -htmlmeta ORTIZ04b.pdf 2.html

However, pdftotext -htmlmeta is too crude to be used as an HTML generator. A -raw option is therefore needed in pdftohtml so that two-column PDFs can be converted into HTML documents properly.

Thanks

-- System Information:
Debian Release: 6.0.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable'), (200, 'testing'), (100, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-0.bpo.3-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages poppler-utils depends on:
ii  libc6           2.11.3-4               Embedded GNU C Library: Shared lib
ii  libfontconfig1  2.8.0-2.1              generic font configuration library
ii  libgcc1         1:4.4.5-8              GCC support library
ii  libpoppler5     0.12.4-1.2             PDF rendering library
ii  libstdc++6      4.4.5-8                The GNU Standard C++ Library v3
ii  libxml2         2.7.8.dfsg-2+squeeze5  GNOME XML library

Versions of packages poppler-utils recommends:
ii  ghostscript  8.71~dfsg2-9  The GPL Ghostscript PostScript/PDF

poppler-utils suggests no packages.

-- no debconf information