Package: poppler-utils
Version: 0.12.4-1.2
Severity: important

I am trying to convert the following two-column PDF document to HTML:

$ wget http://www.hpca.ual.es/~vruiz/papers/ORTIZ04b.pdf

pdftohtml garbles the output because it reads the page in 'layout' mode,
interleaving the columns: line 1 of column 1, then line 1 of column 2, and so on.

$ pdftohtml -noframes ORTIZ04b.pdf 1.html  

pdftotext does implement a -raw mode, which properly dumps the text in 'order'
(not in layout mode):

$ pdftotext -raw -htmlmeta ORTIZ04b.pdf 2.html
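
To illustrate the two orderings, here is a toy sketch in plain shell. The two
small files are made-up stand-ins for the columns of a page, and paste/cat only
mimic the resulting line order; this is an assumption-laden illustration, not
how poppler extracts text internally.

```shell
#!/bin/sh
# Toy illustration only: two hypothetical files stand in for the two columns.
tmp=$(mktemp -d)
printf 'col1 line1\ncol1 line2\n' > "$tmp/col1"
printf 'col2 line1\ncol2 line2\n' > "$tmp/col2"

# 'layout'-like order: line N of col1 is followed by line N of col2.
layout_order=$(paste -d '\n' "$tmp/col1" "$tmp/col2")

# 'raw'-like order: all of col1 first, then all of col2.
raw_order=$(cat "$tmp/col1" "$tmp/col2")

printf 'layout-like:\n%s\n\nraw-like:\n%s\n' "$layout_order" "$raw_order"
rm -rf "$tmp"
```

The first listing mixes the two columns line by line, which is the effect I see
in 1.html; the second keeps each column contiguous, as in 2.html.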

pdftotext -htmlmeta is too crude to be used as an HTML generator, so pdftohtml
needs a -raw option of its own to convert two-column PDFs to HTML properly.

Thanks

-- System Information:
Debian Release: 6.0.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable'), (200, 'testing'), (100, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-0.bpo.3-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages poppler-utils depends on:
ii  libc6              2.11.3-4              Embedded GNU C Library: Shared lib
ii  libfontconfig1     2.8.0-2.1             generic font configuration library
ii  libgcc1            1:4.4.5-8             GCC support library
ii  libpoppler5        0.12.4-1.2            PDF rendering library
ii  libstdc++6         4.4.5-8               The GNU Standard C++ Library v3
ii  libxml2            2.7.8.dfsg-2+squeeze5 GNOME XML library

Versions of packages poppler-utils recommends:
ii  ghostscript                 8.71~dfsg2-9 The GPL Ghostscript PostScript/PDF

poppler-utils suggests no packages.

-- no debconf information


