[poppler] No text extracted by pdftohtml

Jaime Gómez Obregón Sun, 09 May 2010 07:39:23 -0700

Hi everybody,

It seems poppler is unable to extract text from some PDF files:


http://iteisa.com/tmp/poppler-sample.pdf (about 10.5 MB)

pdftohtml from poppler 0.12.4 and 0.12.2 is not able to extract the text, and evince shows the document correctly but is unable to select its text. However, acroread shows and selects the text correctly (so it's normal, editable text and not an image).


Is this normal? Is there any workaround for it?

Everything seems ok with the file:

$ pdfinfo poppler-sample.pdf
Title:          untitled
Creator:        Adobe InDesign CS4 (6.0.4)
Producer:       Acrobat Distiller 9.0.0 (Windows)
CreationDate:   Wed May  5 09:35:12 2010
ModDate:        Wed May  5 09:35:12 2010
Tagged:         no
Pages:          208
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
File size:      10536602 bytes
Optimized:      no
PDF version:    1.4
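
For anyone who wants to run the same sanity check over several files, here is a minimal sketch in Python that parses the pdfinfo output above into a dictionary. The names parse_pdfinfo and SAMPLE are made up for illustration and are not part of poppler; the sketch only assumes the "Key: value" layout shown above.

```python
def parse_pdfinfo(output: str) -> dict:
    """Parse 'Key: value' lines, as printed by pdfinfo, into a dict.

    Hypothetical helper for illustration; not part of poppler.
    """
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so values that contain
        # colons (for example timestamps) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# The pdfinfo output quoted in this message.
SAMPLE = """\
Title:          untitled
Creator:        Adobe InDesign CS4 (6.0.4)
Producer:       Acrobat Distiller 9.0.0 (Windows)
CreationDate:   Wed May  5 09:35:12 2010
ModDate:        Wed May  5 09:35:12 2010
Tagged:         no
Pages:          208
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
File size:      10536602 bytes
Optimized:      no
PDF version:    1.4
"""

info = parse_pdfinfo(SAMPLE)
print(info["Encrypted"])  # no
print(info["Pages"])      # 208
```

Note that these fields say nothing about the fonts in the file, so a clean result here only rules out things like encryption.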

Best regards,

--
Jaime GÓMEZ OBREGÓN (ja...@iteisa.com)
http://www.iteisa.com
Teléfono: +34 902055277
ITEISA DESARROLLO Y SISTEMAS, S.L
Benidorm, 8 bajo. 39005 Santander.
España
_______________________________________________
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler
