On 2015-09-13 20:06, Rob Hawkins wrote:
Greetings all,
Can pdftohtml produce output for Burmese, Khmer, Indonesian, Thai and
Vietnamese? I didn't see a language pack for any except Thai, and
that one doesn't produce properly formatted characters for my source
files. They're missing the vowel marks. The other languages fail
completely on my setup. I've tried on OS X and Ubuntu 12.
My source files are here:
https://github.com/robhawkins/drive-taiwan/tree/master/input/pdf
Chinese seems to work fine.
I found out that PDF.js will produce good output, though I already
have code based on pdftohtml output and would rather not switch if not
necessary. I wonder if there is something wrong with my setup.
Thanks for any help even if it's just a "nope, that's not possible"
kind of reply =)
Rob
pdftohtml can work with those languages, but it depends on the ability to
extract the plain text from the document. The couple of PDFs I've
looked at have problems with text extraction. Possibly poppler
could do a better job, but since several applications I tried have problems
extracting text from those documents, it's probably just a problem with
the documents themselves.
I assume that PDF.js just works in a different way and doesn't require
the extracted text to be correct.
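If you want to check whether the text layer itself is the problem, something
along these lines might help. It is only a rough sketch: it assumes pdftotext
from poppler-utils is on your PATH, the helper names extract_text and
mark_stats are made up for illustration, and counting combining marks is just
a heuristic of mine, not anything pdftohtml does internally.

```python
import subprocess
import sys
import unicodedata


def extract_text(pdf_path):
    """Run poppler's pdftotext and return the PDF's text layer as a string."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def mark_stats(text):
    """Return (combining marks, letters), counted by Unicode general category."""
    marks = sum(1 for ch in text if unicodedata.category(ch).startswith("M"))
    letters = sum(1 for ch in text if unicodedata.category(ch).startswith("L"))
    return marks, letters


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_marks.py file1.pdf file2.pdf
    for path in sys.argv[1:]:
        marks, letters = mark_stats(extract_text(path))
        print(f"{path}: {marks} combining marks, {letters} letters")
```

A Thai, Khmer or Burmese file that reports zero or very few combining marks
has most likely lost its vowel marks in the text layer, and pdftohtml has
nothing to work with. Vietnamese is often stored as precomposed letters, so
the count says little there.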
_______________________________________________
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler