Daniel Haglund wrote:

Hi.

I tried searching the archive but could not find a suitable answer.

That's because there is none.

I would like to know if it is possible to convert a simple (i.e. no images) PDF file to text. I have tried using a utility called pdftotext. It does a pretty good job when invoked with the -layout switch, which preserves the document layout. However, pdftotext seems to produce garbage characters for some fonts.

That's very normal. It's 'in the nature' of PDF.
PDF is a one-way process. The PDF is the end product.
You are not supposed to convert it to text.
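
For anyone who wants to reproduce the pdftotext run described in the question, here is a minimal Python sketch. It assumes the pdftotext binary (from Xpdf or Poppler) is installed and on the PATH. The function names and the choice of UTF-8 output are illustrative assumptions, not something from the original post. It will not cure the garbage characters: if a font in the PDF carries no usable mapping to Unicode, the extracted text stays wrong.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for a pdftotext call (hypothetical helper)."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout.
        command.append("-layout")
    # -enc selects the output text encoding; UTF-8 is an assumption here.
    command += ["-enc", "UTF-8"]
    command += [str(pdf_path), str(txt_path)]
    return command


def convert_pdf_to_text(pdf_path, txt_path, keep_layout=True):
    """Run pdftotext and return the extracted text (hypothetical helper)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    command = build_pdftotext_command(pdf_path, txt_path, keep_layout)
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(command, check=True)
    # errors="replace" keeps reading even if some bytes are not valid UTF-8.
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

Calling convert_pdf_to_text("input.pdf", "output.txt") would write output.txt and return its contents; both file names are placeholders.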

Anyway, iText looks like a great product, and maybe it can convert all PDFs to text regardless of the fonts used? If anyone has some sample code for this, that would be even better.

You need an OCR tool.
br,
Bruno


_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
