Mark,

Have you upgraded to the latest FontBox? It is now at 0.8.

I think it is a good idea to pull the latest from SVN (then you can hack away at the nice Java code :~))

Cheers +++ Iain

Mark Kerzner wrote:
Hi,
I have compared PDFBox's PDF-to-text conversion with pdftohtml (on Linux)
followed by a conversion to text, and I found the second one a little
clearer. For example, the bottom lines in a PDF (copyrights, etc.) were
combined into one line by the PDFBox conversion, but came out as three
separate pieces the other way.

I am using the last stable PDFBox jar, which dates back to 2006, and the
pdftohtml utility is from about the same time, so I can understand this.

My question then is twofold: does the comparison make sense, and should I
use pdftohtml combined with a text converter, or should I try to build the
latest from SVN?

Thank you,
Mark

