Mark, have you upgraded to the latest FontBox? It is now at 0.8.
I think it is a good idea to pull the latest from SVN (then you can hack away at the nice Java code :~)
Cheers
+++
Iain

Mark Kerzner wrote:
Hi,

I have compared PDFBox's PDF-to-text conversion with pdftohtml (on Linux) followed by conversion to text, and I found the output of the second a little clearer. For example, the bottom lines in a PDF (copyright notices, etc.) were combined into one line by the PDFBox conversion, but came out as three separate pieces the other way.

I am using the latest stable PDFBox jar, which dates back to 2006, and the pdftohtml utility is from about the same time, so I can understand this.

My question then is twofold: does the comparison make sense, and should I use pdftohtml combined with a text converter, or should I try to build the latest from SVN?

Thank you,
Mark
