On Tue, Jan 10, 2012 at 10:18 AM, Kim Haase <[email protected]> wrote:
> I just tried it. Each line of the extracted text is wrapped in a paragraph
> tag, and each page is wrapped in div tags. That's all. No other HTML tags
> are used.

Yep, I put a copy of the output of PDFBox's Getting Started manual here if
other people want to see how it looks in the browser. I think the display
problems could easily be fixed with a simple CSS file. Losing the links is a
shame, though.

File is here:

http://minotaur.apache.org/~fuzzylogic/getstart.html

regards,
-andrew
