Hello Luceners

I have started a new project and need to index PDF documents.
Several projects exist that can extract the content,
such as pdfbox, xpdf and pjclassic.

As far as I can tell from the FAQs and examples, all of these
tools support simple text extraction.

Which of these open source tools would you recommend most?

My PDF documents are quite long (on average more than 60 pages).
Therefore I would like to index additional structural information.
That way the user gets not only the whole document as a result,
but also additional information such as the page or the chapter where
the relevant information is.
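
To make the idea concrete, here is a rough sketch in plain Java of the
kind of result I have in mind. It assumes the text of each page has
already been extracted by one of the tools above. The class PageIndex
and its methods are made up for illustration; they are not part of
Lucene or of any of the PDF libraries.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: index every page separately, so that a hit
// reports the file AND the page numbers, not just the file.
class PageIndex {
    // term -> (file name -> sorted page numbers containing the term)
    private final Map<String, Map<String, SortedSet<Integer>>> postings = new HashMap<>();

    /** Adds the already extracted text of one page to the index. */
    void addPage(String file, int page, String text) {
        // Lowercase and split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .computeIfAbsent(file, k -> new TreeSet<>())
                    .add(page);
        }
    }

    /** Returns file -> pages for a single term (empty map if not found). */
    Map<String, SortedSet<Integer>> search(String term) {
        Map<String, SortedSet<Integer>> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.emptyMap() : hits;
    }

    public static void main(String[] args) {
        PageIndex index = new PageIndex();
        index.addPage("manual.pdf", 1, "Introduction and overview");
        index.addPage("manual.pdf", 2, "Installing the indexer");
        index.addPage("manual.pdf", 3, "Configuring the indexer");
        // Prints the file together with the pages that mention the term.
        System.out.println(index.search("indexer"));
    }
}
```

With Lucene I imagine the equivalent would be one index entry per page
(or per chapter) carrying the file name and page number as extra
fields, but that is my assumption and exactly the part I am unsure about.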

Does anyone have similar requirements? Which of these tools
best fits them?

Thanks for your help
Thomas

