Ariel wrote:

The problem I have is that my application takes a long time to index all
the documents: indexing 10 GB of PDF documents takes about 2 days (I am
using PDFBox to convert PDF to text). That is of course a lot of time;
other applications based on Lucene, for instance IBM OmniFind, take only 5
hours to index the same amount of PDF documents. I would like to find out

If you are using log4j, make sure the PDFBox log4j categories are set to INFO or higher; otherwise the logging really slows text extraction down (by a factor of 10). Alternatively, make sure you are using the non-log4j version of PDFBox. See
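
To illustrate the logging advice, here is a minimal sketch of a log4j.properties entry. It assumes log4j 1.x configured from a properties file, and it assumes the PDFBox loggers live under the org.pdfbox category; that name is a guess based on the package name and may differ in your PDFBox version, so check it against the classes you actually import.

```
# log4j.properties (log4j 1.x syntax)
# Assumption: PDFBox logs under the "org.pdfbox" category.
# Adjust the category to match the package name of your PDFBox version.
# INFO suppresses DEBUG output, which is the expensive part during parsing.
log4j.logger.org.pdfbox=INFO
```

Any level above DEBUG (INFO, WARN, ERROR) should have the same effect on speed; INFO is just the least restrictive choice.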


