I would like to announce the next release of PDFBox.  PDFBox allows for
PDF documents to be indexed using lucene through a simple interface.
Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
which will extract all text and PDF document summary properties as lucene
fields.

You can obtain the latest release from http://www.pdfbox.org

Please send all bug reports to me and attach the PDF document when
possible.

RELEASE 0.6.0
-Massive improvements to memory footprint.
-Must call close() on the COSDocument(LucenePDFDocument does this for you)
-Really fixed the bug where small documents were not being indexed.
-Fixed bug where no whitespace existed between obj and start of object.
    Exception in thread "main" java.io.IOException: expected='obj'
    actual='obj<</Pro
-Fixed issue with spacing where textLineMatrix was not being copied
 properly
-Fixed 'bug' where parsing would fail with some pdfs with double endobj
 definitions
-Added PDF document summary fields to the lucene document


Thank you,
Ben Litchfield
http://www.pdfbox.org



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to