Ben- In attempting to use the PDFBox-0.6.0, I rec'd the following error when attempting to scan a reasonably sized PDF repository.
Any thoughts? caught a class java.io.EOFException with message: Unexpected end of ZLIB input stream Eric Anderson LanRx Network Solutions Quoting Ben Litchfield <[EMAIL PROTECTED]>: > I would like to announce the next release of PDFBox. PDFBox allows for > PDF documents to be indexed using lucene through a simple interface. > Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument, > which will extract all text and PDF document summary properties as lucene > fields. > > You can obtain the latest release from http://www.pdfbox.org > > Please send all bug reports to me and attach the PDF document when > possible. > > RELEASE 0.6.0 > -Massive improvements to memory footprint. > -Must call close() on the COSDocument(LucenePDFDocument does this for you) > -Really fixed the bug where small documents were not being indexed. > -Fixed bug where no whitespace existed between obj and start of object. > Exception in thread "main" java.io.IOException: expected='obj' > actual='obj<</Pro > -Fixed issue with spacing where textLineMatrix was not being copied > properly > -Fixed 'bug' where parsing would fail with some pdfs with double endobj > definitions > -Added PDF document summary fields to the lucene document > > > Thank you, > Ben Litchfield > http://www.pdfbox.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > LanRx Network Solutions, Inc. Providing Enterprise Level Solutions...On A Small Business Budget --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]