Amit, I'll reiterate Adam's comments: I've noticed a huge difference in the number of PDFs that can be indexed between 0.7.x and 0.8.
--Phil

On Fri, Jan 22, 2010 at 10:25 AM, <[email protected]> wrote:
> Amit,
>
> Attachments don't make it through to the mailing list. I'd suggest trying
> with PDFBox 0.8.0 or, even better, using the latest checkout from SVN. A
> lot of bugs have been fixed, features added, and the performance has
> gotten better. Let us know if you are still having issues with the latest
> code.
>
> --Adam
>
> From: Amit Lole <[email protected]>
> To: [email protected]
> Date: 01/22/2010 00:53
> Subject: PDF Text extraction problem
>
> Hi,
>
> I am trying to extract text from the pdf file using pdfbox 0.7.3, but the
> output file is not complete.
> Some pages are missing in the output.
>
> can you please help me in resolving this issue. I have attached sample pdf
> with this mail.
>
> Thanks
> Amit

--
Machines might be interesting, but people are fascinating. -- K.P.
