On 12.06.2015 at 15:37, Uday Venkatadasari wrote:
Hi,
I am using Tika 1.3 to parse PDF files, but I am getting an error for one of
my PDF files. The error is below.
PDFBox 1.3.1
1.3.1 is from 2010; we're now at 1.8.9. Tika is now at 1.8. So please
try with these versions.
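Since Tika pulls in PDFBox as a dependency, it can be unclear which versions actually end up on the classpath. A minimal sketch for checking this, assuming the jars' manifests carry an Implementation-Version entry (the class name ClasspathVersionCheck and the helper versionOf are hypothetical, made up for this illustration):

```java
public class ClasspathVersionCheck {

    // Hypothetical helper: reports the manifest version of the jar that
    // provides the given class, or a short note if it cannot be determined.
    static String versionOf(String className) {
        try {
            // Load without initializing; only the package metadata is needed.
            Class<?> cls = Class.forName(className, false,
                    ClasspathVersionCheck.class.getClassLoader());
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return (version == null) ? "unknown (no manifest version)" : version;
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println("PDFBox: " + versionOf("org.apache.pdfbox.pdmodel.PDDocument"));
        System.out.println("Tika:   " + versionOf("org.apache.tika.Tika"));
    }
}
```

Run it with the same classpath as the application that fails, so that it sees the same jars.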
If it doesn't work, also try configuring Tika to use the non-sequential
parser of PDFBox.
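A rough sketch of what that configuration could look like, assuming Tika 1.8 with PDFBox 1.8.x on the classpath and that PDFParserConfig in that version exposes setUseNonSequentialParser (please check the Javadoc of the version you use). The class name NonSequentialPdfExample and the helper nonSequentialContext are hypothetical:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

public class NonSequentialPdfExample {

    // Hypothetical helper: builds a ParseContext that asks Tika's PDF parser
    // to use PDFBox's non-sequential parser (assumes a Tika 1.x API).
    static ParseContext nonSequentialContext() {
        PDFParserConfig config = new PDFParserConfig();
        config.setUseNonSequentialParser(true);
        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, config);
        return context;
    }

    public static void main(String[] args) throws Exception {
        // -1 disables the default write limit of the content handler.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        // args[0] is the path to the PDF that fails with the default parser.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            new PDFParser().parse(in, handler, metadata, nonSequentialContext());
        }
        System.out.println(handler.toString());
    }
}
```

The non-sequential parser reads the cross-reference table first instead of scanning the file from start to end, so it tends to cope better with files that contain stray or damaged objects.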
If it still doesn't work, please open an issue in JIRA and attach your
PDF file.
Tilman
PS: you posted to the dev list. This is for PDFBox developers. Next
time, please post to the user list.
java.io.IOException: expected='obj' actual='655'
org.apache.pdfbox.io.PushBackInputStream@fe7591
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:511)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:859)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:826)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:53)
Please help me to solve this issue.
Thanks
Uday Venkatadasari
Senior Consultant | Avalon Consulting, LLC