[ https://issues.apache.org/jira/browse/PDFBOX-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109696#comment-13109696 ]
Andreas Lehmkühler commented on PDFBOX-1122:
--------------------------------------------
I can't reproduce the described behaviour using PDFReader and ExtractText. I've
tried different versions (1.6, 1.5, 1.4 and 1.3.1). Are you sure about the
PDFBox version? Which version of Tika are you using?
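
One way to narrow this down is to check whether the bytes that reach the
parser are a complete PDF at all. The messages in the report (an empty token
where 'endstream' was expected, and a startxref position of 0) would be
consistent with input that ends early, but that is only an assumption; nothing
in this issue confirms it. The sketch below uses only the Java standard
library. PdfTailCheck and its methods are hypothetical names, not part of
PDFBox or Tika, and the check is a rough heuristic rather than a validator.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical diagnostic helper; not part of PDFBox or Tika. */
final class PdfTailCheck {

    /**
     * Returns the number written after the last "startxref" keyword,
     * or -1 if the keyword or the number is missing.
     */
    static long startXrefOffset(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is preserved.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int pos = text.lastIndexOf("startxref");
        if (pos < 0) {
            return -1;
        }
        int i = pos + "startxref".length();
        // Skip the whitespace between the keyword and the offset.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            return -1;
        }
        try {
            return Long.parseLong(text.substring(start, i));
        } catch (NumberFormatException e) {
            return -1; // digit run too long to be a plausible offset
        }
    }

    /** True if the data, ignoring trailing whitespace, ends with the %%EOF marker. */
    static boolean endsWithEof(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        return text.trim().endsWith("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfTailCheck <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("bytes read:       " + pdf.length);
        System.out.println("startxref offset: " + startXrefOffset(pdf));
        System.out.println("ends with %%EOF:  " + endsWithEof(pdf));
    }
}
```

Running this once on the file saved from a browser and once on the bytes the
crawler hands to Tika (written to a temporary file first) would show whether
both inputs have the same length and the same tail. If they differ, the
problem is probably upstream of PDFBox.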
> Parsing Error, Skipping Object
> ------------------------------
>
> Key: PDFBOX-1122
> URL: https://issues.apache.org/jira/browse/PDFBOX-1122
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.6.0
> Environment: Working with Windows 7 in eclipse.
> Reporter: Raihan Jamal
> Labels: pdfbox
> Fix For: 1.7.0
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Parsing Error, Skipping Object
> java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@38011d45
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
> at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> at org.apache.tika.Tika.parseToString(Tika.java:357)
> at edu.uci.ics.crawler4j.crawler.BinaryParser.parse(BinaryParser.java:37)
> at edu.uci.ics.crawler4j.crawler.WebCrawler.handleBinary(WebCrawler.java:223)
> at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:462)
> at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:129)
> at java.lang.Thread.run(Thread.java:662)
> Did not found XRef object at specified startxref position 0
> This is the sample URL where I am seeing this problem:
> http://www.qualcomm.com/documents/files/rev-b-enhanced-mobile-broadband-for-all.pdf
> Any suggestions as to why this is happening? Or is it a bug?