[ https://issues.apache.org/jira/browse/PDFBOX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated PDFBOX-3955:
--------------------------------
Description:
In the latest regression run with PDFBox's 2.x branch, we're now getting very
slow processing on a truncated PDF with PDFBox app's {{ExtractText}}:
http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB
It turns out this is not an infinite loop. After 4.5 minutes, {{ExtractText}}
eventually failed with:
{noformat}
Exception in thread "main" java.io.IOException: Missing root object specification in trailer.
    at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950)
    at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
    at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
{noformat}
.
was:
In the latest regression run with PDFBox's 2.x branch, we're now getting an
infinite loop on a truncated PDF with PDFBox app's {{ExtractText}}:
http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB
.
> new infinite loop on truncated PDF
> ----------------------------------
>
> Key: PDFBOX-3955
> URL: https://issues.apache.org/jira/browse/PDFBOX-3955
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Reporter: Tim Allison
> Assignee: Andreas Lehmkühler
>
> In the latest regression run with PDFBox's 2.x branch, we're now getting very
> slow processing on a truncated PDF with PDFBox app's {{ExtractText}}:
> http://162.242.228.174/docs/truncated_pdfs/commoncrawl2_likely_broken/7K/7KK53NK5PVKOUGDSQ4FK6542BNPC4SWB
> It turns out this is not an infinite loop. After 4.5 minutes, {{ExtractText}}
> eventually failed with:
> {noformat}
> Exception in thread "main" java.io.IOException: Missing root object specification in trailer.
>     at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950)
>     at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
>     at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
>     at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
> {noformat}
> .
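
For regression runs, one way to flag a file like this without waiting out the full parse is to run the load under a time budget. The following is only a rough sketch using plain java.util.concurrent; the class name {{TimedParse}}, the {{Outcome}} enum and the timeout values are made up for illustration and are not part of PDFBox. Note that a timeout only says the budget was exceeded. It cannot by itself tell a true infinite loop from a very slow parse such as the 4.5 minute case above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of PDFBox.
public class TimedParse {

    /** Possible outcomes of a time-limited run. */
    public enum Outcome { FINISHED, FAILED, TIMED_OUT }

    /** Runs the task on a worker thread and waits at most timeoutMillis for it. */
    public static <T> Outcome runWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-parse");
            t.setDaemon(true); // do not keep the JVM alive if the task never returns
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.FINISHED;
        } catch (ExecutionException e) {
            // The task ended on its own with an exception, e.g. an IOException.
            return Outcome.FAILED;
        } catch (TimeoutException e) {
            future.cancel(true); // best-effort interrupt; a busy loop may ignore it
            return Outcome.TIMED_OUT;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in task. With PDFBox 2.x on the classpath, the task could
        // instead wrap PDDocument.load(new File(path)) (assumption: not run here).
        Callable<Integer> slowTask = () -> { Thread.sleep(200); return 1; };
        System.out.println(runWithTimeout(slowTask, 50));   // prints TIMED_OUT
        System.out.println(runWithTimeout(slowTask, 2000)); // prints FINISHED
    }
}
```

A file that comes back as TIMED_OUT would still need a longer manual run (or a thread dump) to decide whether it is a hang or just slow.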