[ https://issues.apache.org/jira/browse/PDFBOX-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755103#comment-16755103 ]
Michael Klink edited comment on PDFBOX-4446 at 1/29/19 3:14 PM:
----------------------------------------------------------------

If you add some additional leniency, please do so only conditionally, e.g. only if {{isLenient}} is set. Apparently the PDF in question is broken, and attempts to repair broken PDFs may change their contents (or, more to the point, different PDF processors might fix the error differently, resulting in different PDF contents). So the default behavior should be to throw an exception, and a repaired version should be produced only if the calling application explicitly allows it.

That being said, the code here is already implicitly conditional: it can only make a difference if {{isLenient}}, because otherwise the exception in line 718 would have been thrown. But it should be more obvious that it is conditional, e.g. by making it an {{else continue}} of the {{if( fileOffset != null ) \{...\}}} from line 709 to line 714.
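To make the suggested control flow concrete, here is a minimal, self-contained sketch of an explicitly conditional skip. It is not the actual {{COSParser}} code: the class {{LenientXrefSketch}}, the method {{schedule}}, and the string object keys are hypothetical stand-ins, and only the {{isLenient}} flag and the {{fileOffset}} null check are taken from the discussion above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT the PDFBox COSParser implementation.
class LenientXrefSketch
{
    /**
     * Returns the keys of all objects that have an xref offset.
     * A missing offset is skipped only when isLenient is true;
     * otherwise the method fails with an IOException.
     */
    static List<String> schedule(List<String> keys, Map<String, Long> xref, boolean isLenient)
            throws IOException
    {
        List<String> scheduled = new ArrayList<>();
        for (String key : keys)
        {
            Long fileOffset = xref.get(key);
            if (fileOffset != null)
            {
                scheduled.add(key);
            }
            else if (isLenient)
            {
                // the repair path is visibly tied to the leniency flag
                continue;
            }
            else
            {
                // default behavior: a broken xref is an error
                throw new IOException("No xref entry for object " + key);
            }
        }
        return scheduled;
    }

    public static void main(String[] args) throws IOException
    {
        Map<String, Long> xref = new HashMap<>();
        xref.put("1 0", 15L);
        List<String> keys = Arrays.asList("1 0", "2 0");

        // lenient mode skips the object without an xref entry
        System.out.println(schedule(keys, xref, true));
    }
}
```

With this shape, a reader can see at a glance that the skip happens only in lenient mode, rather than having to infer it from an exception thrown earlier in the method.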

> Tolerate some incorrect Xref in PDF file
> ----------------------------------------
>
>                 Key: PDFBOX-4446
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4446
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.14, 3.0.0 PDFBox
>            Reporter: Derek Liu
>            Priority: Major
>         Attachments: Reproduce_Step.png
>
>
> Some PDF files may not have a correct Xref, and we should tolerate them, or
> at least just log an error instead of raising an exception.
> {code}
>  pdfbox/src/main/java/org/apache/pdfbox/pdfparser/COSParser.java | 3 +++
>  1 file changed, 3 insertions(+)
> diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/COSParser.java b/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/COSParser.java
> index 8ca955ed2..b2b28b258 100644
> --- a/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/COSParser.java
> +++ b/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/COSParser.java
> @@ -721,6 +721,9 @@ public class COSParser extends BaseParser
>                  }
>              }
>
> +            if( fileOffset == null ) {
> +                continue;
> +            }
>              List<COSObject> stmObjects = objToBeParsed.get(fileOffset);
>              if (stmObjects == null)
>              {
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)