[ https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180927#comment-16180927 ]
Tilman Hausherr commented on PDFBOX-3940:
-----------------------------------------

This regression first occurred because of r187622 in PDFBOX-3923. One of the offsets is incorrect (it points within the table), so an exception is thrown and the trailer is rebuilt. When rebuilding, this piece of code is hit:

{code}
// info dictionary
else if (dictionary.containsKey(COSName.MOD_DATE) &&
        (dictionary.containsKey(COSName.TITLE)
        || dictionary.containsKey(COSName.AUTHOR)
        || dictionary.containsKey(COSName.SUBJECT)
        || dictionary.containsKey(COSName.KEYWORDS)
        || dictionary.containsKey(COSName.CREATOR)
        || dictionary.containsKey(COSName.PRODUCER)
        || dictionary.containsKey(COSName.CREATION_DATE)))
{
    trailer.setItem(COSName.INFO, document.getObjectFromPool(entry.getKey()));
}
{code}

The "&&" was introduced in PDFBOX-3208 ("ModDate is mandatory for an info dictionary"). In file 079977.pdf there is no /Info/ModDate, so this condition is never satisfied and the rebuilt trailer gets no /Info entry. According to the PDF specification, /ModDate is not mandatory.

In PDFBOX-3208 the problem was that, without the change made there, an outline dictionary was used as /Info because it had a /Title. I suggest checking for /Parent to decide that a dictionary is not an /Info dictionary. If there are other dictionaries with items that are also found in /Info, then we'd have to add checks for those as well.

> Lost metadata in 2.0.8-SNAPSHOT
> -------------------------------
>
>                 Key: PDFBOX-3940
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3940
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.8
>            Reporter: Tim Allison
>              Labels: regression
>         Attachments: 079977.pdf, 2_0_7_079977.pdf.json, 2_0_8-SNAPSHOT_079977.pdf.json
>
>
> We noticed some missing metadata values in the recent large scale regression
> testing. I finally had a chance to look. It looks like a genuine regression.
> The diff btwn 2.0.7 and 2.0.8-SNAPSHOT in metadata values is often -2.
> However, in some files, the problem is more pronounced.
> In the attached file, when we call {{PDDocument.getDocumentInformation()}},
> the returned {{PDDocumentInformation info}} is empty in 2.0.8-SNAPSHOT but
> not in 2.0.7.
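
For illustration only, the /Parent check suggested in the comment could look roughly like the sketch below. It is not the PDFBox code: the dictionary is modelled as a plain set of key names instead of a COSDictionary, the class and method names are hypothetical, and treating any single info-style key (including ModDate) as sufficient is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the trailer-rebuild heuristic; not part of PDFBox.
final class InfoDictionaryHeuristic {

    // Keys that typically appear in a document information dictionary.
    private static final List<String> INFO_KEYS = Arrays.asList(
            "Title", "Author", "Subject", "Keywords",
            "Creator", "Producer", "CreationDate", "ModDate");

    // Returns true if the key names look like an /Info dictionary.
    static boolean looksLikeInfoDictionary(Set<String> keys) {
        // Outline items carry /Title but also /Parent, so /Parent rules them out.
        if (keys.contains("Parent")) {
            return false;
        }
        // Assumption: one info-style key is enough; /ModDate is not required.
        for (String key : INFO_KEYS) {
            if (keys.contains(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An info dictionary without /ModDate, as described for 079977.pdf.
        Set<String> info = new HashSet<>(Arrays.asList("Title", "Producer", "CreationDate"));
        // An outline item: has /Title, but also /Parent.
        Set<String> outline = new HashSet<>(Arrays.asList("Title", "Parent", "Dest"));

        System.out.println(looksLikeInfoDictionary(info));    // prints "true"
        System.out.println(looksLikeInfoDictionary(outline)); // prints "false"
    }
}
```

With this shape, a dictionary lacking /ModDate is still accepted, while the outline case from PDFBOX-3208 is rejected by the /Parent test. Other dictionary types that share keys with /Info would need their own exclusions.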