[ 
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180927#comment-16180927
 ] 

Tilman Hausherr commented on PDFBOX-3940:
-----------------------------------------

This regression first occurred because of r187622 in PDFBOX-3923. One of the 
offsets is incorrect (it points within the table), so an exception is thrown 
and the trailer is rebuilt. When rebuilding, this piece of code is hit:
{code}
// info dictionary
else if (dictionary.containsKey(COSName.MOD_DATE)
        && (dictionary.containsKey(COSName.TITLE)
                || dictionary.containsKey(COSName.AUTHOR)
                || dictionary.containsKey(COSName.SUBJECT)
                || dictionary.containsKey(COSName.KEYWORDS)
                || dictionary.containsKey(COSName.CREATOR)
                || dictionary.containsKey(COSName.PRODUCER)
                || dictionary.containsKey(COSName.CREATION_DATE)))
{
    trailer.setItem(COSName.INFO, document.getObjectFromPool(entry.getKey()));
}
{code}
The "&&" was introduced in PDFBOX-3208 ("ModDate is mandatory for an info 
dictionary"). In file 079977.pdf there is no /Info/ModDate, so the condition 
is false and /Info is never set in the rebuilt trailer. According to the PDF 
specification, /ModDate is not mandatory.

In PDFBOX-3208 the problem was that, without the change made there, an outline 
dictionary was used as /Info because it had a /Title. I suggest checking for 
/Parent to decide that a dictionary is not an /Info. If other dictionaries 
have items that are also found in /Info, then we'd have to add checks for 
those as well.
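
A rough sketch of the /Parent idea, not the actual patch: the class and method 
names (InfoDictionaryHeuristic, looksLikeInfo) are hypothetical, and a 
dictionary is modelled as a plain set of key names instead of a COSDictionary 
so that the snippet runs without PDFBox. It assumes ModDate is treated like 
any other /Info entry instead of being required.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of PDFBox.
final class InfoDictionaryHeuristic
{
    // Entries that may appear in an info dictionary. None of them is required here.
    private static final List<String> INFO_KEYS = Arrays.asList(
            "Title", "Author", "Subject", "Keywords",
            "Creator", "Producer", "CreationDate", "ModDate");

    // Returns true if the key set could belong to an info dictionary.
    static boolean looksLikeInfo(Set<String> keys)
    {
        // A dictionary with /Parent (e.g. an outline item) is not treated as /Info.
        if (keys.contains("Parent"))
        {
            return false;
        }
        // Otherwise a single typical /Info entry is enough.
        for (String key : INFO_KEYS)
        {
            if (keys.contains(key))
            {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        // Info dictionary without /ModDate, as described for 079977.pdf.
        Set<String> info = new HashSet<>(Arrays.asList("Title", "Producer"));
        // Outline item: has /Title, but also /Parent.
        Set<String> outlineItem = new HashSet<>(Arrays.asList("Title", "Parent", "Next"));

        System.out.println(looksLikeInfo(info));        // prints true
        System.out.println(looksLikeInfo(outlineItem)); // prints false
    }
}
```

In the real rebuild code the same test would presumably be written with 
dictionary.containsKey(...) calls, as in the snippet quoted above.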

> Lost metadata in 2.0.8-SNAPSHOT
> -------------------------------
>
>                 Key: PDFBOX-3940
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3940
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.8
>            Reporter: Tim Allison
>              Labels: regression
>         Attachments: 079977.pdf, 2_0_7_079977.pdf.json, 
> 2_0_8-SNAPSHOT_079977.pdf.json
>
>
> We noticed some missing metadata values in the recent large-scale regression 
> testing.  I finally had a chance to look.  It looks like a genuine regression.
> The diff between 2.0.7 and 2.0.8-SNAPSHOT in metadata values is often -2.  
> However, in some files, the problem is more pronounced.
> In the attached file, when we call {{PDDocument.getDocumentInformation()}}, 
> the returned {{PDDocumentInformation info}} is empty in 2.0.8-SNAPSHOT but 
> not in 2.0.7.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
