[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT
[ https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199180#comment-16199180 ] ASF subversion and git services commented on PDFBOX-3940: - Commit 1811760 from [~tilman] in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1811760 ] PDFBOX-3940: /Info dictionary can't have a /A or /Dest item > Lost metadata in 2.0.8-SNAPSHOT > --- > > Key: PDFBOX-3940 > URL: https://issues.apache.org/jira/browse/PDFBOX-3940 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.8 >Reporter: Tim Allison >Assignee: Tilman Hausherr > Labels: regression > Fix For: 2.0.8, 3.0.0 > > Attachments: 079977.pdf, 2_0_7_079977.pdf.json, > 2_0_8-SNAPSHOT_079977.pdf.json, J4S6TTBZEDXOJ77USE3HTUDSAXU2CRR4.pdf > > > We noticed some missing metadata values in the recent large scale regression > testing. I finally had a chance to look. It looks like a genuine regression. > The diff btwn 2.0.7 and 2.0.8-SNAPSHOT in metadata values is often -2. > However, in some files, the problem is more pronounced. > In the attached file, when we call {{PDDocument.getDocumentInformation()}}, > the returned {{PDDocumentInformation info}} is empty in 2.0.8-SNAPSHOT but > not in 2.0.7. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT
[ https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199179#comment-16199179 ] ASF subversion and git services commented on PDFBOX-3940: - Commit 1811759 from [~tilman] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1811759 ] PDFBOX-3940: /Info dictionary can't have a /A or /Dest item
[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT
[ https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182730#comment-16182730 ] ASF subversion and git services commented on PDFBOX-3940: - Commit 1809861 from [~tilman] in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1809861 ] PDFBOX-3940: add test
[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT
[ https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182729#comment-16182729 ] ASF subversion and git services commented on PDFBOX-3940: - Commit 1809860 from [~tilman] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1809860 ] PDFBOX-3940: add test
[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT
[ https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180953#comment-16180953 ] ASF subversion and git services commented on PDFBOX-3940: - Commit 1809755 from [~tilman] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1809755 ] PDFBOX-3940: /Info dictionary can't have a /Parent item, and /ModDate is not mandatory
[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT
[ https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180954#comment-16180954 ] ASF subversion and git services commented on PDFBOX-3940: - Commit 1809756 from [~tilman] in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1809756 ] PDFBOX-3940: /Info dictionary can't have a /Parent item, and /ModDate is not mandatory
[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT
[ https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180927#comment-16180927 ] Tilman Hausherr commented on PDFBOX-3940: -
This regression first occurred because of r187622 in PDFBOX-3923. One of the offsets is incorrect (it points within the table), so an exception is thrown and the trailer is rebuilt. When rebuilding, this piece of code is hit:
{code}
// info dictionary
else if (dictionary.containsKey(COSName.MOD_DATE) &&
        (dictionary.containsKey(COSName.TITLE)
        || dictionary.containsKey(COSName.AUTHOR)
        || dictionary.containsKey(COSName.SUBJECT)
        || dictionary.containsKey(COSName.KEYWORDS)
        || dictionary.containsKey(COSName.CREATOR)
        || dictionary.containsKey(COSName.PRODUCER)
        || dictionary.containsKey(COSName.CREATION_DATE)))
{
    trailer.setItem(COSName.INFO, document.getObjectFromPool(entry.getKey()));
}
{code}
The "&&" was introduced in PDFBOX-3208 ("ModDate is mandatory for an info dictionary"). In file 079977.pdf there is no /Info/ModDate. According to the PDF specification, /ModDate is not mandatory. In PDFBOX-3208 the problem was that, without the change made there, an outline dictionary was used as /Info because it had a /Title. I suggest checking for /Parent to decide that a dictionary is not an /Info. If there are other dictionaries that have items that are also found in /Info, then we'd have to add checks for those as well.
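The heuristic proposed in the comment above (no longer require /ModDate, reject anything with /Parent), together with the /A and /Dest exclusions named in the later commit messages, can be sketched as follows. This is only an illustration under stated assumptions, not the code that was committed: the class InfoDictionaryHeuristic and the method looksLikeInfo are hypothetical names, and dictionary keys are modelled as plain strings rather than PDFBox COSName objects so that the example runs without PDFBox on the classpath.

```java
import java.util.Set;

// Hypothetical stand-alone model of the heuristic discussed in this thread.
// Keys are plain strings here; the real code works on COSDictionary/COSName.
final class InfoDictionaryHeuristic {

    // Keys taken from the condition quoted in the comment (without /ModDate).
    private static final Set<String> INFO_KEYS = Set.of(
            "Title", "Author", "Subject", "Keywords",
            "Creator", "Producer", "CreationDate");

    // Keys that, per the commit messages, an /Info dictionary can't have.
    private static final Set<String> EXCLUDING_KEYS = Set.of("Parent", "A", "Dest");

    // Returns true if a dictionary with these keys is a plausible /Info candidate.
    static boolean looksLikeInfo(Set<String> keys) {
        for (String excluded : EXCLUDING_KEYS) {
            if (keys.contains(excluded)) {
                return false; // e.g. an outline item that happens to have a /Title
            }
        }
        for (String infoKey : INFO_KEYS) {
            if (keys.contains(infoKey)) {
                return true; // /ModDate is deliberately not required
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An info-like dictionary without /ModDate, as in 079977.pdf.
        System.out.println(looksLikeInfo(Set.of("Title", "Producer")));
        // An outline-item-like dictionary: has /Title but also /Parent and /Dest.
        System.out.println(looksLikeInfo(Set.of("Title", "Parent", "Dest")));
    }
}
```

The design point is that exclusion keys are checked first, so a dictionary carrying both a /Title and a /Parent is rejected instead of being picked up as /Info during the trailer rebuild.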
[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT
[ https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180319#comment-16180319 ] Tilman Hausherr commented on PDFBOX-3940: - The offset of the info object (1 0 obj) is 10641 in the table but is really at 10493.
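The kind of mismatch described in this comment (the table says 10641, the object is really at 10493) can be detected by checking whether the bytes at the claimed offset actually start with the object header, and scanning for the header if they do not. The sketch below is a simplified illustration, not PDFBox's parser: XrefOffsetCheck, objectStartsAt and findObject are hypothetical names, the sample data is made up, and a plain byte scan like this can be fooled by header-like bytes inside streams.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating how a wrong xref offset can be detected.
final class XrefOffsetCheck {

    // True if 'header' (e.g. "1 0 obj") starts exactly at 'offset' and is not
    // the tail of a longer object number such as "11 0 obj".
    static boolean objectStartsAt(byte[] data, int offset, String header) {
        byte[] expected = header.getBytes(StandardCharsets.ISO_8859_1);
        if (offset < 0 || offset + expected.length > data.length) {
            return false;
        }
        if (offset > 0 && data[offset - 1] >= '0' && data[offset - 1] <= '9') {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    // Scans from the start of the data and returns the first real offset of
    // 'header', or -1 if it is not found.
    static int findObject(byte[] data, String header) {
        for (int i = 0; i < data.length; i++) {
            if (objectStartsAt(data, i, header)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up miniature file body; real files are far larger.
        byte[] data = "%PDF-1.4\n11 0 obj\n<<>>\nendobj\n1 0 obj\n<< /Title (x) >>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        int claimedOffset = 35; // deliberately wrong, like the table entry in the report
        if (!objectStartsAt(data, claimedOffset, "1 0 obj")) {
            System.out.println("claimed offset is wrong, object found at "
                    + findObject(data, "1 0 obj"));
        }
    }
}
```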