[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT

2017-10-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199180#comment-16199180
 ] 

ASF subversion and git services commented on PDFBOX-3940:
-

Commit 1811760 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1811760 ]

PDFBOX-3940: /Info dictionary can't have a /A or /Dest item

> Lost metadata in 2.0.8-SNAPSHOT
> ---
>
> Key: PDFBOX-3940
> URL: https://issues.apache.org/jira/browse/PDFBOX-3940
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.8
> Reporter: Tim Allison
> Assignee: Tilman Hausherr
> Labels: regression
> Fix For: 2.0.8, 3.0.0
>
> Attachments: 079977.pdf, 2_0_7_079977.pdf.json, 
> 2_0_8-SNAPSHOT_079977.pdf.json, J4S6TTBZEDXOJ77USE3HTUDSAXU2CRR4.pdf
>
>
> We noticed some missing metadata values in the recent large-scale regression 
> testing.  I finally had a chance to look.  It looks like a genuine regression.
> The diff between 2.0.7 and 2.0.8-SNAPSHOT in metadata values is often -2.  
> However, in some files, the problem is more pronounced.
> In the attached file, when we call {{PDDocument.getDocumentInformation()}}, 
> the returned {{PDDocumentInformation info}} is empty in 2.0.8-SNAPSHOT but 
> not in 2.0.7.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT

2017-10-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199179#comment-16199179
 ] 

ASF subversion and git services commented on PDFBOX-3940:
-

Commit 1811759 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1811759 ]

PDFBOX-3940: /Info dictionary can't have a /A or /Dest item




[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT

2017-09-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182730#comment-16182730
 ] 

ASF subversion and git services commented on PDFBOX-3940:
-

Commit 1809861 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1809861 ]

PDFBOX-3940: add test




[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT

2017-09-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182729#comment-16182729
 ] 

ASF subversion and git services commented on PDFBOX-3940:
-

Commit 1809860 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1809860 ]

PDFBOX-3940: add test




[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT

2017-09-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180953#comment-16180953
 ] 

ASF subversion and git services commented on PDFBOX-3940:
-

Commit 1809755 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1809755 ]

PDFBOX-3940: /Info dictionary can't have a /Parent item, and /ModDate is not 
mandatory




[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT

2017-09-26 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180954#comment-16180954
 ] 

ASF subversion and git services commented on PDFBOX-3940:
-

Commit 1809756 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1809756 ]

PDFBOX-3940: /Info dictionary can't have a /Parent item, and /ModDate is not 
mandatory




[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT

2017-09-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180927#comment-16180927
 ] 

Tilman Hausherr commented on PDFBOX-3940:
-

This regression first occurred because of r187622 in PDFBOX-3923. One of the 
offsets is incorrect (it points within the table), so an exception is thrown 
and the trailer is rebuilt. When rebuilding, this piece of code is hit:
{code}
// info dictionary
else if (dictionary.containsKey(COSName.MOD_DATE)
        && (dictionary.containsKey(COSName.TITLE)
                || dictionary.containsKey(COSName.AUTHOR)
                || dictionary.containsKey(COSName.SUBJECT)
                || dictionary.containsKey(COSName.KEYWORDS)
                || dictionary.containsKey(COSName.CREATOR)
                || dictionary.containsKey(COSName.PRODUCER)
                || dictionary.containsKey(COSName.CREATION_DATE)))
{
    trailer.setItem(COSName.INFO, document.getObjectFromPool(entry.getKey()));
}
{code}
The "&&" was introduced in PDFBOX-3208 ("ModDate is mandatory for an info 
dictionary"). In file 079977.pdf there is no /Info/ModDate. According to the 
PDF specification /ModDate is not mandatory.

In PDFBOX-3208 the problem was that, without the change made there, an outline 
dictionary was used as /Info because it had a /Title. I suggest checking for 
/Parent to decide that a dictionary is not an /Info. If other dictionaries 
have items that are also found in /Info, then we'd have to add checks for 
those as well.
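
The suggested heuristic can be sketched in plain Java, without PDFBox. This is only an illustration and not the committed code: the class name InfoDictionaryHeuristic, the method looksLikeInfo and the use of a set of key names in place of a COSDictionary are all assumptions made for the example. The excluded keys are /Parent (suggested above) plus /A and /Dest, which the commit messages elsewhere in this thread name. Whether a lone /ModDate should count as a sign of /Info is not stated here, so the sketch leaves it out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class InfoDictionaryHeuristic {
    // Keys from the quoted condition that hint at an /Info dictionary.
    // /ModDate is deliberately not required, since it is not mandatory.
    private static final Set<String> INFO_KEYS = new HashSet<>(Arrays.asList(
            "Title", "Author", "Subject", "Keywords",
            "Creator", "Producer", "CreationDate"));

    // Keys that rule a dictionary out as /Info. An outline item, for example,
    // has a /Title but also a /Parent (and usually /A or /Dest).
    private static final Set<String> EXCLUDED_KEYS = new HashSet<>(Arrays.asList(
            "Parent", "A", "Dest"));

    // Hypothetical helper: decides from the key names alone whether a
    // dictionary is a plausible /Info candidate when rebuilding the trailer.
    public static boolean looksLikeInfo(Set<String> keys) {
        for (String excluded : EXCLUDED_KEYS) {
            if (keys.contains(excluded)) {
                return false;
            }
        }
        for (String infoKey : INFO_KEYS) {
            if (keys.contains(infoKey)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> info = new HashSet<>(Arrays.asList("Title", "Producer"));
        Set<String> outlineItem = new HashSet<>(Arrays.asList("Title", "Parent", "Dest"));
        System.out.println("info candidate: " + looksLikeInfo(info));
        System.out.println("outline item:   " + looksLikeInfo(outlineItem));
    }
}
```

An exclusion list is easy to extend if further dictionary types turn out to share keys with /Info.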




[jira] [Commented] (PDFBOX-3940) Lost metadata in 2.0.8-SNAPSHOT

2017-09-25 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180319#comment-16180319
 ] 

Tilman Hausherr commented on PDFBOX-3940:
-

The xref table gives the offset of the info object (1 0 obj) as 10641, but the 
object is really at 10493.
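
A mismatch like this can be checked by scanning the raw bytes for the object header and comparing the position with the offset declared in the table. The following is a minimal sketch in plain Java, not PDFBox code: the class ObjectOffsetCheck and its methods are hypothetical names, and it only finds the first occurrence of the header, which is not enough for files with incremental updates that redefine the object.

```java
import java.nio.charset.StandardCharsets;

public class ObjectOffsetCheck {
    // Returns the byte offset of the first "<objNum> <genNum> obj" header,
    // or -1 if no such header is present.
    public static int findObjectOffset(byte[] pdf, int objNum, int genNum) {
        byte[] needle = (objNum + " " + genNum + " obj")
                .getBytes(StandardCharsets.ISO_8859_1);
        outer:
        for (int i = 0; i + needle.length <= pdf.length; i++) {
            // Skip matches that are the tail of a larger number, e.g. "11 0 obj".
            if (i > 0 && pdf[i - 1] >= '0' && pdf[i - 1] <= '9') {
                continue;
            }
            for (int j = 0; j < needle.length; j++) {
                if (pdf[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // True if the offset declared in the xref table points at the header.
    public static boolean offsetMatches(byte[] pdf, int declaredOffset,
                                        int objNum, int genNum) {
        return findObjectOffset(pdf, objNum, genNum) == declaredOffset;
    }

    public static void main(String[] args) {
        // Tiny stand-in for a PDF file; the real check would read 079977.pdf.
        byte[] pdf = "%PDF-1.4\n1 0 obj\n<< /Title (x) >>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("actual offset: " + findObjectOffset(pdf, 1, 0));
        System.out.println("declared 157 matches: " + offsetMatches(pdf, 157, 1, 0));
    }
}
```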
