[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-29 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730750#comment-16730750 ]

ASF subversion and git services commented on PDFBOX-4407:
-

Commit 1849929 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1849929 ]

PDFBOX-4417, PDFBOX-4407: check for clone-cloning no longer needed here because 
it is done in the cloner
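
The commit moves the guard against "clone-cloning" (cloning an object that is already a clone) into the cloner itself. The following is a minimal sketch of that idea. It assumes plain Java maps and lists as hypothetical stand-ins for COSDictionary and COSArray, and its class and method names are illustrative only; this is not the PDFBox PDFCloneUtility API. It also illustrates the problem described in the issue: when the K array and the ParentTree are cloned through one shared identity-based mapping, both end up referencing the same cloned structure element, whereas a separate mapping produces two distinct copies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CloneSketch {

    /** Hypothetical memoizing cloner; plain maps/lists stand in for COS dictionaries/arrays. */
    static final class Cloner {
        // original object -> its clone, keyed by identity (like PDF object references)
        private final Map<Object, Object> clonedVersion = new IdentityHashMap<>();
        // every object this cloner has produced, also tracked by identity
        private final Set<Object> clones = Collections.newSetFromMap(new IdentityHashMap<>());

        Object cloneForNewDocument(Object base) {
            if (base == null || clones.contains(base)) {
                return base;                      // null, or already a clone: never clone a clone
            }
            Object existing = clonedVersion.get(base);
            if (existing != null) {
                return existing;                  // cloned before: reuse so references stay shared
            }
            if (base instanceof Map) {
                Map<Object, Object> copy = new LinkedHashMap<>();
                register(base, copy);             // register before recursing so cycles terminate
                for (Map.Entry<?, ?> e : ((Map<?, ?>) base).entrySet()) {
                    copy.put(e.getKey(), cloneForNewDocument(e.getValue()));
                }
                return copy;
            }
            if (base instanceof List) {
                List<Object> copy = new ArrayList<>();
                register(base, copy);
                for (Object item : (List<?>) base) {
                    copy.add(cloneForNewDocument(item));
                }
                return copy;
            }
            return base;                          // immutable scalars (numbers, strings) are shared
        }

        private void register(Object original, Object copy) {
            clonedVersion.put(original, copy);
            clones.add(copy);
        }
    }

    static Map<String, Object> dict(String key, Object value) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put(key, value);
        return d;
    }

    static List<Object> array(Object... items) {
        return new ArrayList<>(Arrays.asList(items));
    }

    /** Clones a K array and a ParentTree Nums array that both point at one structure element. */
    static boolean sameElementAfterClone(boolean shareCloner) {
        Map<String, Object> elem = dict("S", "Form");      // a structure element
        List<Object> kArray = array(elem);                 // structure tree K array
        List<Object> nums = array(0, elem);                // ParentTree Nums: key, value pairs
        Cloner forK = new Cloner();
        Cloner forNums = shareCloner ? forK : new Cloner();
        List<?> kClone = (List<?>) forK.cloneForNewDocument(kArray);
        List<?> numsClone = (List<?>) forNums.cloneForNewDocument(nums);
        return kClone.get(0) == numsClone.get(1);
    }

    public static void main(String[] args) {
        System.out.println(sameElementAfterClone(true));   // true: one shared mapping
        System.out.println(sameElementAfterClone(false));  // false: two distinct copies
    }
}
```

The identity-keyed maps matter here: two cloned dictionaries with equal contents are still different PDF objects, so lookups must compare references rather than use `equals`.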

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.13
>Reporter: Dan Anderson
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: StructureTree
> Fix For: 2.0.14, 3.0.0 PDFBox
>
> Attachments: 4407.patch, reading-order-merged-bad.pdf, 
> reading-order-merged-good.pdf, reading-order.pdf
>
>
> After merging tagged documents, the second page of the resulting document is 
> no longer valid. When the field objects are cloned in PDFMergerUtility, the 
> old and new objects are stored in a map named objMapping. This map is used to 
> replace the old references with the new ones for the AcroForm, the K array, 
> and the annotation list. However, the ParentTree is not updated to the new 
> object references, so the K array and the ParentTree end up holding different 
> references for what should be the same object. This causes problems for 
> accessibility (a11y) screen readers such as JAWS, and also breaks the display 
> of the tags in Adobe DC.
> Here is a failing unit test, written in PDFMergerUtilityTest, that 
> demonstrates the issue. It uses an example from the W3C: 
> https://www.w3.org/WAI/WCAG20/Techniques/working-examples/PDF3/reading-order.pdf
> {code:java}
> public void testStructureTreeMerge3() throws IOException
> {
>     PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>     PDDocument src = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     PDDocument dst = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     pdfMergerUtility.appendDocument(dst, src);
>     src.close();
>     dst.save(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>     dst.close();
>     PDDocument doc = PDDocument.load(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>
>     // compare the K array and the ParentTree's Nums array against the merged AcroForm fields
>     assertTrue(checkAnnotationMatches(
>             doc.getDocumentCatalog().getStructureTreeRoot().getKArray(),
>             doc.getDocumentCatalog().getAcroForm().getFields(),
>             (COSArray) doc.getDocumentCatalog().getStructureTreeRoot().getParentTree()
>                     .getCOSObject().getDictionaryObject(COSName.NUMS)));
>     doc.close();
> }
> private boolean checkAnnotationMatches(COSArray kArray, List<PDField> acroformFields,
>         COSArray numbersArray) {
>     for (int i = 0; i < kArray.size(); i++) {
>         COSBase entry = kArray.get(i);
>         if (entry instanceof COSObject) {
>             // resolve indirect references so direct and indirect entries are handled alike
>             entry = ((COSObject) entry).getObject();
>         }
>         if (entry instanceof COSArray) {
>             if (!checkAnnotationMatches((COSArray) entry, acroformFields, numbersArray)) {
>                 return false;
>             }
>         } else if (entry instanceof COSDictionary) {
>             COSDictionary entryDictionary = (COSDictionary) entry;
>             COSBase kids = entryDictionary.getDictionaryObject(COSName.K);
>             if (kids != null) {
>                 if (kids instanceof COSDictionary) {
>                     // a single kid, e.g. an object reference to a widget annotation
>                     if (!checkForMatches(((COSDictionary) kids).getDictionaryObject(COSName.OBJ),
>                             acroformFields, numbersArray)) {
>                         return false;
>                     }
>                 } else if (kids instanceof COSArray) {
>                     if (!checkAnnotationMatches((COSArray) kids, acroformFields, numbersArray)) {
>                         return false;
>                     }
>                 }
>                 // a COSInteger kid is a marked-content ID; we don't care about those
>             } else if (entryDictionary.getDictionaryObject(COSName.OBJ) != null) {
>                 if (!checkForMatches(entryDictionary.getDictionaryObject(COSName.OBJ),
>                         acroformFields, numbersArray)) {
>                     return false;
>                 }
>             }
>         }
>         // COSInteger entries (marked-content IDs) are skipped
>     }
>     return true;
> }
> private boolean checkForMatches(C

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-29 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730752#comment-16730752 ]

ASF subversion and git services commented on PDFBOX-4407:
-

Commit 1849930 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1849930 ]

PDFBOX-4417, PDFBOX-4407: check for clone-cloning no longer needed here because 
it is done in the cloner

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726742#comment-16726742 ]

Tilman Hausherr commented on PDFBOX-4407:
-

Yes, I guess so, that is the usual rhythm. The last release was earlier this 
month.

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Dan Anderson (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726737#comment-16726737 ]

Dan Anderson commented on PDFBOX-4407:
--

From your comment on another issue, it looks like the 2.0.14 release will 
probably be in about 3 months?

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Dan Anderson (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726720#comment-16726720 ]

Dan Anderson commented on PDFBOX-4407:
--

That works!  JAWS was able to read the appended document, and the reading order 
looked correct in Adobe.  Thank you!
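
The invariant behind the issue is that the ParentTree must reference the very same structure element objects that appear in the structure tree, not equal-looking copies. Per the PDF specification, ParentTree values are structure elements (or, for a page's StructParents key, arrays of structure elements). Below is a minimal identity-based sketch of such a consistency check. It again assumes hypothetical plain Java maps and lists as stand-ins for COS objects ("K" for kids, "Obj" marking an object reference); it is not a PDFBox API and not the check used in the reporter's test.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ParentTreeCheck {

    /** Collects, by identity, every structure element reachable from a K entry. */
    static void collectStructElems(Object node, Set<Object> out) {
        if (node instanceof List) {
            for (Object child : (List<?>) node) {
                collectStructElems(child, out);
            }
        } else if (node instanceof Map) {
            Map<?, ?> dict = (Map<?, ?>) node;
            if (dict.containsKey("Obj")) {
                return;                       // an object reference, not a structure element
            }
            if (!out.add(dict)) {
                return;                       // already visited; guards against cycles
            }
            Object kids = dict.get("K");
            if (kids != null) {
                collectStructElems(kids, out);
            }
        }
        // Integers are marked-content IDs and are skipped
    }

    /** True if every non-null ParentTree value is an element instance found in the tree. */
    static boolean parentTreeConsistent(List<?> rootKids, List<?> nums) {
        Set<Object> inTree = Collections.newSetFromMap(new IdentityHashMap<>());
        collectStructElems(rootKids, inTree);
        for (int i = 1; i < nums.size(); i += 2) {      // Nums alternates key, value
            Object value = nums.get(i);
            if (value instanceof List) {
                for (Object elem : (List<?>) value) {   // a page's array of elements
                    if (elem != null && !inTree.contains(elem)) {
                        return false;
                    }
                }
            } else if (value != null && !inTree.contains(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> elem = new HashMap<>();
        elem.put("S", "Form");
        List<Object> root = Arrays.asList((Object) elem);
        Map<String, Object> lookalike = new HashMap<>(elem);  // equal contents, different instance
        System.out.println(parentTreeConsistent(root, Arrays.asList(0, elem)));       // true
        System.out.println(parentTreeConsistent(root, Arrays.asList(0, lookalike)));  // false
    }
}
```

Comparing by identity rather than `equals` is the point of the sketch: a merge that clones an element twice produces two dictionaries with identical contents, which an `equals`-based check would wrongly accept.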

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-21 Thread Tilman Hausherr (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726516#comment-16726516 ]

Tilman Hausherr commented on PDFBOX-4407:
-

[~tschusssl] it passes your test; however, it also passes your test when the 
fix is undone. Could you please test whether the file I just attached works 
with your screen reader?

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Dan Anderson
>Priority: Major
>  Labels: StructureTree
> Attachments: 4407.patch, reading-order-merged-bad.pdf, 
> reading-order-merged-good.pdf, reading-order.pdf

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-20 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726439#comment-16726439 ]

ASF subversion and git services commented on PDFBOX-4407:
-

Commit 1849451 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1849451 ]

PDFBOX-4407: more detailed log message

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Dan Anderson
>Priority: Major
>  Labels: StructureTree
> Attachments: 4407.patch, reading-order-merged-bad.pdf, 
> reading-order.pdf

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-20 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726438#comment-16726438 ]

ASF subversion and git services commented on PDFBOX-4407:
-

Commit 1849450 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1849450 ]

PDFBOX-4407: more detailed log message

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Dan Anderson
>Priority: Major
>  Labels: StructureTree
> Attachments: 4407.patch, reading-order-merged-bad.pdf, 
> reading-order.pdf
>
>
> After merging tagged documents together, the second page of the resulting 
> document is no longer valid.  When the field objects are cloned in 
> PDFMergerUtility, the new and old objects are stored in a map named 
> objMapping.  This is used to replace the old references with the new 
> references for the acroform, k array, and annotation list.  However the 
> ParentTree is not updated to this new object reference.  This results in the 
> K Array and the Parent Tree having different references to the same object.  
> This causes issues when using an a11y reader like Jaws, and also causes 
> problems displaying the tags in Adobe DC.
> Here is a failing unit test that was created in PDFMergerUtilityTest to 
> demonstrate the issue.  It was created using an example from W3: 
> https://www.w3.org/WAI/WCAG20/Techniques/working-examples/PDF3/reading-order.pdf
> {code:java}
> public void testStructureTreeMerge3() throws IOException
> {
>     PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>     PDDocument src = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     PDDocument dst = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     pdfMergerUtility.appendDocument(dst, src);
>     src.close();
>     dst.save(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>     dst.close();
>     PDDocument doc = PDDocument.load(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
> 
>     assertTrue(checkAnnotationMatches(doc.getDocumentCatalog().getStructureTreeRoot().getKArray(),
>             doc.getDocumentCatalog().getAcroForm().getFields(),
>             (COSArray) doc.getDocumentCatalog().getStructureTreeRoot().getParentTree().getCOSObject().getDictionaryObject(COSName.NUMS)));
> }
> private boolean checkAnnotationMatches(COSArray kArray, List<PDField> acroformFields, COSArray numbersArray) {
>     for (int i = 0; i < kArray.size(); i++) {
>         COSBase entry = kArray.get(i);
>         if (entry instanceof COSArray) {
>             COSArray entryAsArray = (COSArray) entry;
>             if (!checkAnnotationMatches(entryAsArray, acroformFields, numbersArray)) {
>                 return false;
>             }
>         } else if (entry instanceof COSInteger) {
>             // do nothing, just need to screen these out so next line doesn't blow up
>         } else if (((COSObject) entry).getObject() instanceof COSDictionary) {
>             COSDictionary entryDictionary = (COSDictionary) ((COSObject) entry).getObject();
>             if (entryDictionary.getItem(COSName.K) != null) {
>                 COSBase kids = entryDictionary.getItem(COSName.K);
>                 if (kids != null) {
>                     if (kids instanceof COSInteger) {
>                         // do nothing, don't care about marked content tags
>                     } else if (kids instanceof COSDictionary) {
>                         COSDictionary kidsAsDictionary = (COSDictionary) kids;
>                         if (!checkForMatches(kidsAsDictionary.getDictionaryObject(COSName.OBJ), acroformFields, numbersArray)) {
>                             return false;
>                         }
>                     } else if (kids instanceof COSArray) {
>                         COSArray kidsAsArray = (COSArray) kids;
>                         if (!checkAnnotationMatches(kidsAsArray, acroformFields, numbersArray)) {
>                             return false;
>                         }
>                     }
>                 }
>             } else if (entryDictionary.getDictionaryObject(COSName.OBJ) != null) {
>                 if (!checkForMatches(entryDictionary.getDictionaryObject(COSName.OBJ), acroformFields, numbersArray)) {
>                     return false;
>                 }
>             }
>         }
>     }
>     return true;
> }
> private boolean checkForMatches(COSBase objectReference, List<PDField> acroformFields, COSArray numbersArray) {
>     boolean result = false;
>     for (PDField field : acroformFields) {
>         if (field.getCOSObject() == objectReference && nu
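The description above says PDFMergerUtility records each cloned object in a map named objMapping and uses it to rewrite the references held by the AcroForm, the K array and the annotation list, but not those in the ParentTree. The following is a minimal sketch of that idea in plain Java, not PDFBox code: `Node`, `ObjMappingSketch`, `lookup` and `remap` are hypothetical stand-ins for COS objects and the merger's bookkeeping. It assumes objects are matched by identity (hence `IdentityHashMap`) and shows that any structure not rewritten through the same map keeps pointing at the pre-merge object.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a PDF object such as a field dictionary; not a PDFBox class.
final class Node {
    final String name;
    Node(String name) { this.name = name; }
}

// Hypothetical model of the old->new bookkeeping described in the issue.
final class ObjMappingSketch {
    // Returns the clone recorded for 'original', or 'original' itself if it was never cloned.
    static Node lookup(Map<Node, Node> objMapping, Node original) {
        Node clone = objMapping.get(original);
        return clone != null ? clone : original;
    }

    // Rewrites every entry of a reference list through the old->new mapping.
    static List<Node> remap(Map<Node, Node> objMapping, List<Node> refs) {
        List<Node> result = new ArrayList<>(refs.size());
        for (Node ref : refs) {
            result.add(lookup(objMapping, ref));
        }
        return result;
    }

    public static void main(String[] args) {
        Node field = new Node("dummyFieldName3");
        // IdentityHashMap: PDF objects are matched by reference, not by content.
        Map<Node, Node> objMapping = new IdentityHashMap<>();
        objMapping.put(field, new Node(field.name)); // the merge clones the field

        List<Node> acroFormFields = remap(objMapping, List.of(field));
        List<Node> kArray = remap(objMapping, List.of(field));
        List<Node> parentTreeNotRemapped = List.of(field); // the situation reported in the issue
        List<Node> parentTreeRemapped = remap(objMapping, List.of(field));

        System.out.println(kArray.get(0) == acroFormFields.get(0));        // true
        System.out.println(parentTreeNotRemapped.get(0) == kArray.get(0)); // false
        System.out.println(parentTreeRemapped.get(0) == kArray.get(0));    // true
    }
}
```

Running `main` prints `true`, `false`, `true`: the K array and the AcroForm agree, the ParentTree that was left alone does not, and rewriting the ParentTree through the same map restores agreement.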

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-20 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726220#comment-16726220
 ] 

Tilman Hausherr commented on PDFBOX-4407:
-

That's it for today, I'm going to sleep. It sounds too good to be true, but it 
passes the test I created earlier today. I have not yet checked whether it also 
passes your test.

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-20 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726219#comment-16726219
 ] 

ASF subversion and git services commented on PDFBOX-4407:
-

Commit 1849438 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1849438 ]

PDFBOX-4407: don't clone a clone; activate test fully and improve message
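The commit message "don't clone a clone" suggests the cloner has to recognise objects it has already produced, so that a second pass does not create yet another copy of the same field. A minimal sketch of that guard, assuming a cloner that keeps an identity map from originals to clones plus an identity set of the clones themselves; `Dict` and `CloneTracker` are hypothetical names, not PDFBox's cloner.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical mutable dictionary standing in for a PDF dictionary; not a PDFBox class.
final class Dict {
    final Map<String, String> entries = new HashMap<>();
}

// Hypothetical cloner that never clones an object twice and never clones its own output.
final class CloneTracker {
    // original -> clone, compared by identity
    private final Map<Dict, Dict> clones = new IdentityHashMap<>();
    // every object this tracker has produced, compared by identity
    private final Set<Dict> produced = Collections.newSetFromMap(new IdentityHashMap<>());

    Dict cloneForNewDocument(Dict source) {
        if (produced.contains(source)) {
            // source is already a clone: cloning it again would create a second copy
            return source;
        }
        Dict existing = clones.get(source);
        if (existing != null) {
            // source was cloned before: reuse that clone so all references agree
            return existing;
        }
        Dict copy = new Dict();
        copy.entries.putAll(source.entries); // shallow copy, for brevity
        clones.put(source, copy);
        produced.add(copy);
        return copy;
    }
}
```

Asking for a clone of the same original twice, or of a clone the tracker returned earlier, yields the same instance, so the K array, the ParentTree and the AcroForm can all end up pointing at one copy. A real PDF cloner would also copy nested objects through the same map rather than doing a shallow copy.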

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-20 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726218#comment-16726218
 ] 

ASF subversion and git services commented on PDFBOX-4407:
-

Commit 1849437 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1849437 ]

PDFBOX-4407: don't clone a clone; activate test fully and improve message

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-20 Thread Dan Anderson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726018#comment-16726018
 ] 

Dan Anderson commented on PDFBOX-4407:
--

From your comment it looks like you've got a handle on the issue. My test was 
just an attempt to highlight the bug. If your test catches the error, that's 
all I care about.

Thanks for jumping on this right away, it really helps me.

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-20 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725982#comment-16725982
 ] 

ASF subversion and git services commented on PDFBOX-4407:
-

Commit 1849412 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1849412 ]

PDFBOX-4407: add orphan check for object reference dictionary

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-20 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725987#comment-16725987
 ] 

Tilman Hausherr commented on PDFBOX-4407:
-

I haven't understood your code yet, but then I haven't had the time to study it. 
I added an inactive test (it only prints a message) based on the observation in 
my previous comment. It may or may not be equivalent to your test; the 
approaches seem to differ: you check something for deep equality, whereas I 
just check whether a page is an orphan.
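The orphan check described here asks a simpler question than a deep comparison: does every page referenced from the structure tree actually belong to the merged document? A minimal sketch of that idea, assuming a toy model in which `Page` and `StructElem` (hypothetical, not PDFBox classes) stand in for page dictionaries and structure elements carrying a /Pg entry. Pages are compared by identity, because a page that is equal in content but is not the same object is exactly what makes a reference an orphan.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a page dictionary; not a PDFBox class.
final class Page { }

// Hypothetical stand-in for a structure element with an optional /Pg entry.
final class StructElem {
    final Page pg; // may be null if the element has no /Pg
    final List<StructElem> kids = new ArrayList<>();
    StructElem(Page pg) { this.pg = pg; }
}

final class OrphanCheck {
    // Returns the structure elements whose /Pg is not one of the document's pages.
    static List<StructElem> findOrphans(StructElem root, List<Page> documentPages) {
        Set<Page> pages = Collections.newSetFromMap(new IdentityHashMap<>());
        pages.addAll(documentPages);
        List<StructElem> orphans = new ArrayList<>();
        Deque<StructElem> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) { // iterative depth-first walk of the structure tree
            StructElem elem = todo.pop();
            if (elem.pg != null && !pages.contains(elem.pg)) {
                orphans.add(elem);
            }
            for (StructElem kid : elem.kids) {
                todo.push(kid);
            }
        }
        return orphans;
    }
}
```

An empty result means every /Pg in the tree points at a page that is really in the document; any element returned refers to a page object that the merge left behind.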

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-20 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725981#comment-16725981
 ] 

ASF subversion and git services commented on PDFBOX-4407:
-

Commit 1849411 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1849411 ]

PDFBOX-4407: add orphan check for object reference dictionary

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-19 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724863#comment-16724863
 ] 

Tilman Hausherr commented on PDFBOX-4407:
-

Example:

dummyFieldName3 is object 16 in the AcroForm fields tree at
{{Root/AcroForm/Fields/[5]}}, but object 166 in the ParentTree at
{{Root/StructTreeRoot/ParentTree/Nums/[15]/K/Obj}}.
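The example above comes down to an identity check: the field reached through the AcroForm and the field reached through the ParentTree should be the very same object, but after the merge they are two different objects (16 and 166). Below is a minimal sketch of such a check. It assumes a simplified, hypothetical model in which PDF dictionaries are Java {{Map}}s and arrays are {{List}}s, and in which the ParentTree is a single flat {{/Nums}} array with no {{/Kids}}. It does not use the PDFBox COS API, and {{ParentTreeCheck}} / {{unmatchedFields}} are illustrative names only. The comparison is by identity because a cloned dictionary can be equal in content to the original and still be a different object.

```java
import java.util.*;

// Hypothetical model, NOT the PDFBox COS API: PDF dictionaries are
// Map<String, Object> and PDF arrays are List<Object>.
class ParentTreeCheck {

    // Returns the AcroForm field dictionaries that are not the very same object
    // (identity, not equality) as some /Obj entry reachable from the ParentTree.
    static List<Map<String, Object>> unmatchedFields(List<Map<String, Object>> fields,
                                                     List<Object> nums) {
        Set<Object> referenced = Collections.newSetFromMap(new IdentityHashMap<>());
        // A flat /Nums array is [key0 value0 key1 value1 ...]: values sit at odd indices
        for (int i = 1; i < nums.size(); i += 2) {
            collectObjRefs(nums.get(i), referenced);
        }
        List<Map<String, Object>> unmatched = new ArrayList<>();
        for (Map<String, Object> field : fields) {
            if (!referenced.contains(field)) {
                unmatched.add(field);
            }
        }
        return unmatched;
    }

    // Walks arrays and /K children, recording every /Obj value it finds.
    @SuppressWarnings("unchecked")
    private static void collectObjRefs(Object value, Set<Object> out) {
        if (value instanceof List) {
            for (Object item : (List<Object>) value) {
                collectObjRefs(item, out);
            }
        } else if (value instanceof Map) {
            Map<String, Object> dict = (Map<String, Object>) value;
            Object obj = dict.get("Obj");
            if (obj != null) {
                out.add(obj);
            }
            Object kids = dict.get("K");
            if (kids != null) {
                collectObjRefs(kids, out);
            }
        }
    }
}
```

In the situation described above, a check like this would report the AcroForm entry for dummyFieldName3 (object 16), because the {{/Obj}} found under {{Nums/[15]/K}} is a different object (166), even if the two dictionaries have identical contents.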

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Dan Anderson
>Priority: Major
>  Labels: StructureTree
> Attachments: 4407.patch, reading-order.pdf

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-18 Thread Dan Anderson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724442#comment-16724442
 ] 

Dan Anderson commented on PDFBOX-4407:
--

I have attached an attempt to fix this issue.

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Dan Anderson
>Priority: Major
>  Labels: StructureTree
> Attachments: 4407.patch, reading-order.pdf
>
>
> After merging tagged documents, the second page of the resulting document is
> no longer valid. When the field objects are cloned in PDFMergerUtility, the
> old and new objects are stored in a map named objMapping, which is used to
> replace the old references with the new ones in the AcroForm, the K array and
> the annotation list. However, the ParentTree is not updated to the new object
> references, so the K array and the ParentTree end up holding different
> references to what should be the same object. This causes problems with
> accessibility (a11y) screen readers such as JAWS, and also when displaying
> the tags in Adobe DC.
> Here is a failing unit test, added to PDFMergerUtilityTest, that demonstrates
> the issue. It uses an example PDF from the W3C:
> https://www.w3.org/WAI/WCAG20/Techniques/working-examples/PDF3/reading-order.pdf
> {code:java}
> public void testStructureTreeMerge3() throws IOException
> {
>     PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>     PDDocument src = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     PDDocument dst = PDDocument.load(new File(SRCDIR, "reading-order.pdf"));
>     pdfMergerUtility.appendDocument(dst, src);
>     src.close();
>     dst.save(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>     dst.close();
>     PDDocument doc = PDDocument.load(new File(TARGETTESTDIR, "reading-order-merged.pdf"));
>
>     assertTrue(checkAnnotationMatches(doc.getDocumentCatalog().getStructureTreeRoot().getKArray(),
>             doc.getDocumentCatalog().getAcroForm().getFields(),
>             (COSArray) doc.getDocumentCatalog().getStructureTreeRoot().getParentTree().getCOSObject().getDictionaryObject(COSName.NUMS)));
>     doc.close();
> }
>
> private boolean checkAnnotationMatches(COSArray kArray, List<PDField> acroformFields, COSArray numbersArray) {
>     for (int i = 0; i < kArray.size(); i++) {
>         COSBase entry = kArray.get(i);
>         if (entry instanceof COSArray) {
>             COSArray entryAsArray = (COSArray) entry;
>             if (!checkAnnotationMatches(entryAsArray, acroformFields, numbersArray)) {
>                 return false;
>             }
>         } else if (entry instanceof COSInteger) {
>             // do nothing, just need to screen these out so the next line doesn't blow up
>         } else if (((COSObject) entry).getObject() instanceof COSDictionary) {
>             COSDictionary entryDictionary = (COSDictionary) ((COSObject) entry).getObject();
>             if (entryDictionary.getItem(COSName.K) != null) {
>                 COSBase kids = entryDictionary.getItem(COSName.K);
>                 if (kids instanceof COSInteger) {
>                     // do nothing, don't care about marked content tags
>                 } else if (kids instanceof COSDictionary) {
>                     COSDictionary kidsAsDictionary = (COSDictionary) kids;
>                     if (!checkForMatches(kidsAsDictionary.getDictionaryObject(COSName.OBJ), acroformFields, numbersArray)) {
>                         return false;
>                     }
>                 } else if (kids instanceof COSArray) {
>                     COSArray kidsAsArray = (COSArray) kids;
>                     if (!checkAnnotationMatches(kidsAsArray, acroformFields, numbersArray)) {
>                         return false;
>                     }
>                 }
>             } else if (entryDictionary.getDictionaryObject(COSName.OBJ) != null) {
>                 if (!checkForMatches(entryDictionary.getDictionaryObject(COSName.OBJ), acroformFields, numbersArray)) {
>                     return false;
>                 }
>             }
>         }
>     }
>     return true;
> }
>
> private boolean checkForMatches(COSBase objectReference, List<PDField> acroformFields, COSArray numbersArray) {
>     boolean result = false;
>     for (PDField field : acroformFields) {
>         // the field must be the same object as the one in the K array,
>         // and that object must also appear in the ParentTree's /Nums array
>         if (field.getCOSObject() == objectReference &&
>                 numbersArray.indexOfObject(objectReference.getCOSObject()) > 0) {
>             result = true;
>         }
>     }
>     return result;
> }
> {code}
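The description identifies the core requirement: every reference to a given source object, whether it is reached through the AcroForm, the K array, the annotation list or the ParentTree, has to resolve to the single clone recorded in objMapping. Below is a minimal sketch of a memoizing deep cloner with that property. It assumes a simplified, hypothetical model (dictionaries as {{Map}}, arrays as {{List}}, everything else an immutable leaf) and is not PDFBox's actual cloner; {{MemoizingCloner}} and {{deepClone}} are illustrative names, and only the map name objMapping is taken from the report. Registering each clone before recursing into its children also keeps cyclic structures (such as parent back-references) from recursing forever.

```java
import java.util.*;

// Hypothetical model, NOT PDFBox's cloner: dictionaries are Map<String, Object>,
// arrays are List<Object>, anything else is treated as an immutable leaf.
class MemoizingCloner {

    // Source object -> its one and only clone, keyed by identity
    // (named after the objMapping map mentioned in the report).
    private final Map<Object, Object> objMapping = new IdentityHashMap<>();

    @SuppressWarnings("unchecked")
    Object deepClone(Object base) {
        if (!(base instanceof Map) && !(base instanceof List)) {
            return base; // leaves (numbers, strings) can be shared as-is
        }
        Object existing = objMapping.get(base);
        if (existing != null) {
            return existing; // reached again, possibly via another tree: reuse the same clone
        }
        if (base instanceof List) {
            List<Object> copy = new ArrayList<>();
            objMapping.put(base, copy); // register before recursing so cycles terminate
            for (Object item : (List<Object>) base) {
                copy.add(deepClone(item));
            }
            return copy;
        }
        Map<String, Object> copy = new LinkedHashMap<>();
        objMapping.put(base, copy); // register before recursing so cycles terminate
        for (Map.Entry<String, Object> e : ((Map<String, Object>) base).entrySet()) {
            copy.put(e.getKey(), deepClone(e.getValue()));
        }
        return copy;
    }
}
```

Cloning the K array and the ParentTree through one shared instance yields one clone for a field that both of them reference; cloning them through two separate instances reproduces the mismatch from the report, namely two distinct copies of the same field.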




[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-17 Thread Dan Anderson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723377#comment-16723377
 ] 

Dan Anderson commented on PDFBOX-4407:
--

Thank you. I was hoping a concrete example might help drive the effort. I have
some changes that seem to help; I will try to clean them up and submit them.

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Dan Anderson
>Priority: Major
>  Labels: StructureTree
> Attachments: reading-order.pdf

[jira] [Commented] (PDFBOX-4407) ParentTree Objects do not match KArray objects after merge

2018-12-17 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723375#comment-16723375
 ] 

Tilman Hausherr commented on PDFBOX-4407:
-

Please correct the version: you clicked on a JBIG2-related version. I'm
wondering whether you mean some old version (in which case it may already have
been fixed), or whether you tested with 3.0.0, i.e. the trunk. If it is the
latter, then look at the other open StructureTree issues (click on the label).
All of this is mostly unknown territory for us, so I'm afraid it won't be fixed
soon, but I'll still have a look.

> ParentTree Objects do not match KArray objects after merge
> --
>
> Key: PDFBOX-4407
> URL: https://issues.apache.org/jira/browse/PDFBOX-4407
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 JBIG2
>Reporter: Dan Anderson
>Priority: Major
>  Labels: StructureTree
> Attachments: reading-order.pdf