[ https://issues.apache.org/jira/browse/PDFBOX-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-4007:
------------------------------------
    Attachment: Tagged-GeneralForbearance-merged-21.12.2018.pdf

> Merged documents don't retain tags
> ----------------------------------
>
>                 Key: PDFBOX-4007
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4007
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.8
>            Reporter: Dave Hill
>            Priority: Minor
>              Labels: StructureTree, merge
>         Attachments: FourFontsTagged.pdf, HelloWorldTagged.pdf, 
> PDFMergeUtility-2.patch, PDFMergeUtility.patch, 
> Tagged+GeneralForbearance-Merged.pdf, 
> Tagged-GeneralForbearance-merged-21.12.2018.pdf, Tagged.pdf
>
>
> Certain combinations of documents don't retain tags when merged. The document
> [^Tagged.pdf] is just a basic one-word PDF created and tagged with Acrobat Pro DC. If
> you merge it with the government [General Forbearance
> form|https://studentloans.gov/myDirectLoan/downloadForm.action?searchType=library&shortName=general&localeCode=en-us],
> the output crashes Acrobat DC when you try to view the tags. If you use a flattened
> version of the General Forbearance form instead, the tags are just garbled.
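> {code}
> import java.io.File;
> import org.apache.pdfbox.multipdf.PDFMergerUtility;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> public class MergeTagged {
>     public static void main(String[] args) throws Exception {
>         PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>         // Keep both documents open until the merged result has been saved.
>         try (PDDocument src = PDDocument.load(new File("Tagged.pdf"));
>              PDDocument dest = PDDocument.load(new File("GeneralForbearance.pdf"))) {
>             // Append the tagged one-word document to the General Forbearance form.
>             pdfMergerUtility.appendDocument(dest, src);
>             // The tags in this output are broken when viewed in Acrobat DC.
>             dest.save(new File("BrokenTags.pdf"));
>         }
>     }
> }
> {code}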
> The included patch appears to make tagging more reliable, but it still relies
> heavily on cloning, which can apparently cause other issues. The documents
> produced with this code seem to display correctly in Adobe readers for every
> combination of documents that I tested.
> My patch is made and tested against yesterday's production head, and it
> includes my changes from
> [PDFBOX-3999|https://issues.apache.org/jira/browse/PDFBOX-3999] because they
> touch exactly the same place in the code.
> This is a blocker for Section 508 compliance of merged documents, but
> I guessed it to be a minor issue in the overall scheme of things;
> please correct me if I am mistaken.
> Thanks!
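
For context on why merging tags is delicate: in a tagged PDF, each page's /StructParents entry is an integer key into the structure tree's /ParentTree number tree, and those keys must be unique within the document. Two independently tagged files usually both start at key 0, so a merge has to shift the appended document's keys past the destination's (the role of /ParentTreeNextKey). The sketch below is a hypothetical, stdlib-only model of that single step. It is not PDFBox code, not the attached patch, and not a claim about the root cause of this issue; the class, the Page type, and the helper names are illustrative, and parent-tree values are simplified to strings where real entries are arrays of structure element references.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model (not PDFBox code) of one step in merging tagged PDFs:
 * the source document's parent-tree keys are shifted past the destination's keys
 * so the two key sets cannot collide.
 */
public class ParentTreeMergeSketch {

    /** Hypothetical stand-in for a page and its /StructParents key (null = untagged page). */
    static final class Page {
        final String name;
        Integer structParents;

        Page(String name, Integer structParents) {
            this.name = name;
            this.structParents = structParents;
        }
    }

    /** First unused key: largest key + 1, or 0 for an empty tree. */
    static int nextKey(TreeMap<Integer, String> parentTree) {
        return parentTree.isEmpty() ? 0 : parentTree.lastKey() + 1;
    }

    /**
     * Appends the source pages and parent-tree entries to the destination, offsetting
     * every source key by the destination's next free key. Returns the new next free key.
     */
    static int merge(TreeMap<Integer, String> destTree, List<Page> destPages,
                     TreeMap<Integer, String> srcTree, List<Page> srcPages) {
        int offset = nextKey(destTree);
        for (Map.Entry<Integer, String> e : srcTree.entrySet()) {
            destTree.put(e.getKey() + offset, e.getValue());
        }
        for (Page p : srcPages) {
            if (p.structParents != null) {
                p.structParents += offset; // keep the page pointing at its own entry
            }
            destPages.add(p);
        }
        return nextKey(destTree);
    }

    public static void main(String[] args) {
        // Destination: a two-page tagged form using keys 0 and 1.
        TreeMap<Integer, String> destTree = new TreeMap<>();
        destTree.put(0, "marked content of form page 1");
        destTree.put(1, "marked content of form page 2");
        List<Page> destPages = new ArrayList<>();
        destPages.add(new Page("form-1", 0));
        destPages.add(new Page("form-2", 1));

        // Source: a one-page tagged document that also starts at key 0.
        TreeMap<Integer, String> srcTree = new TreeMap<>();
        srcTree.put(0, "marked content of Tagged.pdf page 1");
        List<Page> srcPages = new ArrayList<>();
        srcPages.add(new Page("tagged-1", 0));

        int next = merge(destTree, destPages, srcTree, srcPages);
        for (Page p : destPages) {
            System.out.println(p.name + " -> " + destTree.get(p.structParents));
        }
        System.out.println("ParentTreeNextKey = " + next);
    }
}
```

Running main prints each page next to the entry its key resolves to, with tagged-1 mapped to key 2 and a final ParentTreeNextKey of 3. Without the offset, tagged-1 would keep key 0 and resolve to the form's first-page entry, which is the kind of cross-wiring that makes a structure tree look garbled in a viewer.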



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)