[ https://issues.apache.org/jira/browse/PDFBOX-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726539#comment-16726539 ]
Tilman Hausherr commented on PDFBOX-4007:
-----------------------------------------

Hello [~DavesPlanet], could you test with the current snapshot whether the problem still happens?
[https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.14-SNAPSHOT/]

Several problems have been fixed recently (PDFBOX-4407 and PDFBOX-4408).

> Merged documents don't retain tags
> ----------------------------------
>
>                 Key: PDFBOX-4007
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4007
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.8
>            Reporter: Dave Hill
>            Priority: Minor
>              Labels: StructureTree, merge
>         Attachments: FourFontsTagged.pdf, HelloWorldTagged.pdf,
> PDFMergeUtility-2.patch, PDFMergeUtility.patch,
> Tagged+GeneralForbearance-Merged.pdf,
> Tagged-GeneralForbearance-merged-21.12.2018.pdf, Tagged.pdf
>
>
> Certain combinations of documents don't retain tags when merged. The document
> [^Tagged.pdf] is just a basic one-word PDF created and tagged with Pro DC. If
> you try to merge this with the government [General Forbearance
> form|https://studentloans.gov/myDirectLoan/downloadForm.action?searchType=library&shortName=general&localeCode=en-us]
> the output crashes DC when you try to view the tags. If you use a flattened
> version of the General Forbearance form then the tags are just munged.
> {code}
> import java.io.File;
> import org.apache.pdfbox.multipdf.PDFMergerUtility;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> // Wrapper class added so the snippet compiles; the name is illustrative.
> public class MergeTagged {
>     public static void main(String[] args) throws Exception {
>         PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
>         PDDocument src = PDDocument.load(new File("Tagged.pdf"));
>         PDDocument dest = PDDocument.load(new File("GeneralForbearance.pdf"));
>         // Append the tagged one-word document to the form.
>         pdfMergerUtility.appendDocument(dest, src);
>         src.close();
>         dest.save(new File("BrokenTags.pdf"));
>         dest.close();
>     }
> }
> {code}
> The included patch appears to make tagging more reliable, but I'm still
> relying heavily on cloning, which can apparently cause other issues.
> The documents I get out with this code seem to present correctly in Adobe
> readers for all combinations of documents that I tested against.
> My patch is made and tested against yesterday's production head, and it
> includes my changes from
> [PDFBOX-3999|https://issues.apache.org/jira/browse/PDFBOX-3999] since it is
> in the exact same place in the code.
> For 508 compliance of merged documents this is a blocker, but I guessed it
> to be more of a minor issue in the overall scheme of things. Please correct
> me if I am mistaken.
> Thanks!
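
A quick smoke test for the output of the reproduction above could look like the sketch below. It needs only the JDK and scans the raw bytes of a saved file for the /StructTreeRoot key. The class and method names (TagMarkerScan, containsName, looksTagged) are hypothetical and not part of PDFBox. It assumes the document catalog was written uncompressed; a catalog stored inside a compressed object stream would be missed. A hit only means the key is present. It does not show that the structure tree is intact, so a merged file with munged tags can still pass.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a PDFBox API: a crude byte scan for PDF name tokens.
class TagMarkerScan {

    // Returns true if the ASCII token (e.g. "/StructTreeRoot") occurs anywhere in the bytes.
    static boolean containsName(byte[] pdf, String name) {
        byte[] needle = name.getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + needle.length <= pdf.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (pdf[i + j] != needle[j]) {
                    continue outer; // mismatch, try the next start offset
                }
            }
            return true;
        }
        return false;
    }

    // Assumption: an uncompressed catalog that references a structure tree
    // contains the literal key "/StructTreeRoot".
    static boolean looksTagged(byte[] pdf) {
        return containsName(pdf, "/StructTreeRoot");
    }
}
```

To use it, read the merged file with java.nio.file.Files.readAllBytes and pass the result to looksTagged. If that returns false for BrokenTags.pdf while it returns true for Tagged.pdf, the merge dropped the structure tree reference entirely. If it returns true, the tree still has to be inspected in a viewer or through the PDFBox API.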