Andre created PDFBOX-5528:
-----------------------------

             Summary: PDF/UA: Add marked content sections when flattening acro forms
                 Key: PDFBOX-5528
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5528
             Project: PDFBox
          Issue Type: Improvement
          Components: AcroForm
            Reporter: Andre
         Attachments: correct.png, wrong.png

We need to support PDF/UA compliant documents to some extent. I noticed that 
when we take a PDF/UA compliant PDF document and flatten it via 
PDAcroForm#flatten, the resulting output is not PDF/UA compliant anymore.

After some research, the cause appears to be that PDFBox emits Do operators 
that draw the appearance of the form fields directly into the page content. 
According to the PDF/UA standard, such content needs to be enclosed in marked 
content sections (BMC ... EMC or BDC ... EMC, see attached images).
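
For illustration, here is a minimal sketch in plain Java (no PDFBox dependency) 
of the operator sequence such a marked content section would contain around a 
flattened field. The class name, the resource name Fm0 and the matrix values 
are hypothetical placeholders, and the output only shows the general shape 
described in the PDF specification, not what PDFBox writes verbatim.

```java
import java.util.Locale;

class MarkedContentSketch {

    // Builds the content-stream operators that paint a form XObject inside an
    // /Artifact marked content section (BDC ... EMC).
    // xObjectName and the six matrix values are illustrative placeholders.
    static String wrapAsArtifact(String xObjectName, double[] m) {
        StringBuilder sb = new StringBuilder();
        // Begin marked content with an inline property list.
        sb.append("/Artifact <</Type /Background>> BDC\n");
        // Save the graphics state so the transform does not leak out.
        sb.append("q\n");
        // Concatenate the matrix that places the appearance on the page.
        sb.append(String.format(Locale.ROOT, "%.2f %.2f %.2f %.2f %.2f %.2f cm\n",
                m[0], m[1], m[2], m[3], m[4], m[5]));
        // Paint the form XObject, then restore state and close the section.
        sb.append("/").append(xObjectName).append(" Do\n");
        sb.append("Q\n");
        sb.append("EMC\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(wrapAsArtifact("Fm0", new double[] {1, 0, 0, 1, 10, 20}));
    }
}
```

Without the first and last lines of that output (BDC and EMC), the Do operator 
sits in unmarked content, which is the situation described above.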

By copying some code from PDAcroForm#flatten and calling 
contentStream.beginMarkedContent and contentStream.endMarkedContent myself, I 
can work around the problem, but that is less than ideal. It would be great if 
this could be included in PDFBox.

<pre> 

            final var dict = new COSDictionary();
            dict.setLong(COSName.MCID, mcid);
            dict.setItem(COSName.BBOX, bBox);
            dict.setItem(COSName.TYPE, COSName.BACKGROUND);
            final var propList = PDPropertyList.create(dict);
            contentStream.beginMarkedContent(COSName.ARTIFACT, propList);

            contentStream.saveGraphicsState();

            // see https://stackoverflow.com/a/54091766/1729265 for an explanation
            // of the steps required
            // this will transform the appearance stream form object into the
            // rectangle of the annotation bbox and map the coordinate systems
            final var transformationMatrix =
                    pdfbox_resolveTransformationMatrix(form, annotation, appearanceStream);

            contentStream.transform(transformationMatrix);
            contentStream.drawForm(fieldObject);
            contentStream.restoreGraphicsState();

            contentStream.endMarkedContent();

</pre>
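
The transformation step in the snippet above can be sketched in isolation. 
This is not the pdfbox_resolveTransformationMatrix helper from the snippet, 
only a simplified stand-alone version assuming the form XObject's own /Matrix 
is the identity and no rotation is involved. The class and method names are 
hypothetical, and both boxes are given as {llx, lly, urx, ury}.

```java
import java.util.Arrays;

class AppearanceMatrixSketch {

    // Returns {a, b, c, d, e, f} for the cm operator, mapping the appearance
    // bounding box onto the annotation rectangle by scaling and translating.
    // Assumption: identity form /Matrix, no rotation, non-degenerate bbox
    // (a zero-width or zero-height bbox would divide by zero).
    static double[] bboxToRect(double[] bbox, double[] rect) {
        double sx = (rect[2] - rect[0]) / (bbox[2] - bbox[0]);
        double sy = (rect[3] - rect[1]) / (bbox[3] - bbox[1]);
        // Translate so the scaled lower-left bbox corner lands on the
        // lower-left corner of the annotation rectangle.
        double tx = rect[0] - bbox[0] * sx;
        double ty = rect[1] - bbox[1] * sy;
        return new double[] {sx, 0, 0, sy, tx, ty};
    }

    public static void main(String[] args) {
        double[] m = bboxToRect(new double[] {0, 0, 100, 20},
                                new double[] {50, 700, 250, 740});
        System.out.println(Arrays.toString(m)); // [2.0, 0.0, 0.0, 2.0, 50.0, 700.0]
    }
}
```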


