[ 
https://issues.apache.org/jira/browse/PDFBOX-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185117#comment-14185117
 ] 

Laurent Richard commented on PDFBOX-2419:
-----------------------------------------

The problem is specific to the XML-based XFDF format, where special characters 
must be escaped. There is no issue with FDF.
I have attached a sample PDF file with a simple AcroForm containing such 
characters (in the field named "Nom").
With code like
{code}
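// Load the attached sample form and export its AcroForm fields as FDF
PDDocument pdf = PDDocument.load("SampleForm.pdf");
PDAcroForm form = pdf.getDocumentCatalog().getAcroForm();
FDFDocument fdf = form.exportFDF();
// Exported fields (listed for inspection only, not used below)
List<FDFField> fields = fdf.getCatalog().getFDF().getFields();
// Serialize the FDF document as XFDF into a String
StringWriter writer = new StringWriter();
fdf.saveXFDF(writer);
return writer.toString();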
{code}
We get the following content
{code}
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<ids original="40DE256FBEC20B428C72BCF68015AB9E" 
modified="3E3C2606FFB360C3FA74D3921A630318" />
<fields>
<field name="choix_1_J-64ncBSov7NcPeTY8oJ3A">
<value>Oui</value>
</field>
<field name="prenom_By11Gk3puTlnwwnv4WA0-g">
<value>special XML characters &lt; &gt; &amp;</value>
</field>
<field name="nom_yQacEuz649N*BJguviO5Ow">
<value>special XML characters < > &</value>
</field>
</fields>
</xfdf>
{code}
which is not well-formed XML, since '<', '>' and '&' should be escaped. The 
correct result would be:
{code}
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<ids original="40DE256FBEC20B428C72BCF68015AB9E" 
modified="3E3C2606FFB360C3FA74D3921A630318" />
<fields>
<field name="choix_1_J-64ncBSov7NcPeTY8oJ3A">
<value>Oui</value>
</field>
<field name="prenom_By11Gk3puTlnwwnv4WA0-g">
<value>special XML characters &amp;lt; &amp;gt; &amp;amp;</value>
</field>
<field name="nom_yQacEuz649N*BJguviO5Ow">
<value>special XML characters &lt; &gt; &amp;</value>
</field>
</fields>
</xfdf>
{code}
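One way to see the difference between an unescaped and an escaped value is to 
feed each to a JAXP parser. The sketch below is only an illustration: the names 
WellFormedCheck and isWellFormed are made up for this comment and are not part 
of PDFBox.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of PDFBox: reports whether a String parses as XML.
final class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // DefaultHandler keeps the parser from printing errors to stderr;
            // fatal errors are still thrown as SAXException.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Unescaped value, as in the current XFDF export
        System.out.println(isWellFormed(
                "<value>special XML characters < > &</value>"));
        // Escaped value, as in the expected XFDF export
        System.out.println(isWellFormed(
                "<value>special XML characters &lt; &gt; &amp;</value>"));
    }
}
```

The same check could be applied to the String returned by saveXFDF in a unit 
test for this issue.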
Ideally, relying on JAXP (Java API for XML Processing) rather than building the 
output directly as a String would take care of such escaping.
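
As a rough sketch of the JAXP approach (this is not the actual PDFBox fix; 
XfdfDomSketch and toXfdf are hypothetical names, and the plain Map stands in for 
the real FDF catalog), the XFDF could be built as DOM nodes and serialized with 
a Transformer, which escapes text and attribute content itself:

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch, not PDFBox code: builds XFDF from field name/value pairs.
final class XfdfDomSketch {
    static final String XFDF_NS = "http://ns.adobe.com/xfdf/";

    static String toXfdf(Map<String, String> fieldValues) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            // Root element <xfdf xml:space="preserve"> in the XFDF namespace
            Element xfdf = doc.createElementNS(XFDF_NS, "xfdf");
            xfdf.setAttributeNS(XMLConstants.XML_NS_URI, "xml:space", "preserve");
            doc.appendChild(xfdf);

            Element fields = doc.createElementNS(XFDF_NS, "fields");
            xfdf.appendChild(fields);

            for (Map.Entry<String, String> entry : fieldValues.entrySet()) {
                Element field = doc.createElementNS(XFDF_NS, "field");
                field.setAttribute("name", entry.getKey());
                Element value = doc.createElementNS(XFDF_NS, "value");
                // Raw text goes in unescaped; the serializer escapes it on output.
                value.setTextContent(entry.getValue());
                field.appendChild(value);
                fields.appendChild(field);
            }

            // Identity transform: serializes the DOM tree to a String
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException ex) {
            throw new IllegalStateException("Could not serialize XFDF", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("prenom", "special XML characters &lt; &gt; &amp;");
        values.put("nom", "special XML characters < > &");
        System.out.println(toXfdf(values));
    }
}
```

The point of the design is that no caller has to remember to escape anything: 
values are stored as plain text in the tree, and escaping happens in one place 
at serialization time.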

> XFDF export is not XML compliant
> --------------------------------
>
>                 Key: PDFBOX-2419
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2419
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 1.8.7
>            Reporter: Laurent Richard
>              Labels: FDF
>             Fix For: 1.8.8
>
>         Attachments: SampleForm.pdf
>
>
> The XFDF content is written as a simple string instead of XML nodes.
> As a result, field values containing special characters (&, <, >, ...) are 
> not escaped and the resulting XML is invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
