Kai Keggenhoff created PDFBOX-4345:
--------------------------------------

             Summary: FDFAnnotation.richContentsToString
                 Key: PDFBOX-4345
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4345
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 2.0.12
            Reporter: Kai Keggenhoff
         Attachments: FDFAnnotation_diff.txt, FDFAnnotation_new.java, 
MergeTest.java

The method FDFAnnotation.richContentsToString does not evaluate text nodes with 
siblings in the XML which can lead to missing text when you parse XFDF data and 
add the annotations to a PDF.

Example: parsing an XFDF string containing

<p>Text A <span style="text-decoration:word;">Text B</span> Text C</p>

and adding the annotation will display only "Text B".
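The loss can be reproduced with plain JAXP, without PDFBox. The XPath expression "*" selects only element children, so the "span" is visited but its text siblings "Text A " and " Text C" never are. This is a minimal sketch: the class and method names are hypothetical, and it does not reproduce the body of richContentsToString itself, only the child selection it relies on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StarXPathDemo {

    // Parses an XML string and returns its document element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse: " + xml, e);
        }
    }

    // Concatenates the text of every node that XPath "*" selects under 'root'.
    // "*" matches element children only, so text-node siblings are never visited.
    static String textSeenByStar(Element root) {
        try {
            NodeList selected = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("*", root, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < selected.getLength(); i++) {
                sb.append(selected.item(i).getTextContent());
            }
            return sb.toString();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = parse("<p>Text A <span style=\"text-decoration:word;\">Text B</span> Text C</p>");
        System.out.println("[" + textSeenByStar(p) + "]");   // prints [Text B]
        System.out.println("[" + p.getTextContent() + "]");  // prints [Text A Text B Text C]
    }
}
```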

I've included a code sample (MergeTest.java) that generates two PDFs. In the
first, the paragraph contains only "span" elements, each with a single text node
as its only child, and all of the text is included. In the second, the paragraph
has a mix of text nodes and elements as children, and the text in the siblings
of the "span" is missing.

I propose the following fix:

Instead of traversing the children of an element with the XPath "*" expression, 
simply iterate the children obtained from Node.getChildNodes(), process Text 
and CDATASection nodes directly and call richContentsToString for any elements.

(source: FDFAnnotation_new.java; diff against 2.0.12: FDFAnnotation_diff.txt)
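The proposed traversal can be sketched as follows. This is not the code in FDFAnnotation_new.java: the class, the method collectRichContents and the escape helper are hypothetical, and it assumes the goal is to serialize the rich-text children back to markup, with simple escaping of &, <, > and quotes. Text and CDATA data are appended directly, and element children are handled recursively.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class ChildNodesSketch {

    // Parses an XML string and returns its document element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse: " + xml, e);
        }
    }

    // Minimal XML escaping for text and attribute values (an assumption of this sketch).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Hypothetical helper: serializes 'node' and all of its children, keeping text
    // siblings of elements. When 'root' is true, the node's own tags are omitted.
    static String collectRichContents(Node node, boolean root) {
        StringBuilder sb = new StringBuilder();
        if (!root) {
            sb.append('<').append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                sb.append(' ').append(attr.getName()).append("=\"")
                  .append(escape(attr.getValue())).append('"');
            }
            sb.append('>');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Text) {
                // CDATASection extends Text, so this branch covers both node types.
                sb.append(escape(((Text) child).getData()));
            } else if (child instanceof Element) {
                sb.append(collectRichContents(child, false));
            }
            // Comments and processing instructions are skipped.
        }
        if (!root) {
            sb.append("</").append(node.getNodeName()).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element p = parse("<p>Text A <span style=\"text-decoration:word;\">Text B</span> Text C</p>");
        // prints Text A <span style="text-decoration:word;">Text B</span> Text C
        System.out.println(collectRichContents(p, true));
    }
}
```

Because CDATASection is a subinterface of Text in the DOM API, a single instanceof Text check is enough to pick up both kinds of character data named in the proposal.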

Note: my first attempt at a fix was to replace the XPath "*" expression with
"node()", but for some reason, when I tried this on the test case

<p><![CDATA[A]]> B <span>C</span> D</p>

I would only obtain a NodeList containing the CDATASection, the "span" element 
and the final text node, but not the text node containing "B".
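A plausible explanation, offered as an assumption rather than something confirmed in this report: the XPath 1.0 data model has no CDATA section nodes and groups adjacent character data into a single text node, so "A" and " B " form one XPath text node. When JAXP maps that node back to the DOM, it may return only the first underlying DOM node, the CDATASection. The hypothetical class below prints both views for comparison; the XPath result depends on the JAXP implementation in use, so no expected output is claimed for it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTestDemo {

    // Parses an XML string and returns its document element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse: " + xml, e);
        }
    }

    // Describes each node as "name:text", joined with " | ".
    static String describe(NodeList nodes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (i > 0) {
                sb.append(" | ");
            }
            sb.append(n.getNodeName()).append(':').append(n.getTextContent());
        }
        return sb.toString();
    }

    // Selects the children of 'root' with the XPath node test "node()".
    static NodeList selectWithNodeTest(Element root) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("node()", root, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = parse("<p><![CDATA[A]]> B <span>C</span> D</p>");
        System.out.println("DOM:   " + describe(p.getChildNodes()));
        System.out.println("XPath: " + describe(selectWithNodeTest(p)));
    }
}
```

Iterating Node.getChildNodes() sidesteps this mapping entirely, since the DOM keeps the CDATASection and the following text node as separate children.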



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
