[
https://issues.apache.org/jira/browse/PDFBOX-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433376#comment-15433376
]
Maruan Sahyoun commented on PDFBOX-3471:
----------------------------------------
{{root.removeChild(node)}} modifies the NodeList, i.e. it no longer contains the
number of entries it had when the loop was entered, so the next iteration goes
to the wrong index and skips a node. A proposed fix is something like this:
{code}
private void removeComments(Node root)
{
    // will hold the nodes which are to be deleted
    List<Node> forDeletion = new ArrayList<Node>();
    NodeList nl = root.getChildNodes();
    if (nl.getLength() <= 1)
    {
        // There is only one node so we do not remove it
        return;
    }
    for (int i = 0; i < nl.getLength(); i++)
    {
        Node node = nl.item(i);
        if (node instanceof Comment)
        {
            // comments to be deleted
            forDeletion.add(node);
        }
        else if (node instanceof Text)
        {
            if (node.getTextContent().trim().isEmpty())
            {
                // TODO: verify why this is necessary
                // empty text nodes to be deleted
                forDeletion.add(node);
            }
        }
        else if (node instanceof Element)
        {
            // clean child
            removeComments(node);
        } // else do nothing
    }
    // now remove the child nodes
    for (Node node : forDeletion)
    {
        root.removeChild(node);
    }
}
{code}
which makes sure that all nodes are visited and the removal is done outside the
loop.
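To illustrate the index problem in isolation, here is a minimal sketch that is
not part of the patch. The class name {{LiveNodeListDemo}} and its methods are
made up for this example, and it assumes the JDK's default DOM parser, whose
{{getChildNodes()}} returns a live list. Each method returns the number of
children left under the root after trying to remove all comments.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of XmpBox.
class LiveNodeListDemo
{
    // Parses a small XML string and returns its root element.
    private static Element parseRoot(String xml)
    {
        try
        {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("could not parse demo XML", e);
        }
    }

    // Removes comments while iterating: the live list shrinks under the loop,
    // so the node that slides into index i is never inspected.
    public static int naiveRemoval(String xml)
    {
        Element root = parseRoot(xml);
        NodeList nl = root.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++)
        {
            Node node = nl.item(i);
            if (node instanceof Comment)
            {
                root.removeChild(node);
            }
        }
        return nl.getLength();
    }

    // Collects the comments first and removes them after the loop,
    // so every index is visited exactly once.
    public static int deferredRemoval(String xml)
    {
        Element root = parseRoot(xml);
        NodeList nl = root.getChildNodes();
        List<Node> forDeletion = new ArrayList<Node>();
        for (int i = 0; i < nl.getLength(); i++)
        {
            Node node = nl.item(i);
            if (node instanceof Comment)
            {
                forDeletion.add(node);
            }
        }
        for (Node node : forDeletion)
        {
            root.removeChild(node);
        }
        return nl.getLength();
    }
}
```

For the input {{<r><!--a--><!--b--><!--c--><!--d--></r>}} the first method
leaves 2 comments behind (b and d are skipped), while the second leaves 0.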
> XMP parsing fails if XMP contain comments
> -----------------------------------------
>
> Key: PDFBOX-3471
> URL: https://issues.apache.org/jira/browse/PDFBOX-3471
> Project: PDFBox
> Issue Type: Bug
> Components: XmpBox
> Affects Versions: 2.0.2
> Reporter: Petras
> Attachments: PDFBOX-3471_XmpParsingIgnoringComments.patch
>
>
> DomXmpParser fails to parse valid XMP such as this:
> {code:xml}
> <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
>   <!-- PDF/A standarto versija (1 ar 2) ir suderinamumo lygmuo (A, B ar U) -->
>   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>     <rdf:Description rdf:about = ""
>         xmlns:pdfaid = "http://www.aiim.org/pdfa/ns/id/">
>       <pdfaid:part>1</pdfaid:part>
>       <pdfaid:conformance>B</pdfaid:conformance>
>     </rdf:Description>
>   </rdf:RDF>
> </x:xmpmeta>
> <?xpacket end="w"?>
> {code}
> DomXmpParser finds the comment node and fails:
> {code}
> org.apache.xmpbox.xml.XmpParsingException: More than one element found in x:xmpmeta
>     at org.apache.xmpbox.xml.DomXmpParser.findDescriptionsParent(DomXmpParser.java:750)
>     at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:183)
>     at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:111)
>     ...
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)