[ 
https://issues.apache.org/jira/browse/XERCESJ-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen White updated XERCESJ-1622:
-----------------------------------

    Attachment: Bug.java

Demonstraton of the bug.  The output relates to walking the tree in a 
depth-first manner, but prior to this the code iterates through a NodeList and 
then accesses it again in a manner that results in the same NodeListCache being 
added to the free list twice.

$ java -cp xerces-2_11_0/xercesImpl.jar:xerces-2_11_0/xml-apis.jar:. Bug
[root: null](0) using org.apache.xerces.dom.NodeListCache@1a0a87c
[p: null](0) using org.apache.xerces.dom.NodeListCache@1a0a87c
[p: null](1) using org.apache.xerces.dom.NodeListCache@1a0a87c
[p: null](2) using org.apache.xerces.dom.NodeListCache@1a0a87c
[root: null](1) using null
[p: null](0) using org.apache.xerces.dom.NodeListCache@1a0a87c
[p: null](1) using org.apache.xerces.dom.NodeListCache@1a0a87c
[p: null](2) using org.apache.xerces.dom.NodeListCache@1a0a87c
[root: null](2) using null
[p: null](0) using org.apache.xerces.dom.NodeListCache@1a0a87c
[p: null](1) using org.apache.xerces.dom.NodeListCache@1a0a87c
[p: null](2) using org.apache.xerces.dom.NodeListCache@1a0a87c
[root: null](3) using null
[p: null](0) using org.apache.xerces.dom.NodeListCache@1a0a87c
[p: null](1) using org.apache.xerces.dom.NodeListCache@1a0a87c
[p: null](2) using org.apache.xerces.dom.NodeListCache@1a0a87c

As you can see the calls to item(2) and item(3) on the root don't use a cache.  
Each time a call to item(...) on the root Node claims the cache it looses it 
again straight away, as the same cache is given to the iteration over the <p> 
element children.

The expected output is:

$ java -cp fixedXercesImpl.jar:xerces-2_11_0/xml-apis.jar:. Bug
[root: null](0) using org.apache.xerces.dom.NodeListCache@e44d85
[p: null](0) using org.apache.xerces.dom.NodeListCache@e44d85
[p: null](1) using org.apache.xerces.dom.NodeListCache@e44d85
[p: null](2) using org.apache.xerces.dom.NodeListCache@e44d85
[root: null](1) using null
[p: null](0) using org.apache.xerces.dom.NodeListCache@9d93be
[p: null](1) using org.apache.xerces.dom.NodeListCache@9d93be
[p: null](2) using org.apache.xerces.dom.NodeListCache@9d93be
[root: null](2) using org.apache.xerces.dom.NodeListCache@e44d85
[p: null](0) using org.apache.xerces.dom.NodeListCache@9d93be
[p: null](1) using org.apache.xerces.dom.NodeListCache@9d93be
[p: null](2) using org.apache.xerces.dom.NodeListCache@9d93be
[root: null](3) using org.apache.xerces.dom.NodeListCache@e44d85
[p: null](0) using org.apache.xerces.dom.NodeListCache@e44d85
[p: null](1) using org.apache.xerces.dom.NodeListCache@e44d85
[p: null](2) using org.apache.xerces.dom.NodeListCache@e44d85

Here the NodeListCache used for iterating over the children of <root> is lost 
between root.item(0) and root.item(1), exactly the same as in the first case.  
This is because it has been added to the free list, as Iteration is expected to 
have been completed.  The difference here is that once root.item(1) has 
reclaimed it, it stays claimed.  Calls to item(...) on the <p> elements get 
given a new cache.

> CoreDocumentImpl.freeNodeListCache can be called multiple times and results 
> in free list corruption
> ---------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1622
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1622
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: DOM (Level 3 Core)
>    Affects Versions: 2.11.0
>         Environment: Tested on Linux using Java 1.7.0_25 and 1.7.0_45 
> (x86_64), but I don't think this issue is constrained to a particular 
> environment
>            Reporter: Stephen White
>         Attachments: Bug.java
>
>
> Iterating through the children of a ParentNode using indexes (as in the code 
> below) rather than the nextSibling pointers would be very slow (O(n^2) in the 
> number of nodes) if the list is long.  Xerces speeds this up by using a 
> cache, fNodeListCache.
> NodeList l = doc.getDocumentElement().getChildNodes();
> for (int i = 0; i < l.getLength(); i++) {
>     l.item(i);
> }
> ParentNode.nodeListItem() contains some code that allows a NodeListCache to 
> be freed when iteration is complete.  
> Freeing a NodeListCache is done inside CoreDocumentImpl.freeNodeListCache 
> which adds the NodeListCache to a list of free caches ready for re-use by a 
> different ParentNode, avoiding object allocations.  A NodeListCache can still 
> be used after it is freed, as long as it hasn't been claimed by anything else 
> yet.
> If you iterate through the same child list twice then the NodeListCache will 
> be freed multiple times (during the second iteration firstAccess will be 
> false and, according to the logic in nodeListItem(), it'll get freed again).
> Unfortunately CoreDocumentImpl.freeNodeListCache adds NodeListCaches to the 
> free list unconditionally, so if it is called multiple times on the same 
> NodeListCache then the cache is added to the list multiple times.  This 
> creates a cycle in the linked list.  When this happens subsequent calls to 
> getNodeListCache always return the same cache.  This means that in a 
> depth-first traversal of the DOM the same cache gets used for children as for 
> their parents, and as the children use it the state needed for the parent is 
> lost.  The result is that things run as slowly as if the cache wasn't present 
> at all.
> We've got real code that triggers this situation when trying to build a Saxon 
> TinyTree from a Xerces DOM, as the Saxon DOMSender iterates using item 
> indexes rather than nextSibling pointers.
> Example code to follow.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to