Hi All,

Any piece of code that uses nodelists will be thread unsafe as soon as a 
document is shared across threads.  It doesn't matter if the original EPR has a 
seperate document or not.  The problem with TLSing the Document itself is that 
the standard says its not thread safe either, all that happens is push the 
problem further down the line until two threads try to nodelist the same 
document.  

point 1) affects this directly, the problem is that both the getChildNodes 
optimisation and the importing (as Vinh just showed) aren't thread safe.

For EPRs at least the only way to be truly thread safe is to store the actual 
information in raw xml strings (or objects) and re-parse it every time you use 
an epr.  Its expensive to then reparse but at least safe.

point 2) is exactly the same problem just pushed further down the line.  The 
resource EPR with its seperate document is then read simultaneously by seperate 
threads wishing to publish events leading to same issue.  
getFirstChild/getNextSibling help with this but only as another hack tied to a 
specific version of xerces.

Please upload the test case, the more test cases we can use the better.

cheers,
Chris

-----Original Message-----
From: Oliver Waeldrich [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 05, 2007 12:52 PM
To: [email protected]
Subject: Re: RE: EMPTY_DOC thread stability issues


Hi all,

I did some additional testing related to the EMPTY_DOC problem. Therefore I set 
up a test where multiple threads created new EPRs based on a set of existing 
(shared) EPRs. Such shared EPRs are resource EPRs (e.g. notification producer 
EPR). In this test two different kinds of exceptions are thrown:

1.)
java.lang.NullPointerException
    at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
    at org.apache.xerces.dom.ParentNode.item(Unknown Source)
    at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:932)
    at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1144)
    at org.apache.muse.util.xml.XmlUtils.getElement(XmlUtils.java:1119)
    at 
org.apache.muse.test.case1.TestEndpointReference.initializeFromXML(TestEndpointReference.java:560)
    at 
org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:193)
    at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)

This exception relates to the following code parts in the EPR implementation:
   
        ...
        Document doc = XmlUtils.EMPTY_DOC;
        _xml = XmlUtils.createElement(doc, typeName);

        NodeList children = copy.toXML().getChildNodes();
        int length = children.getLength();
        
        for (int n = 0; n < length; ++n)
        {
            Node next = doc.importNode(children.item(n), true);
            _xml.appendChild(next);
        }

        try
        {
            initializeFromXML();
        }
        ...
        
The reason for the exception is that multiple threads import nodes into the 
SHARED document. Later on in initializeFromXML() the different element nodes 
within the SHARED document are traversed simulations, which leads to the 
threading issue. The usage of ThreadLocal WILL eliminate such errors, since no 
SHARED document is used between the different threads to create XML elements. 
In fact each thread has its own document that is used for this. This is shown 
in the sample thread local.

2.)

java.lang.NullPointerException
    at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
    at org.apache.xerces.dom.ParentNode.item(Unknown Source)
    at 
org.apache.muse.test.case1.TestEndpointReference.<init>(TestEndpointReference.java:187)
    at org.apache.muse.test.case1.TestThread.run(TestThread.java:60)

The second exception has a similar background. In the test case the initial 
EPRs are created in the class Test via the
    
        TestEndpointReference(URI address, QName typeName)
         
constructor. This constructor creates the internal XML representation by using 
EMPTY_DOC. The original EPR is copied by the following code:

    NodeList children = copy.toXML().getChildNodes();
    int length = children.getLength();
    
    for (int n = 0; n < length; ++n)
    {
        Node next = doc.importNode(children.item(n), true);
        _xml.appendChild(next);
    }

The copy.toXML() method simply returns the XML element created with EMPTY_DOC. 
Here again multiple thread try to traverse the same node
concurrently which leads to the exception.        
        
Therefore, in my point of view we have two different problems:

     
 1.) Multiple threads must not use the same target document to import nodes.
  
     This means XML elements, that are only used within a request/response
     cycle (e.g. for de-/serialization) can use ThreadLocal mechnisms.
     It is not necessary to create new documents all the way for this purpose.
     
 2.) XML elements, that have a lifetime longer than a request/response
     cycle, must be created by using new (not shared) documents. This I already
     pointed out in my first mail. This is especially true for EPRs, that are
     

I attached a number of test cases that illustrate my point.

To Rich:

  >>The default implementation creates a thread for processing the
  >>notify request, and then immediately releases the thread that called it.
  
  You are right: in this special case the objects need to be copied
  by using a separate document, since here the message objects leave the
  request/response cycle. This has to be taken into account.
  
Finally, the usage of createDocument() compared to the solution I presented 
adds a performance overhead of more than 50%. This can simply be verified with 
the tests I attached. However, I admit that my solution requires more attention 
by the programmers than simply replacing EMPTY_DOC, but I think
it is worth.   

Interestingly, I still found that sometimes exceptions will occur, even when 
using new documents for every XML operation. This is really rare, but in my 
opinion this results from the “still” shared usage of DOM objects within 
the EPR (the _xml element). Therefore, concurrent read access to this instance 
variable can still occour. To solve the problem with the EPR completely, this 
instance variable should perhaps be removed from the EPR implementation.
  
Regards,
Oliver

p.s. unfortunatelly I could not attach the test files, since this is rejected 
by the mailer deamon. However, I could provide the test in the JIRA.
--
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to