Hi Vinh,

Could you add your test to the Jira so we can download and help out?

cheers,
Chris  

-----Original Message-----
From: Vinh Nguyen (vinguye2) [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 05, 2007 9:52 AM
To: [email protected]
Subject: RE: EMPTY_DOC thread stability issues 

Hi all,
After doing more testing, the issue is correctly because of the Xerces
limitation.
Both a Document and a Node/Element are *NOT* thread-safe.

The following shows that Document.importNode() is not thread-safe:

17:26:49,730 ERROR [STDERR] java.lang.NullPointerException
17:26:49,730 ERROR [STDERR]     at
org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
17:26:49,730 ERROR [STDERR]     at
org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
17:26:49,730 ERROR [STDERR]     at
org.apache.muse.ws.addressing.EndpointReference.<init>(EndpointReference
.java:186)
17:26:49,730 ERROR [STDERR]     at
org.apache.muse.ws.notification.impl.SimpleNotificationMessage.setProduc
erReference(SimpleNotificationMessage.java:209)
17:26:49,730 ERROR [STDERR]     at
org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(S
impleSubscriptionManager.java:256)

The following shows that reading the children of a Node/Element is not
thread-safe:

00:16:53,400 ERROR [STDERR] java.lang.NullPointerException
00:16:53,400 ERROR [STDERR]     at
org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
00:16:53,400 ERROR [STDERR]     at
org.apache.xerces.dom.ParentNode.item(Unknown Source)
00:16:53,400 ERROR [STDERR]     at
org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:883)
00:16:53,400 ERROR [STDERR]     at
org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:815)
00:16:53,400 ERROR [STDERR]     at
org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:791)
00:16:53,400 ERROR [STDERR]     at
org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:974)
00:16:53,400 ERROR [STDERR]     at
org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:977)
00:16:53,400 ERROR [STDERR]     at
org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:940)
00:16:53,400 ERROR [STDERR]     at
org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(Sim
pleNotificationMessage.java:291)
00:16:53,415 ERROR [STDERR]     at
org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(Sim
pleNotificationMessage.java:239)
00:16:53,415 ERROR [STDERR]     at
org.apache.muse.ws.notification.remote.NotificationConsumerClient.notify
(NotificationConsumerClient.java:97)
00:16:53,415 ERROR [STDERR]     at
org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(S
impleSubscriptionManager.java:267)

To work around these problem, we should follow these rules:

1) Do not use XmlUtils.EMPTY_DOC in multi-thread processes.
2) Do not access the children of a Node/Element from multiple threads.
3) Do not create a Node/Element with its owner as XmlUtils.EMPTY_DOC, if
the node's children will be accessed.
4) Do not create a Node/Element with its owner as XmlUtils.EMPTY_DOC, if
the node itself can be accessed from multiple threads.

This most likely will affect a lot of Muse code.  As Chris pointed out,
Oliver's ThreadLocal solution for the EMPTY_DOC won't work.  EMPTY_DOC
can still be used, but any code which uses it must follow the rules
above.  Also, the DocumentBuilderFactory can't be a singleton either
since it's also not thread-safe.

NOTIFICATION FIXES:
The following updates finally fix the multi-thread notifications issue
for me:

1) Updated EndpointReference.java.  Changed all XmlUtils.EMPTY_DOC
references to XmlUtil.createDocument().  This is because a resource's
EPR and its Element representation can be accessed from multiple
threads, so all rules above apply.

2) Updated SimpleNotificationMessage.toXML().  Changed XmlUtil.EMPTY_DOC
reference to XmlUtil.createDocument().  This is because a root Element
is created and XmlUtils.getAllNamespaces(root) is called on it, so rule
#3 apply.

3) Updated NotificationConsumerClient.notify(NotificationMessage[]
messages) to not assume that messages[x].toXML() will return an Element
whose owner is EMPTY_DOC.
Changed these lines:
        Element notify =
XmlUtils.createElement(WsnConstants.NOTIFY_QNAME);        
        for (int n = 0; n < messages.length; ++n)
            notify.appendChild(messages[n].toXML(), true); To these
lines:
        Document doc = XmlUtils.createDocument();
        Element notify = XmlUtils.createElement(doc,
WsnConstants.NOTIFY_QNAME);
        for (int n = 0; n < messages.length; ++n)
            notify.appendChild(doc.importNode(messages[n].toXML(),
true));

So now, I no longer get any exceptions when generating notifications
simultaneously from multiple producers.

BUT, now I have another problem: notifications are still being lost
somehow, yet no errors appear at all!  I have to investigate this
further to see what's causing this problem.  If anyone encounters this
problem, please post to the group:)




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 04, 2007 2:50 PM
To: [email protected]
Subject: RE: EMPTY_DOC thread stability issues 

Hi All,

Joining this thread on Vinh's recommendation.

This issue I believe stems from both

https://issues.apache.org/jira/browse/XERCESJ-911

where large amounts of Muse use this functionality.  Regarding the
ThreadLocal solution it doesn't work to cache documents themeselves.
The sdk is very clear that only DocumentFactory.getInstance is thread
safe.  You can use ThreadLocal with DocumentBuilders however. (new
DocumentBuilders is quite expensive, but new documents are much cheaper)

The solution below (combined with TLS DocumentBuilders) should resolve
the issue with both WRONG_DOCUMENTs and the 911 issue from xerces.  I
read in the rest of the thread that Vinh had a simple test case with
multiple threads.  I'm happy to do a quick refactor in the code to put
these two approaches into the code base, if you could send me the test
case Vinh, I'll get started on it tomorrow.

NB whilst this "should work" it is still against the spirit of xerces
which is share nothing.  I'd also note that due to the nodecache
optimisation within Xerces, using NodeLists its just not possible to be
thread safe even for reads.

copied from muse-user:

Hi,

>From what I could work out, from within the list comments and the code,
the state is stored in the Document itself, and as cloneNode uses
Object.clone and then sets the doc it won't work.  Using importNode
helps a little (as it uses getFirstChild()/getNextSibling()), but it
just puts the problem to a later stage.

getAllElements just does the same, calls getChildNodes and then forces
the cache to be used.  Deleting the cache just stops the null for the
parent, it doesn't stop incorrect nodes being returned or race
conditions with other nulls.

The simple thing is to stop using getChildNodes, from what I can see in
the code there isn't a need for it.  The only place I've seen that
doesn't require all of the nodes anyway is in EndpointReference's
getNumberOfParameters, but that behaviour can be safely cached (its not
used directly in the project anyway).

Looking further at the use cases in Muse only the IsolationLayer
(because of the DeferredImpl) needs to call hasChildNodes() on the
document node, for it to force that synchronizeChildren be called (its
cached from then on in each node).  Then every other piece of code can
simply pointer chase with the getFirstChild()/getNextSibling() approach.
No synchronization required.

re using other jaxp's, the DOM itself makes no statement about even read
thread safety.  All of the jaxp impls suffer some form of threading
problem.  Considering all of the problems with fighting against
namespace problems (much worse IMO) it makes sense to stick with the
devil you know :-<.

Again for most of the xerces releases using the
getFirstChild()/getNextSibling() is a seamless dropin for the
getChildNodes problem.  Its a shame that the xerces guys are very much
against any form of thread safety (except application enforced).  Going
with the standard approach the only safe thing is to always serialize to
objects / keep the strings around, which would overly complicate the
code.

I'm willing to give it a try and send you patched libs to try out (I
don't have a test case for this yet) if its quick to reproduce, just let
me know.  If it works out I can raise a jira with the patches.

cheers,
Chris 


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to