Hi Vinh, Could you add your test to the Jira so we can download and help out?
cheers, Chris -----Original Message----- From: Vinh Nguyen (vinguye2) [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 05, 2007 9:52 AM To: [email protected] Subject: RE: EMPTY_DOC thread stability issues Hi all, After doing more testing, the issue is correctly because of the Xerces limitation. Both a Document and a Node/Element are *NOT* thread-safe. The following shows that Document.importNode() is not thread-safe: 17:26:49,730 ERROR [STDERR] java.lang.NullPointerException 17:26:49,730 ERROR [STDERR] at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source) 17:26:49,730 ERROR [STDERR] at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source) 17:26:49,730 ERROR [STDERR] at org.apache.muse.ws.addressing.EndpointReference.<init>(EndpointReference .java:186) 17:26:49,730 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleNotificationMessage.setProduc erReference(SimpleNotificationMessage.java:209) 17:26:49,730 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(S impleSubscriptionManager.java:256) The following shows that reading the children of a Node/Element is not thread-safe: 00:16:53,400 ERROR [STDERR] java.lang.NullPointerException 00:16:53,400 ERROR [STDERR] at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source) 00:16:53,400 ERROR [STDERR] at org.apache.xerces.dom.ParentNode.item(Unknown Source) 00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:883) 00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:815) 00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:791) 00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:974) 00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:977) 00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:940) 00:16:53,400 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(Sim pleNotificationMessage.java:291) 00:16:53,415 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(Sim pleNotificationMessage.java:239) 00:16:53,415 ERROR [STDERR] at org.apache.muse.ws.notification.remote.NotificationConsumerClient.notify (NotificationConsumerClient.java:97) 00:16:53,415 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(S impleSubscriptionManager.java:267) To work around these problem, we should follow these rules: 1) Do not use XmlUtils.EMPTY_DOC in multi-thread processes. 2) Do not access the children of a Node/Element from multiple threads. 3) Do not create a Node/Element with its owner as XmlUtils.EMPTY_DOC, if the node's children will be accessed. 4) Do not create a Node/Element with its owner as XmlUtils.EMPTY_DOC, if the node itself can be accessed from multiple threads. This most likely will affect a lot of Muse code. As Chris pointed out, Oliver's ThreadLocal solution for the EMPTY_DOC won't work. EMPTY_DOC can still be used, but any code which uses it must follow the rules above. Also, the DocumentBuilderFactory can't be a singleton either since it's also not thread-safe. NOTIFICATION FIXES: The following updates finally fix the multi-thread notifications issue for me: 1) Updated EndpointReference.java. Changed all XmlUtils.EMPTY_DOC references to XmlUtil.createDocument(). This is because a resource's EPR and its Element representation can be accessed from multiple threads, so all rules above apply. 2) Updated SimpleNotificationMessage.toXML(). Changed XmlUtil.EMPTY_DOC reference to XmlUtil.createDocument(). This is because a root Element is created and XmlUtils.getAllNamespaces(root) is called on it, so rule #3 apply. 3) Updated NotificationConsumerClient.notify(NotificationMessage[] messages) to not assume that messages[x].toXML() will return an Element whose owner is EMPTY_DOC. Changed these lines: Element notify = XmlUtils.createElement(WsnConstants.NOTIFY_QNAME); for (int n = 0; n < messages.length; ++n) notify.appendChild(messages[n].toXML(), true); To these lines: Document doc = XmlUtils.createDocument(); Element notify = XmlUtils.createElement(doc, WsnConstants.NOTIFY_QNAME); for (int n = 0; n < messages.length; ++n) notify.appendChild(doc.importNode(messages[n].toXML(), true)); So now, I no longer get any exceptions when generating notifications simultaneously from multiple producers. BUT, now I have another problem: notifications are still being lost somehow, yet no errors appear at all! I have to investigate this further to see what's causing this problem. If anyone encounters this problem, please post to the group:) -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 04, 2007 2:50 PM To: [email protected] Subject: RE: EMPTY_DOC thread stability issues Hi All, Joining this thread on Vinh's recommendation. This issue I believe stems from both https://issues.apache.org/jira/browse/XERCESJ-911 where large amounts of Muse use this functionality. Regarding the ThreadLocal solution it doesn't work to cache documents themeselves. The sdk is very clear that only DocumentFactory.getInstance is thread safe. You can use ThreadLocal with DocumentBuilders however. (new DocumentBuilders is quite expensive, but new documents are much cheaper) The solution below (combined with TLS DocumentBuilders) should resolve the issue with both WRONG_DOCUMENTs and the 911 issue from xerces. I read in the rest of the thread that Vinh had a simple test case with multiple threads. I'm happy to do a quick refactor in the code to put these two approaches into the code base, if you could send me the test case Vinh, I'll get started on it tomorrow. NB whilst this "should work" it is still against the spirit of xerces which is share nothing. I'd also note that due to the nodecache optimisation within Xerces, using NodeLists its just not possible to be thread safe even for reads. copied from muse-user: Hi, >From what I could work out, from within the list comments and the code, the state is stored in the Document itself, and as cloneNode uses Object.clone and then sets the doc it won't work. Using importNode helps a little (as it uses getFirstChild()/getNextSibling()), but it just puts the problem to a later stage. getAllElements just does the same, calls getChildNodes and then forces the cache to be used. Deleting the cache just stops the null for the parent, it doesn't stop incorrect nodes being returned or race conditions with other nulls. The simple thing is to stop using getChildNodes, from what I can see in the code there isn't a need for it. The only place I've seen that doesn't require all of the nodes anyway is in EndpointReference's getNumberOfParameters, but that behaviour can be safely cached (its not used directly in the project anyway). Looking further at the use cases in Muse only the IsolationLayer (because of the DeferredImpl) needs to call hasChildNodes() on the document node, for it to force that synchronizeChildren be called (its cached from then on in each node). Then every other piece of code can simply pointer chase with the getFirstChild()/getNextSibling() approach. No synchronization required. re using other jaxp's, the DOM itself makes no statement about even read thread safety. All of the jaxp impls suffer some form of threading problem. Considering all of the problems with fighting against namespace problems (much worse IMO) it makes sense to stick with the devil you know :-<. Again for most of the xerces releases using the getFirstChild()/getNextSibling() is a seamless dropin for the getChildNodes problem. Its a shame that the xerces guys are very much against any form of thread safety (except application enforced). Going with the standard approach the only safe thing is to always serialize to objects / keep the strings around, which would overly complicate the code. I'm willing to give it a try and send you patched libs to try out (I don't have a test case for this yet) if its quick to reproduce, just let me know. If it works out I can raise a jira with the patches. cheers, Chris --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
