Bugs item #1007959, was opened at 2004-08-12 15:39 Message generated for change (Comment added) made by pbackx You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=376685&aid=1007959&group_id=22866
Category: Clustering Group: v3.2 Status: Open Resolution: None >Priority: 7 Submitted By: Peter Backx (pbackx) Assigned to: Scott M Stark (starksm) Summary: HA-JNDI extremely unreliable Initial Comment: os: linux 2.6 jdk: Sun hotspot 1.4.2_02 I have deployed 2 SLSBs on 2 different machines that are clustered (I tried both TCP and UDP) HA-JNDI is enabled but no farming. Both looking for a bean from the client and looking for a bean from another is extremely unreliable. It works in about 10% of the cases. I have been unable to make this bug 100% reproducable ... it appears to be entirely random. If I let one bean call another I get the following in my server logs which shows that the server is actually responding: 2004-08-12 15:31:08,759 DEBUG [org.jboss.ha.framework.server.HAPartitionImpl] dests=[asklepios30:7800 (additional data: 16 bytes), 10.10.12.1:7800 (additional data: 15 bytes)], method_call=HAJNDI.lookupLocally(Another), mode=2, timeout=60000 2004-08-12 15:31:08,759 DEBUG [org.jboss.ha.framework.server.HAPartitionImpl] real_dests=[asklepios30:7800 (additional data: 16 bytes), 10.10.12.1:7800 (additional data: 15 bytes)] 2004-08-12 15:31:09,027 DEBUG [org.jboss.ha.framework.server.HAPartitionImpl] responses: [sender=asklepios30:7800 (additional data: 16 bytes), retval=null, received=true, suspected=false] [sender=10.10.12.1:7800 (additional data: 15 bytes), [EMAIL PROTECTED], received=true, suspected=false] (I have 3 servers: 10.10.11.1 is doing the lookup and asklepios30 and 10.10.12.1 are replying ... don't mind the fairly complicated (for a cluster) network setup, the problem is the same if they are all on the same subnet and UDP multicasting) However by the time this value gets returned by Context.lookup it is null (most of the time). This bug was not in 3.2.2 but because I cannot assign the correct NodeAddress to partition members I have to use 3.2.5 I have also attached my cluster configuration file (@BIND_ADDR@ gets replaced by the correct address in an ANT script) I'm willing to debug the code myself because this problem is really urgent to me, however I hope some one can tell me through which calls the information flows between the HAPartitionImpl and the lookup return. Any pointers are appreciated. ---------------------------------------------------------------------- >Comment By: Peter Backx (pbackx) Date: 2004-08-16 14:23 Message: Logged In: YES user_id=91876 My initial assessment of the problem seems to have been right. org.jboss.ha.jndi.TreeHead was returning incorrect values. This is caused by one of 2 reasons: * Either a server that does not have a bean installed locally should not reply to a lookup request * Or TreeHead should check the entire list of return values instead of just returning the first value. I pressume 2 is the correct answer and with this info fixing the bug was easy: add a pair of accolades in the last for loop of TreeHead.lookup: for (int i = 0; i < rsp.size(); i++) { result = rsp.get(i); if( result != null ) { // <- added log.trace("_lookupLocally returned: " + result.getClass().getName()); if (!(result instanceof Exception)) return result; } // <- added } I have also attached a diff with what I believe is the correct CVS version (it's kinda hard to find the right files, I used tag JBoss_3_2_6_RC1) which might be easier to use. ---------------------------------------------------------------------- Comment By: Peter Backx (pbackx) Date: 2004-08-13 11:44 Message: Logged In: YES user_id=91876 Cluster config attached. One additional note, maybe less interesting, but I have tried different versions of JGroups: the one with JBoss 3.2.2 (doesn't work because some methods have been added or changed), the jgroups file with JBoss 3.2.5 (which I guess is JGroups 2.2.5 but there was some confusion I think), JGroups 2.2.5 and now the latest JGroups 2.2.6. ---------------------------------------------------------------------- Comment By: Peter Backx (pbackx) Date: 2004-08-12 19:45 Message: Logged In: YES user_id=91876 Test setup consists of 3 machines: asklepios30, 10.10.11.1 and 10.10.12.1. The trace is from 10.10.11.1. 2 very very basic stateless session beans are deployed (no farming, so they are only on 1 of the 3 servers): "Hello" on 10.10.11.1 "Another" on 10.10.12.1 Hello is called from a client on another machine. Hello calls Another through a HAJNDI lookup. It's this last lookup that is shown in the trace. The lookup for Hello by the client fails about 80% of the time, that for Another (by the Hello bean) pretty much always. If the client looks up Another it's pretty much the same scenario as Hello, maybe a higher succes rate, but I have only tried that a few times. Bindings are just what happens automatically when deploying the beans: Hello in the local JNDI tree of 10.10.11.1 and Another at 10.10.12.1. I'm not at work any more so if you need configuration files or other logs just ask and I'll send them tomorrow. Pretty much everything is like the default config, except for the cluster configuration, which apparently was not attached properly, I'll try that again in the morning. As it looks right now (very preliminary): either asklepios30 shouldn't reply with a null value or the HAJNDI at 10.10.11.1 is incorrectly interpreting the responses. ---------------------------------------------------------------------- Comment By: Scott M Stark (starksm) Date: 2004-08-12 18:53 Message: Logged In: YES user_id=175228 Describe the test scenario in terms of what is being looked up, by who, and where the binding exists in the cluster. If you have the test deployments attach them to the report. ---------------------------------------------------------------------- Comment By: Peter Backx (pbackx) Date: 2004-08-12 18:31 Message: Logged In: YES user_id=91876 Based on my trace, I started looking into the code and here's where am I right now: TreeHead.lookup first does a local lookup but does not find the name (which is to be expected) after that it does a callMethodOnCluster (on HAPartitionImpl) to do a clusterwide lookup. This call is forwarded to the RpcDispatcher (which is part of jgroups). This all works ok, since the last trace message (responses: ...) is from within RpcDispatcher's callRemoteMethods method. However something goes wrong in returning that response back to the initial lookup call. Two points: * One would expect at least some output from the last trace call in TreeHead's lookup. * Secondly it appears that the first result in the ArrayList is returned which in this case is the wrong entry (it is for server asklepios30 which does _not_ have the bean) The last remark could very well cause the pretty random behavior, but considering that there is no trace message I'm not sure yet if this is the actual bug. I will try to set-up my environment and compile my own jboss (or at least jbossha.jar) tomorrow (all info above is just from static code analysis) and see what happens. ---------------------------------------------------------------------- Comment By: Peter Backx (pbackx) Date: 2004-08-12 17:49 Message: Logged In: YES user_id=91876 Here's a trace of org.jboss.ha: 17:54:30,098 TRACE [TreeHead] lookup, name=Another 17:54:30,098 TRACE [TreeHead] lookupLocally, name=Another 17:54:30,098 TRACE [TreeHead] lookupLocally failed, name=Another javax.naming.NameNotFoundException: Another not bound at org.jnp.server.NamingServer.getBinding(NamingServer.java:495) at org.jnp.server.NamingServer.getBinding(NamingServer.java:503) at org.jnp.server.NamingServer.getObject(NamingServer.java:509) at org.jnp.server.NamingServer.lookup(NamingServer.java:282) at org.jboss.ha.jndi.TreeHead.lookupLocally(TreeHead.java:266) at org.jboss.ha.jndi.TreeHead.lookup(TreeHead.java:192) at org.jboss.ha.jndi.HAJNDI.lookup(HAJNDI.java:131) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.jboss.ha.framework.interfaces.HARMIClient.invoke(HARMIClient.java:187) at $Proxy47.lookup(Unknown Source) at org.jnp.interfaces.NamingContext.lookup(NamingContext.java:530) at org.jnp.interfaces.NamingContext.lookup(NamingContext.java:509) at javax.naming.InitialContext.lookup(Unknown Source) at be.ugent.pbackx.util.HAJNDIUtil.lookup_aroundBody6(HAJNDIUtil.java:29) at be.ugent.pbackx.util.HAJNDIUtil.HAJNDIlookup(HAJNDIUtil.java:29) at be.ugent.pbackx.bean1.HelloWorldBean.hello_aroundBody34(HelloWorldBean.java:40) at be.ugent.pbackx.bean1.HelloWorldBean$AjcClosure35.run(HelloWorldBean.java) at be.ugent.pbackx.aspect.TestAspect.ajc$around$be_ugent_pbackx_aspect_TestAspect$2$dd6145a5proceed(TestAspect.java) at be.ugent.pbackx.aspect.TestAspect.ajc$around$be_ugent_pbackx_aspect_TestAspect$2$dd6145a5(TestAspect.java:58) at be.ugent.pbackx.bean1.HelloWorldBean.hello(HelloWorldBean.java) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.jboss.ejb.StatelessSessionContainer$ContainerInterceptor.invoke(StatelessSessionContainer.java:683) at org.jboss.resource.connectionmanager.CachedConnectionInterceptor.invoke(CachedConnectionInterceptor.java:185) at org.jboss.ejb.plugins.AbstractTxInterceptor.invokeNext(AbstractTxInterceptor.java:84) at org.jboss.ejb.plugins.AbstractTxInterceptorBMT.invokeNext(AbstractTxInterceptorBMT.java:144) at org.jboss.ejb.plugins.TxInterceptorBMT.invoke(TxInterceptorBMT.java:62) at org.jboss.ejb.plugins.StatelessSessionInstanceInterceptor.invoke(StatelessSessionInstanceInterceptor.java:72) at org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:120) at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:191) at org.jboss.ejb.plugins.CleanShutdownInterceptor.invoke(CleanShutdownInterceptor.java:264) at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:122) at org.jboss.ejb.StatelessSessionContainer.internalInvoke(StatelessSessionContainer.java:331) at org.jboss.ejb.Container.invoke(Container.java:723) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.jboss.mx.server.ReflectedDispatcher.dispatch(ReflectedDispatcher.java:60) at org.jboss.mx.server.Invocation.dispatch(Invocation.java:61) at org.jboss.mx.server.Invocation.dispatch(Invocation.java:53) at org.jboss.mx.server.Invocation.invoke(Invocation.java:86) at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:185) at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:473) at org.jboss.invocation.jrmp.server.JRMPInvokerHA.invoke(JRMPInvokerHA.java:163) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source) at sun.rmi.transport.Transport$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Unknown Source) at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source) at java.lang.Thread.run(Unknown Source) 17:54:30,101 TRACE [TreeHead] calling lookupLocally(Another) on cluster 17:54:30,101 TRACE [DefaultPartition] callMethodOnCluster(true), objName=HAJNDI, methodName=lookupLocally, members=[asklepios30:7800 (additional data: 16 bytes), 10.10.12.1:7800 (additional data: 15 bytes)] 17:54:30,101 DEBUG [HAPartitionImpl] dests=[asklepios30:7800 (additional data: 16 bytes), 10.10.12.1:7800 (additional data: 15 bytes)], method_call=HAJNDI.lookupLocally(Another), mode=2, timeout=60000 17:54:30,103 DEBUG [HAPartitionImpl] real_dests=[asklepios30:7800 (additional data: 16 bytes), 10.10.12.1:7800 (additional data: 15 bytes)] 17:54:30,200 DEBUG [HAPartitionImpl] responses: [sender=asklepios30:7800 (additional data: 16 bytes), retval=null, received=true, suspected=false] [sender=10.10.12.1:7800 (additional data: 15 bytes), [EMAIL PROTECTED], received=true, suspected=false] ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=376685&aid=1007959&group_id=22866 ------------------------------------------------------- SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media 100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33 Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift. http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285 _______________________________________________ JBoss-Development mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/jboss-development