Bugs item #1007959, was opened at 2004-08-12 06:39
Message generated for change (Comment added) made by starksm
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=1007959&group_id=22866

Category: Clustering
Group: v3.2
>Status: Closed
>Resolution: Fixed
Priority: 7
Submitted By: Peter Backx (pbackx)
Assigned to: Scott M Stark (starksm)
Summary: HA-JNDI extremely unreliable

Initial Comment:
os: linux 2.6
jdk: Sun hotspot 1.4.2_02

I have deployed 2 SLSBs on 2 different machines that
are clustered (I tried both TCP and UDP) HA-JNDI is
enabled but no farming.

Both looking for a bean from the client and looking for
a bean from another is extremely unreliable. It works
in about 10% of the cases. I have been unable to make
this bug 100% reproducable ... it appears to be
entirely random.

If I let one bean call another I get the following in
my server logs which shows that the server is actually
responding:

2004-08-12 15:31:08,759 DEBUG
[org.jboss.ha.framework.server.HAPartitionImpl]
dests=[asklepios30:7800 (additional data: 16 bytes),
10.10.12.1:7800 (additional data: 15 bytes)],
method_call=HAJNDI.lookupLocally(Another), mode=2,
timeout=60000
2004-08-12 15:31:08,759 DEBUG
[org.jboss.ha.framework.server.HAPartitionImpl]
real_dests=[asklepios30:7800 (additional data: 16
bytes), 10.10.12.1:7800 (additional data: 15 bytes)]
2004-08-12 15:31:09,027 DEBUG
[org.jboss.ha.framework.server.HAPartitionImpl]
responses: [sender=asklepios30:7800 (additional data:
16 bytes), retval=null, received=true, suspected=false]
[sender=10.10.12.1:7800 (additional data: 15 bytes),
[EMAIL PROTECTED],
received=true, suspected=false]

(I have 3 servers: 10.10.11.1 is doing the lookup and
asklepios30 and 10.10.12.1 are replying ... don't mind
the fairly complicated (for a cluster) network setup,
the problem is the same if they are all on the same
subnet and UDP multicasting)

However by the time this value gets returned by
Context.lookup it is null (most of the time).

This bug was not in 3.2.2 but because I cannot assign
the correct NodeAddress to partition members I have to
use 3.2.5

I have also attached my cluster configuration file
(@BIND_ADDR@ gets replaced by the correct address in an
ANT script)

I'm willing to debug the code myself because this
problem is really urgent to me, however I hope some one
can tell me through which calls the information flows
between the HAPartitionImpl and the lookup return. Any
pointers are appreciated.

----------------------------------------------------------------------

>Comment By: Scott M Stark (starksm)
Date: 2004-08-22 16:24

Message:
Logged In: YES 
user_id=175228

The null values are due to exceptions from the nodes without
the binding and should be filtered out. This has been fixed
for 3.2.6RC2.

----------------------------------------------------------------------

Comment By: Peter Backx (pbackx)
Date: 2004-08-16 05:23

Message:
Logged In: YES 
user_id=91876

My initial assessment of the problem seems to have been
right. org.jboss.ha.jndi.TreeHead was returning incorrect
values. This is caused by one of 2 reasons:
* Either a server that does not have a bean installed
locally should not reply to a lookup request
* Or TreeHead should check the entire list of return values
instead of just returning the first value.

I pressume 2 is the correct answer and with this info fixing
the bug was easy: add a pair of accolades in the last for
loop of TreeHead.lookup:

        for (int i = 0; i < rsp.size(); i++)
        {
           result = rsp.get(i);
           if( result != null ) 
           { // <- added
              log.trace("_lookupLocally returned: " +
result.getClass().getName());
              if (!(result instanceof Exception))
                 return result;
           } // <- added
          }

I have also attached a diff with what I believe is the
correct CVS version (it's kinda hard to find the right
files, I used tag JBoss_3_2_6_RC1) which might be easier to use.


----------------------------------------------------------------------

Comment By: Peter Backx (pbackx)
Date: 2004-08-13 02:44

Message:
Logged In: YES 
user_id=91876

Cluster config attached.
One additional note, maybe less interesting, but I have
tried different versions of JGroups: the one with JBoss
3.2.2 (doesn't work because some methods have been added or
changed), the jgroups file with JBoss 3.2.5 (which I guess
is JGroups 2.2.5 but there was some confusion I think),
JGroups 2.2.5 and now the latest JGroups 2.2.6.

----------------------------------------------------------------------

Comment By: Peter Backx (pbackx)
Date: 2004-08-12 10:45

Message:
Logged In: YES 
user_id=91876

Test setup consists of 3 machines: asklepios30, 10.10.11.1
and 10.10.12.1. The trace is from 10.10.11.1.
2 very very basic stateless session beans are deployed (no
farming, so they are only on 1 of the 3 servers):
"Hello" on 10.10.11.1
"Another" on 10.10.12.1

Hello is called from a client on another machine. Hello
calls Another through a HAJNDI lookup. It's this last lookup
that is shown in the trace.

The lookup for Hello by the client fails about 80% of the
time, that for Another (by the Hello bean) pretty much
always. If the client looks up Another it's pretty much the
same scenario as Hello, maybe a higher succes rate, but I
have only tried that a few times.

Bindings are just what happens automatically when deploying
the beans: Hello in the local JNDI tree of 10.10.11.1 and
Another at 10.10.12.1.

I'm not at work any more so if you need configuration files
or other logs just ask and I'll send them tomorrow. Pretty
much everything is like the default config, except for the
cluster configuration, which apparently was not attached
properly, I'll try that again in the morning.

As it looks right now (very preliminary): either asklepios30
shouldn't reply with a null value or the HAJNDI at
10.10.11.1 is incorrectly interpreting the responses.

----------------------------------------------------------------------

Comment By: Scott M Stark (starksm)
Date: 2004-08-12 09:53

Message:
Logged In: YES 
user_id=175228

Describe the test scenario in terms of what is being looked
up, by who, and where the binding exists in the cluster. If
you have the test deployments attach them to the report.

----------------------------------------------------------------------

Comment By: Peter Backx (pbackx)
Date: 2004-08-12 09:31

Message:
Logged In: YES 
user_id=91876

Based on my trace, I started looking into the code and
here's where am I right now:

TreeHead.lookup first does a local lookup but does not find
the name (which is to be expected) after that it does a
callMethodOnCluster (on HAPartitionImpl) to do a clusterwide
lookup. This call is forwarded to the RpcDispatcher (which
is part of jgroups).

This all works ok, since the last trace message (responses:
...) is from within RpcDispatcher's callRemoteMethods method.

However something goes wrong in returning that response back
to the initial lookup call. Two points:
* One would expect at least some output from the last trace
call in TreeHead's lookup.
* Secondly it appears that the first result in the ArrayList
is returned which in this case is the wrong entry (it is for
server asklepios30 which does _not_ have the bean)

The last remark could very well cause the pretty random
behavior, but considering that there is no trace message I'm
not sure yet if this is the actual bug.

I will try to set-up my environment and compile my own jboss
(or at least jbossha.jar) tomorrow (all info above is just
from static code analysis) and see what happens.

----------------------------------------------------------------------

Comment By: Peter Backx (pbackx)
Date: 2004-08-12 08:49

Message:
Logged In: YES 
user_id=91876

Here's a trace of org.jboss.ha:

17:54:30,098 TRACE [TreeHead] lookup, name=Another
17:54:30,098 TRACE [TreeHead] lookupLocally, name=Another
17:54:30,098 TRACE [TreeHead] lookupLocally failed, name=Another
javax.naming.NameNotFoundException: Another not bound
        at
org.jnp.server.NamingServer.getBinding(NamingServer.java:495)
        at
org.jnp.server.NamingServer.getBinding(NamingServer.java:503)
        at
org.jnp.server.NamingServer.getObject(NamingServer.java:509)
        at
org.jnp.server.NamingServer.lookup(NamingServer.java:282)
        at
org.jboss.ha.jndi.TreeHead.lookupLocally(TreeHead.java:266)
        at org.jboss.ha.jndi.TreeHead.lookup(TreeHead.java:192)
        at org.jboss.ha.jndi.HAJNDI.lookup(HAJNDI.java:131)
        at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at
org.jboss.ha.framework.interfaces.HARMIClient.invoke(HARMIClient.java:187)
        at $Proxy47.lookup(Unknown Source)
        at
org.jnp.interfaces.NamingContext.lookup(NamingContext.java:530)
        at
org.jnp.interfaces.NamingContext.lookup(NamingContext.java:509)
        at javax.naming.InitialContext.lookup(Unknown Source)
        at
be.ugent.pbackx.util.HAJNDIUtil.lookup_aroundBody6(HAJNDIUtil.java:29)
        at
be.ugent.pbackx.util.HAJNDIUtil.HAJNDIlookup(HAJNDIUtil.java:29)
        at
be.ugent.pbackx.bean1.HelloWorldBean.hello_aroundBody34(HelloWorldBean.java:40)
        at
be.ugent.pbackx.bean1.HelloWorldBean$AjcClosure35.run(HelloWorldBean.java)
        at
be.ugent.pbackx.aspect.TestAspect.ajc$around$be_ugent_pbackx_aspect_TestAspect$2$dd6145a5proceed(TestAspect.java)
        at
be.ugent.pbackx.aspect.TestAspect.ajc$around$be_ugent_pbackx_aspect_TestAspect$2$dd6145a5(TestAspect.java:58)
        at
be.ugent.pbackx.bean1.HelloWorldBean.hello(HelloWorldBean.java)
        at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at
org.jboss.ejb.StatelessSessionContainer$ContainerInterceptor.invoke(StatelessSessionContainer.java:683)
        at
org.jboss.resource.connectionmanager.CachedConnectionInterceptor.invoke(CachedConnectionInterceptor.java:185)
        at
org.jboss.ejb.plugins.AbstractTxInterceptor.invokeNext(AbstractTxInterceptor.java:84)
        at
org.jboss.ejb.plugins.AbstractTxInterceptorBMT.invokeNext(AbstractTxInterceptorBMT.java:144)
        at
org.jboss.ejb.plugins.TxInterceptorBMT.invoke(TxInterceptorBMT.java:62)
        at
org.jboss.ejb.plugins.StatelessSessionInstanceInterceptor.invoke(StatelessSessionInstanceInterceptor.java:72)
        at
org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:120)
        at
org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:191)
        at
org.jboss.ejb.plugins.CleanShutdownInterceptor.invoke(CleanShutdownInterceptor.java:264)
        at
org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:122)
        at
org.jboss.ejb.StatelessSessionContainer.internalInvoke(StatelessSessionContainer.java:331)
        at org.jboss.ejb.Container.invoke(Container.java:723)
        at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at
org.jboss.mx.server.ReflectedDispatcher.dispatch(ReflectedDispatcher.java:60)
        at
org.jboss.mx.server.Invocation.dispatch(Invocation.java:61)
        at
org.jboss.mx.server.Invocation.dispatch(Invocation.java:53)
        at
org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
        at
org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:185)
        at
org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:473)
        at
org.jboss.invocation.jrmp.server.JRMPInvokerHA.invoke(JRMPInvokerHA.java:163)
        at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at sun.rmi.server.UnicastServerRef.dispatch(Unknown
Source)
        at sun.rmi.transport.Transport$1.run(Unknown Source)
        at
java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Unknown
Source)
        at
sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown
Source)
        at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown
Source)
        at java.lang.Thread.run(Unknown Source)
17:54:30,101 TRACE [TreeHead] calling lookupLocally(Another)
on cluster
17:54:30,101 TRACE [DefaultPartition]
callMethodOnCluster(true), objName=HAJNDI,
methodName=lookupLocally, members=[asklepios30:7800
(additional data: 16 bytes), 10.10.12.1:7800 (additional
data: 15 bytes)]
17:54:30,101 DEBUG [HAPartitionImpl] dests=[asklepios30:7800
(additional data: 16 bytes), 10.10.12.1:7800 (additional
data: 15 bytes)], method_call=HAJNDI.lookupLocally(Another),
mode=2, timeout=60000
17:54:30,103 DEBUG [HAPartitionImpl]
real_dests=[asklepios30:7800 (additional data: 16 bytes),
10.10.12.1:7800 (additional data: 15 bytes)]
17:54:30,200 DEBUG [HAPartitionImpl] responses:
[sender=asklepios30:7800 (additional data: 16 bytes),
retval=null, received=true, suspected=false]
[sender=10.10.12.1:7800 (additional data: 15 bytes),
[EMAIL PROTECTED],
received=true, suspected=false]


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=1007959&group_id=22866


-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
JBoss-Development mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jboss-development

Reply via email to