https://issues.apache.org/bugzilla/show_bug.cgi?id=56828
Bug ID: 56828
Summary: Cluster setup stopped working after 3 months in
production
Product: Tomcat 6
Version: 6.0.39
Hardware: Other
OS: Linux
Status: NEW
Severity: major
Priority: P2
Component: Cluster
Assignee: [email protected]
Reporter: [email protected]
We have J2EE war application deployed in a cluster setup having two nodes.
Tomcat 6.0.39 is installed in the both nodes having identical war deployed in
both. Its deployed in Amazon AWS environment, and the two ec2-nodes are beneath
an ELB , with session stickiness enabled for JSESSIONID. Also the two tomcat
nodes are session replication enabled too.
Following is Cluster config updated server.xml file:
=============================================================================
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
channelSendOptions="6" channelStartOptions="3">
<Manager className="org.apache.catalina.ha.session.DeltaManager"
expireSessionsOnShutdown="false" notifyListenersOnReplication="true" />
<Channel className="org.apache.catalina.tribes.group.GroupChannel">
<Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
autoBind="0" selectorTimeout="5000"
maxThreads="6"
address="x.x.x.x" port="4444" />
<Sender
className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
<Transport
className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
timeout="60000"
keepAliveTime="10"
keepAliveCount="0"
/>
</Sender>
<Interceptor
className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"
staticOnly="true"/>
<Interceptor
className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
<Interceptor
className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
<Member className="org.apache.catalina.tribes.membership.StaticMember"
host="x.x.x.x"
port="4444"
uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4}"/>
</Interceptor>
</Channel>
<Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="" />
<Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve" />
<ClusterListener
className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
<ClusterListener
className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>
==========================================================================
Receiver ip, static member ip and unique id is different in the server.xml of
the other node in the cluster.
this was running fine in production environment for 3 months. Suddenly there
was
an exception logged like this :, and started coming up infinitely.
==================================================
Aug 6, 2014 12:00:39 AM
org.apache.catalina.tribes.group.interceptors.TcpFailureDetector
memberDisappeared
INFO: Received
memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444,
alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={},
domain={}, ]] message. Will verify.
Aug 6, 2014 12:00:39 AM
org.apache.catalina.tribes.group.interceptors.TcpFailureDetector
memberDisappeared
INFO: Verification complete. Member still
alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444,
alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={},
domain={}, ]]
Aug 6, 2014 12:00:39 AM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Operation has timed out(60000
ms.).; Faulty members:tcp://10.160.40.12:4444;
at
org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:97)
at
org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
at
org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
at
org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:76)
at
org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
at
org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
at
org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:88)
at
org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
at
org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
at
org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:216)
at
org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:175)
at
org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:817)
at
org.apache.catalina.ha.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:791)
at
org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:553)
at
org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:537)
at
org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:519)
at
org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:430)
at
org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:363)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
============================================================================
After this, the web application is not accessible, and we have to manually kill
the tomcat process in one node, thereby disabling the cluster.
We are unsure, how all of a sudden this is coming, and disabling application
access altogether. If there are any suggestion on remedy, pls provide the same.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]