The reason I'm asking is this; I didn't have time yesterday to go into detail.
I ran some load tests on my app in a 3-node cluster. One node in particular takes more of the load than the others, and this node also hosts the TcpCacheServer. After a while this node (let's call it node3) ran into difficulties with GC, and the whole node froze for quite a while. The problem is that, after a while, the other 2 nodes froze as well, which is not what I expected. When node3 recovered from its GC problems, the other 2 nodes recovered as well.

There are 2 ways the other 2 nodes are connected to node3. One is that they communicate via UDP, so I don't see this as a problem. The 2nd is via the TcpDelegatingLoader. Unfortunately these tests were running on our production (or soon-to-be production) system, where I don't have write access, and no one from operations was there to take a dump for me, so I can't say for sure what exactly happened.

I am using just the _get method of the delegator:

| protected Map<Object, Object> _get(Fqn name) throws Exception
| {
|    synchronized (out)
|    {
|       out.reset();
|
|       out.writeByte(TcpCacheOperations.GET);
|       out.writeObject(name);
|       out.flush();
|       Object retval = in.readObject();
|       if (retval instanceof Exception)
|       {
|          throw (Exception) retval;
|       }
|       return (Map) retval;
|    }
| }

My theory is this: some time after node3 became totally unresponsive, node1 and node2 could not take any more requests. Only after node3 recovered did the other 2 recover, so it looks like the complete thread pool of node1 and node2 got exhausted. Node1 and node2 were still receiving requests for data that is located on node3. One thread takes the lock on the output stream, asks node3 for data, and then blocks in the read, because the read doesn't time out. Every other thread that needs data from node3 then waits for that same lock, which is not released while node3 is frozen. As long as node1 and node2 keep receiving requests for data on node3, eventually every thread in the pool ends up waiting for a lock that never comes.
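To make it concrete what I mean by a timeout on the read, here is a minimal sketch in plain Java. It is not the TcpDelegatingLoader code: the class ReadTimeoutSketch, its methods, the GET constant and the 200 ms value are all made up for illustration, and the "server" is just a local socket that accepts connections and never replies, standing in for a frozen node3. The only real API it relies on is java.net.Socket.setSoTimeout, which makes a blocking read throw SocketTimeoutException instead of waiting indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

    // Hypothetical opcode, standing in for TcpCacheOperations.GET.
    static final int GET = 1;

    /**
     * Sends a one-byte request and waits at most timeoutMillis for a reply.
     * Returns true if the peer sent a byte back, false if the read timed out
     * or the peer closed the connection without answering.
     */
    static boolean requestWithTimeout(int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            // Bound every blocking read on this socket.
            socket.setSoTimeout(timeoutMillis);

            OutputStream out = socket.getOutputStream();
            InputStream in = socket.getInputStream();

            out.write(GET);
            out.flush();

            try {
                return in.read() != -1;
            } catch (SocketTimeoutException e) {
                // The caller gets control back and can release any lock it holds.
                return false;
            }
        }
    }

    /** Runs the request against a local server that never replies. */
    static boolean demoAgainstSilentServer(int timeoutMillis) throws IOException {
        // The OS completes the TCP handshake via the backlog, but nothing ever
        // calls accept() or writes a reply, so the client read can only time out.
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return requestWithTimeout(silent.getLocalPort(), timeoutMillis);
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        boolean answered = demoAgainstSilentServer(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("answered=" + answered + " after ~" + elapsedMs + " ms");
    }
}
```

Without the setSoTimeout call, the read in requestWithTimeout would block for as long as the server stays silent. If that read sat inside a synchronized block, as in _get, every other caller would queue up behind it.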
Like I said, it's a theory. It's very hard to say what happened for sure without a dump, but my original question stands: why isn't there a timeout on the read?

Thanks guys,
LL