The reason I'm asking is this; I didn't have time yesterday to go into detail.

I ran some load tests on my app in a 3-node cluster. One node in particular 
takes more of the load than the others, and this node also hosts the 
TcpCacheServer. After a while this node, let's call it node3, ran into GC 
difficulties and the whole node froze for quite a while. The problem is that 
after a while the other 2 nodes froze as well, which is not what I expected. 
When node3 recovered from its GC problems, the other 2 nodes recovered as well.

Now, there are 2 ways the other 2 nodes are connected to node3. One way is via 
UDP, so I don't see that as a problem. The 2nd way is via the 
TcpDelegatingLoader. Unfortunately these tests were running on our production 
(or soon-to-be production) system, I don't have write access, and no-one from 
operations was there to take a thread dump for me, so I can't say for sure 
what exactly happened.

I am using just the _get method of the delegator:


  |  protected Map<Object, Object> _get(Fqn name) throws Exception
  |  {
  |     synchronized (out)
  |     {
  |        out.reset();
  | 
  |        out.writeByte(TcpCacheOperations.GET);
  |        out.writeObject(name);
  |        out.flush();
  |        // this read blocks indefinitely: no timeout is set on the socket
  |        Object retval = in.readObject();
  |        if (retval instanceof Exception)
  |        {
  |           throw (Exception) retval;
  |        }
  |        return (Map) retval;
  |     }
  |  }
  | 

However, my theory is this: some time after node3 became totally unresponsive, 
node1 and node2 couldn't take any more requests, and only after node3 
recovered did the other 2 recover. So it looks like the complete thread pools 
of node1 and node2 got exhausted. Both nodes were still receiving requests for 
data located on node3. The one thread that asked node3 for data is blocked 
forever, because the read doesn't time out, and it holds the lock on the 
output stream. Every other thread then waits for that lock, which will never 
be released, so as node1 and node2 keep receiving requests for data on node3, 
eventually every thread ends up waiting for a lock that will never come.
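The pile-up I'm describing can be sketched in plain Java. This is a toy illustration, not JBoss Cache code: one thread holds the monitor on a shared `out` object while "reading" forever (simulated with a sleep), and every later request thread blocks on that same monitor, just like the pool threads in my theory:

```java
public class LockPileUpDemo {
    public static void main(String[] args) throws Exception {
        final Object out = new Object(); // stands in for the shared output stream

        // One thread grabs the lock and then "blocks on the read" forever.
        Thread reader = new Thread(() -> {
            synchronized (out) {
                try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) {}
            }
        });
        reader.setDaemon(true);
        reader.start();
        Thread.sleep(200); // let the reader acquire the lock first

        // Every further "request" thread piles up waiting for the same lock.
        Thread[] pool = new Thread[5];
        for (int i = 0; i < pool.length; i++) {
            pool[i] = new Thread(() -> { synchronized (out) { } });
            pool[i].setDaemon(true);
            pool[i].start();
        }
        Thread.sleep(200); // give them time to reach the synchronized block

        int blocked = 0;
        for (Thread t : pool) {
            if (t.getState() == Thread.State.BLOCKED) blocked++;
        }
        System.out.println(blocked + " of 5 pool threads BLOCKED");
    }
}
```

With a bounded pool, once every worker is parked like this the node stops answering anything, which matches what I saw on node1 and node2.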

Like I said, it's just a theory, and very hard to say for sure what happened 
without a thread dump, but my original question stands: why isn't there a 
timeout on the read?
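For what it's worth, a plain `java.net.Socket` does support a read timeout via `setSoTimeout`. I can't say whether the delegating loader exposes its underlying socket that way, so the class below is only a sketch of the behaviour I'd expect: the server side stands in for a node frozen in a GC pause, and the client fails fast instead of blocking forever.

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    public static void main(String[] args) throws Exception {
        // A server that accepts the connection but never writes anything,
        // standing in for a node that is frozen in a long GC pause.
        try (ServerSocket server = new ServerSocket(0)) {
            Thread acceptor = new Thread(() -> {
                try {
                    Socket s = server.accept();
                    Thread.sleep(60_000); // hold the connection open, send nothing
                    s.close();
                } catch (Exception ignored) {}
            });
            acceptor.setDaemon(true);
            acceptor.start();

            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                // Without this call, read() below would block indefinitely.
                client.setSoTimeout(500); // milliseconds; value chosen for the demo
                InputStream in = client.getInputStream();
                try {
                    in.read(); // blocks for at most ~500 ms
                    System.out.println("got data");
                } catch (SocketTimeoutException e) {
                    // Here the caller could release the lock on the stream and
                    // let the other pool threads make progress again.
                    System.out.println("read timed out");
                }
            }
        }
    }
}
```

If the loader's read were guarded like this, a frozen node3 would cost each request one timeout instead of wedging the whole pool.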

Thanks guys,
LL


View the original post : 
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4196813#4196813
