On 1 Feb 2013, at 08:04, Radim Vansa wrote:

> Hi guys,
>
> after dealing with the large cluster for a while, I find the way we use
> OOB threads in synchronous configuration non-robust.
> Imagine a situation where a node which is not an owner of the key calls PUT.
> Then an RPC is made to the primary owner of that key, which reroutes the
> request to all other owners and, after these reply, it replies back.
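For readers following along, the delegation described above can be simulated in a few lines. This is a purely illustrative sketch (the node names, `putFromNonOwner`, and the in-memory stores are made up for the example, not Infinispan classes): a PUT from a non-owner is forwarded to the primary owner, which applies the write on every owner before acknowledging the caller.

```java
import java.util.*;

// Illustrative simulation of the non-transactional PUT delegation:
// non-owner -> primary owner -> all owners -> ack back to the caller.
public class DelegationSketch {
    // A is the primary owner, B a backup owner; C (the caller) owns nothing.
    static final List<String> OWNERS = Arrays.asList("A", "B");
    static final Map<String, Map<String, String>> STORES = new HashMap<>();

    static String putFromNonOwner(String origin, String key, String value) {
        // 1. The non-owner issues a synchronous RPC to the primary owner;
        //    the caller's OOB thread blocks here until the primary replies.
        return handleOnPrimary(key, value);
    }

    static String handleOnPrimary(String key, String value) {
        // 2. The primary applies the write locally and forwards it to the
        //    backup owners (modelled here as direct map writes).
        for (String node : OWNERS)
            STORES.computeIfAbsent(node, n -> new HashMap<>()).put(key, value);
        // 3. Only after all owners have applied the write does the primary
        //    acknowledge the original caller, releasing its blocked thread.
        return "ACK";
    }

    public static void main(String[] args) {
        System.out.println(putFromNonOwner("C", "k", "v")); // prints "ACK"
    }
}
```

The point of the sketch is step 1: while the primary is busy, the caller's thread is parked, which is what makes the thread-pool exhaustion below possible.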
This delegation RPC pattern happens for non-transactional caches only. Do you
have the same problem with transactional caches as well?

> There are two problems:
> 1) If we make X simultaneous requests from non-owners to the primary owner,
> where X is the OOB TP size, all the OOB threads are waiting for the
> responses and there is no thread left to process the OOB response and
> release a thread.
> 2) Node A is the primary owner of keyA and a non-primary owner of keyB, and
> B is primary for keyB and non-primary for keyA. We get many requests for
> both keyA and keyB from other nodes; therefore, all OOB threads on both
> nodes call RPCs to the other node, but there is no one left to process the
> request.
>
> While we wait for the requests to time out, the nodes with depleted OOB
> thread pools start suspecting all other nodes because they can't receive
> heartbeats etc...
>
> You can say "increase your OOB TP size", but that's not always an option; I
> have currently set it to 1000 threads and it's not enough. In the end, I
> will always be limited by RAM, and something tells me that even nodes with a
> few gigs of RAM should be able to form a huge cluster. We use 160 HotRod
> worker threads in JDG, which means that 160 * clusterSize = 10240 (64 nodes
> in my cluster) parallel requests can be executed, and if 10% target the same
> node with 1000 OOB threads, it gets stuck. It's about scaling and
> robustness.
>
> Not that I have any good solution, but I'd really like to start a
> discussion.
> Thinking about it a bit, the problem is that a blocking call (calling an RPC
> on the primary owner from the message handler) can block non-blocking calls
> (such as an RPC response, or a command that never sends any more messages).
> Therefore, having a flag on the message saying "this won't send another
> message" could let the message be executed in a different thread pool, which
> can never deadlock. In fact, the pools could share their threads, but the
> non-blocking one would always have a few threads spare.
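To make the proposal concrete: a minimal sketch of the "two pool" idea, assuming a hypothetical per-message flag (the class name, the flag, and the pool sizes are invented for illustration; this is not Infinispan's or JGroups' actual API). Handlers that may issue further RPCs go to the main pool; messages flagged as "won't send another message" (e.g. RPC responses) go to a small reserved pool, so responses can still be processed even when the main pool is fully blocked.

```java
import java.util.concurrent.*;

// Sketch: route messages to one of two pools based on whether handling
// them may block on further RPCs. The reserved pool only ever runs
// non-blocking handlers, so it cannot be exhausted by blocked callers.
public class TwoPoolDispatcher {
    final ExecutorService blockingPool;    // handlers that may issue further RPCs
    final ExecutorService nonBlockingPool; // reserved for guaranteed-non-blocking handlers

    TwoPoolDispatcher(int blockingThreads, int reservedThreads) {
        blockingPool = Executors.newFixedThreadPool(blockingThreads);
        nonBlockingPool = Executors.newFixedThreadPool(reservedThreads);
    }

    // maySendMoreMessages is the hypothetical flag from the proposal above.
    void dispatch(Runnable handler, boolean maySendMoreMessages) {
        (maySendMoreMessages ? blockingPool : nonBlockingPool).submit(handler);
    }

    public static void main(String[] args) throws Exception {
        TwoPoolDispatcher d = new TwoPoolDispatcher(2, 1);
        CountDownLatch response = new CountDownLatch(1);

        // Saturate the blocking pool: both threads now wait for a response,
        // exactly the depletion scenario described in the mail.
        for (int i = 0; i < 2; i++)
            d.dispatch(() -> {
                try { response.await(); } catch (InterruptedException ignored) { }
            }, true);

        // The response is flagged non-blocking, so it runs on the reserved
        // pool and releases the stuck handlers instead of deadlocking.
        d.dispatch(response::countDown, false);

        d.blockingPool.shutdown();
        d.nonBlockingPool.shutdown();
        boolean drained = d.blockingPool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(drained ? "released" : "deadlocked"); // prints "released"
    }
}
```

With a single shared pool, the same scenario deadlocks until the request times out; the separation only helps, of course, to the extent that the flag is accurate, which is exactly the difficulty raised below.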
> It's a bad solution, as maintaining knowledge of which message could block
> on the other node is really, really hard (we can be sure only in the case
> of RPC responses), especially when locks come into play. I will welcome
> anything better.
>
> Radim
>
> -----------------------------------------------------------
> Radim Vansa
> Quality Assurance Engineer
> JBoss Datagrid
> tel. +420532294559 ext. 62559
>
> Red Hat Czech, s.r.o.
> Brno, Purkyňova 99/71, PSČ 612 45
> Czech Republic
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,

--
Mircea Markus
Infinispan lead (www.infinispan.org)