Yeah, that would work if it is possible to break the execution path out of the middle of the interceptor stack and resume it in the FutureListener - I am really not sure about that, but since in the current design no locks should be held when an RPC is called, it may be possible.
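Just to make it concrete, this is how I read the proposal below. All the names are made up for the sketch (and I am using CompletableFuture only to keep it short) - this is not our actual API, just the shape of the idea: B registers a callback on the PUT-2 future and returns immediately, so the OOB thread that delivered PUT-1 is never parked.

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class NonBlockingForward {

   // Illustrative stand-ins, not real Infinispan/JGroups types
   interface Transport {
      CompletableFuture<Object> invokeAsync(Address target, Object command);
      void sendResponse(Address originalSender, Object rsp);
   }
   static class Address {}

   // pending futures per target member, so they can be cancelled when
   // that member leaves (a real impl would key by request id instead)
   private final Map<Address, CompletableFuture<Object>> pending =
         new ConcurrentHashMap<>();

   // B handles PUT-1 from A: forward PUT-2 without blocking
   void handlePut1(Transport transport, Address senderA,
                   Address backupOwner, Object put2) {
      CompletableFuture<Object> future = transport.invokeAsync(backupOwner, put2);
      pending.put(backupOwner, future);
      // runs when PUT-2 completes, not on the thread that delivered PUT-1
      future.whenComplete((rsp, error) -> {
         pending.remove(backupOwner);
         if (error == null)
            transport.sendResponse(senderA, rsp); // PUT-1 terminates at A
      });
      // the OOB thread returns here immediately and can serve other requests
   }

   // called from the view-change handler when a member leaves
   void memberLeft(Address who) {
      CompletableFuture<Object> f = pending.remove(who);
      if (f != null)
         f.cancel(true);
   }
}

The open question is exactly the one above: whether we can return from the middle of the interceptor stack and finish the rest of it in the callback.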
Let's see what someone more informed (Dan?) would think about that.

Thanks, Bela

Radim

----- Original Message -----
| From: "Bela Ban" <b...@redhat.com>
| To: infinispan-dev@lists.jboss.org
| Sent: Friday, February 1, 2013 9:39:43 AM
| Subject: Re: [infinispan-dev] Threadpools in a large cluster
|
| It looks like the core problem is an incoming RPC-1 which triggers
| another blocking RPC-2: the thread delivering RPC-1 is blocked waiting
| for the response to RPC-2, and can therefore not be used to serve
| other requests for the duration of RPC-2. If RPC-2 takes a while, e.g.
| waiting to acquire a lock on the remote node, then it is clear that
| the thread pool will quickly reach its max size.
|
| A simple solution would be to prevent invoking blocking RPCs *from
| within* a received RPC. Let's take a look at an example:
| - A invokes a blocking PUT-1 on B
| - B forwards the request as blocking PUT-2 to C and D
| - When PUT-2 returns and B gets the responses from C and D (or the
|   first one to respond, don't know exactly how this is implemented),
|   it sends the response back to A (PUT-1 terminates now at A)
|
| We could change this to the following:
| - A invokes a blocking PUT-1 on B
| - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
|   it does the following:
|   - B invokes PUT-2 and gets a future
|   - B adds itself as a FutureListener, and it also stores the
|     address of the original sender (A)
|   - When the FutureListener is invoked, B sends back the result as a
|     response to A
| - Whenever a member leaves the cluster, the corresponding futures are
|   cancelled and removed from the hashmaps
|
| This could probably be done differently (e.g. by sending asynchronous
| messages and implementing a finite state machine), but the core of
| the solution is the same; namely to avoid having an incoming thread
| block on a sync RPC.
|
| Thoughts ?
|
| On 2/1/13 9:04 AM, Radim Vansa wrote:
| > Hi guys,
| >
| > after dealing with the large cluster for a while, I find the way we
| > use OOB threads in synchronous configuration non-robust.
| > Imagine a situation where a node which is not an owner of the key
| > calls PUT. Then an RPC is sent to the primary owner of that key,
| > which reroutes the request to all other owners and, after these
| > reply, replies back.
| > There are two problems:
| > 1) If we make X simultaneous requests from non-owners to the
| > primary owner, where X is the OOB TP size, all the OOB threads end
| > up waiting for the responses and there is no thread left to process
| > the OOB responses and release a thread.
| > 2) Node A is primary owner of keyA and non-primary owner of keyB,
| > and node B is primary owner of keyB and non-primary owner of keyA.
| > We get many requests for both keyA and keyB from other nodes;
| > therefore, all OOB threads on both nodes call RPCs to the
| > non-primary owner, but there is no one left who could process the
| > requests.
| >
| > While we wait for the requests to time out, the nodes with depleted
| > OOB threadpools start suspecting all other nodes because they can't
| > receive heartbeats etc...
| >
| > You can say "increase your OOB TP size", but that's not always an
| > option; I have currently set it to 1000 threads and it's not
| > enough. In the end, I will always be limited by RAM, and something
| > tells me that even nodes with a few gigs of RAM should be able to
| > form a huge cluster.
| > We use 160 HotRod worker threads in JDG; that means that
| > 160 * clusterSize = 10240 (with 64 nodes in my cluster) parallel
| > requests can be executed, and if 10% of them target the same node,
| > its 1000 OOB threads are exhausted and it gets stuck. It's about
| > scaling and robustness.
| >
| > Not that I'd have any good solution, but I'd really like to start
| > a discussion.
| > Thinking about it a bit, the problem is that a blocking call
| > (calling an RPC on the primary owner from the message handler) can
| > block non-blocking calls (such as an RPC response or a command that
| > never sends any more messages). Therefore, having a flag on the
| > message saying "this won't send another message" could let the
| > message be executed in a different threadpool, which would never be
| > deadlocked. In fact, the pools could share the threads, but the
| > non-blocking one would always have a few threads spare.
| > It's a bad solution, as maintaining which messages could block on
| > the other node is really, really hard (we can be sure only in the
| > case of RPC responses), especially when locks come into play. I
| > will welcome anything better.
|
| --
| Bela Ban, JGroups lead (http://www.jgroups.org)
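P.S. A rough sketch of the flag idea from my original mail above, so it is clear what I mean - the pool sizes and names are purely illustrative:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FlaggedDispatch {

   interface Message {
      boolean isTerminal(); // the flag: "this won't send another message"
      void process();
   }

   // messages that may block waiting on further RPCs
   private final ExecutorService oobPool = Executors.newFixedThreadPool(1000);

   // responses and terminal commands never wait for another message,
   // so this pool can never be exhausted by blocked threads
   private final ExecutorService nonBlockingPool = Executors.newFixedThreadPool(100);

   void dispatch(Message msg) {
      if (msg.isTerminal())
         nonBlockingPool.execute(msg::process);
      else
         oobPool.execute(msg::process);
   }
}

The hard part stays the same, of course: knowing reliably on the sending side which messages are really terminal.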