If you send me the details, I'll take a look. I'm pretty busy with message batching, so I can't promise next week, but soon...
On 2/1/13 11:08 AM, Pedro Ruivo wrote:
> Hi,
>
> I had a similar problem when I tried GMU [1] in a "large" cluster (40 VMs),
> because the remote gets and the commit messages (I'm talking about ISPN
> commands) must wait for some conditions before being processed.
>
> I solved this problem by adding a feature to JGroups [2] that allows the
> request to be moved to another thread, releasing the OOB thread. The
> other thread then sends the reply to the JGroups request. Of course, I'm
> only moving commands that I know can block.
>
> I can go into more detail if you want =)
>
> Cheers,
> Pedro
>
> [1] http://www.gsd.inesc-id.pt/~romanop/files/papers/icdcs12.pdf
> [2] I would like to talk with Bela about this, because it makes my life
> easier to support total order in ISPN. I'll try to send an email this
> weekend =)
>
> On 01-02-2013 08:04, Radim Vansa wrote:
>> Hi guys,
>>
>> After dealing with a large cluster for a while, I find the way we use
>> OOB threads in a synchronous configuration non-robust.
>> Imagine a situation where a node that is not an owner of a key calls PUT.
>> An RPC is then sent to the primary owner of that key, which reroutes the
>> request to all other owners and, after they reply, replies back.
>> There are two problems:
>> 1) If we issue X simultaneous requests from non-owners to the primary
>> owner, where X is the OOB thread pool size, all the OOB threads end up
>> waiting for responses and there is no thread left to process an OOB
>> response and release a waiting thread.
>> 2) Node A is the primary owner of keyA and a non-primary owner of keyB,
>> while node B is primary for keyB and non-primary for keyA. We get many
>> requests for both keyA and keyB from other nodes; therefore, all OOB
>> threads on both nodes call RPCs to the other owner, but there is no one
>> left who could process the request.
>>
>> While we wait for the requests to time out, the nodes with depleted OOB
>> thread pools start suspecting all other nodes because they can't receive
>> heartbeats etc...
>>
>> You can say "increase your OOB thread pool size", but that's not always
>> an option; I currently have it set to 1000 threads and it's not enough.
>> In the end I will always be limited by RAM, and something tells me that
>> even nodes with a few gigs of RAM should be able to form a huge cluster.
>> We use 160 HotRod worker threads in JDG, which means that
>> 160 * clusterSize = 10240 (64 nodes in my cluster) parallel requests can
>> be executed, and if 10% of them target the same node with 1000 OOB
>> threads, it gets stuck. It's about scaling and robustness.
>>
>> Not that I have any good solution, but I'd really like to start a
>> discussion.
>> Thinking about it a bit, the problem is that a blocking call (calling an
>> RPC on the primary owner from a message handler) can block non-blocking
>> calls (such as an RPC response, or a command that never sends any more
>> messages). Therefore, a flag on the message saying "this won't send
>> another message" could let the message be executed in a different thread
>> pool, which could never deadlock.
>> In fact, the pools could share threads, but the non-blocking pool would
>> always have a few threads spare.
>> It's a bad solution, as maintaining knowledge of which messages could
>> block on the other node is really, really hard (we can be sure only in
>> the case of RPC responses), especially when locks come into play. I will
>> welcome anything better.
>>
>> Radim
>>
>>
>> -----------------------------------------------------------
>> Radim Vansa
>> Quality Assurance Engineer
>> JBoss Datagrid
>> tel. +420532294559 ext. 62559
>>
>> Red Hat Czech, s.r.o.
>> Brno, Purkyňova 99/71, PSČ 612 45
>> Czech Republic

--
Bela Ban, JGroups lead (http://www.jgroups.org)

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
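[Editor's note] Pedro's approach above — releasing the OOB thread by handing a potentially blocking request to another thread, which then sends the JGroups reply — can be sketched roughly as follows. This is a minimal, self-contained illustration of the pattern, not Pedro's actual JGroups patch; the `Response` callback interface and `OffloadingHandler` name are hypothetical stand-ins for the real asynchronous-reply mechanism.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a message handler that frees the OOB thread for
// commands known to block, replying later from an internal pool.
public class OffloadingHandler {

    // Stand-in for the ability to send a reply after handle() has returned.
    public interface Response {
        void send(Object reply);
    }

    private final ExecutorService internalPool = Executors.newFixedThreadPool(4);

    // Invoked on the OOB thread. For blocking commands this returns
    // immediately, so the OOB thread stays free to process other messages
    // (in particular the RPC responses that unblock waiting callers).
    public void handle(String command, boolean mayBlock, Response response) {
        if (!mayBlock) {
            response.send(process(command)); // cheap command: handle inline
            return;
        }
        // Potentially blocking command: move it off the OOB thread;
        // the reply is sent from the internal pool when it completes.
        internalPool.submit(() -> response.send(process(command)));
    }

    private Object process(String command) {
        return "reply-to-" + command; // placeholder for real command execution
    }

    public void shutdown() {
        internalPool.shutdown();
    }
}
```

The key property is that `handle()` never parks the OOB thread, so a full set of in-flight blocking commands can no longer starve the thread that would deliver their responses.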
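[Editor's note] Radim's "this won't send another message" flag amounts to routing messages to two pools: messages that may issue further RPCs go to the ordinary pool, while messages guaranteed not to send anything else (e.g. RPC responses) go to a reserved pool that can never be exhausted by blocked callers. A rough sketch, with all names hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the two-pool idea: the non-blocking pool only runs
// messages flagged as sending no further messages, so it cannot deadlock
// even when every thread in the blocking pool is parked waiting on a reply.
public class FlaggedDispatcher {

    private final ExecutorService blockingPool    = Executors.newFixedThreadPool(8);
    private final ExecutorService nonBlockingPool = Executors.newFixedThreadPool(2);

    // sendsNoMoreMessages corresponds to the proposed per-message flag.
    public CompletableFuture<Void> dispatch(Runnable message, boolean sendsNoMoreMessages) {
        ExecutorService pool = sendsNoMoreMessages ? nonBlockingPool : blockingPool;
        return CompletableFuture.runAsync(message, pool);
    }

    public void shutdown() {
        blockingPool.shutdown();
        nonBlockingPool.shutdown();
    }
}
```

As the thread notes, the hard part is not the dispatch but correctly classifying messages: only RPC responses are provably non-blocking, so a mislabelled message would reintroduce the deadlock into the reserved pool.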