On Fri, Feb 1, 2013 at 12:13 PM, Manik Surtani <msurt...@redhat.com> wrote:
> On 1 Feb 2013, at 09:39, Dan Berindei <dan.berin...@gmail.com> wrote:
>
> > Radim, do these problems happen with the HotRod server, or only with
> > memcached?
> >
> > HotRod requests handled by non-owners should be very rare; the vast
> > majority should be handled by the primary owner directly. So if this
> > happens with HotRod, we should focus on fixing the HotRod routing
> > instead of focusing on how to handle a large number of requests from
> > non-owners.
>
> Well, even Hot Rod only optionally uses smart routing. Some client
> libraries don't have this capability.

True, and I meant to say that with memcached it should be much worse, but
at least in Radim's tests I hope smart routing is enabled.

> > That being said, even if a HotRod put request is handled by the primary
> > owner, it "generates" (numOwners - 1) extra OOB requests. So if you
> > have 160 HotRod worker threads per node, you can expect 4 * 160 OOB
> > messages per node. Multiply that by 2, because responses are OOB as
> > well, and you can get 1280 OOB messages before you even start reusing
> > any HotRod worker thread. Have you tried decreasing the number of
> > HotRod workers?
> >
> > The thing is, our OOB thread pool can't use queueing, because we'd get
> > a queue full of commit commands while all the OOB threads are waiting
> > on keys that those commit commands would unlock. As the OOB thread pool
> > is full, we discard messages, which I suspect slows things down quite a
> > bit (especially if it's a credit request/response message). So it may
> > well be that a lower number of HotRod worker threads would perform
> > better.
> >
> > On the other hand, why isn't increasing the number of OOB threads a
> > solution? With -Xss 512k, you can get 2000 threads with only 1 GB of
> > virtual memory (the actual used memory is probably even less, unless
> > you're using huge pages). AFAIK the Linux kernel doesn't break a sweat
> > with 100000 threads running, so having 2000 threads just hanging
> > around, waiting for a response, shouldn't be such a problem.
> >
> > I did chat with Bela (or was it a break-out session?) during the team
> > meeting in Palma about moving Infinispan's request processing to
> > another thread pool. That would leave the OOB thread pool free to
> > receive response messages, FD heartbeats, credit requests/responses,
> > etc. The downside, I guess, is that each request would have to be
> > passed to another thread, and the context switch may slow things down a
> > bit. But since the new thread pool would be in Infinispan, we could
> > even do tricks like executing a commit/rollback directly on the OOB
> > thread.
>
> Right. I always got the impression we were abusing the OOB pool. But in
> the end, I think it makes sense (in JGroups) to separate a service
> thread pool (for heartbeats, credits, etc.) and an application thread
> pool (what we'd use instead of OOB). This way you could even tune your
> service thread pool to have just, say, 2 threads, and the application
> thread pool 1000 or whatever.

A separate service pool would be good, but I think we could go further and
treat ClusteredGet/Commit/Rollback commands the same way, because they
can't block waiting for other commands to be processed.

> In the end, I just didn't feel that working on this was justified,
> considering the number of critical bugs we had. But maybe now's the time
> to start experimenting…
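Agreed. To make the experiment concrete, here is a rough sketch (in Java)
of the dispatch I have in mind. All the names below (RequestDispatcher,
RemoteCommand, neverBlocks) are invented for illustration; this is not the
real Infinispan or JGroups API, just the shape of the idea:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RequestDispatcher {

    // Application pool for commands that may block on locks or on further
    // RPCs. No queue: a queued commit could otherwise wait behind the
    // very commands it would unlock. (With the default policy, anything
    // beyond 1000 busy threads is rejected; a real implementation would
    // need a saner fallback.)
    private final ExecutorService appPool = new ThreadPoolExecutor(
            8, 1000, 60, TimeUnit.SECONDS, new SynchronousQueue<>());

    // Called from the JGroups (OOB) receiver thread.
    public void onCommand(RemoteCommand cmd) {
        if (neverBlocks(cmd)) {
            // Responses, ClusteredGet, Commit and Rollback never wait for
            // other commands, so they can run on the scarce OOB thread.
            cmd.perform();
        } else {
            // Everything else is offloaded, keeping the OOB threads free
            // for responses, FD heartbeats and credit messages.
            appPool.execute(cmd::perform);
        }
    }

    private boolean neverBlocks(RemoteCommand cmd) {
        return cmd.isResponse() || cmd.isClusteredGet()
                || cmd.isCommitOrRollback();
    }

    // Hypothetical command abstraction, just for this sketch.
    interface RemoteCommand {
        void perform();
        boolean isResponse();
        boolean isClusteredGet();
        boolean isCommitOrRollback();
    }
}

The service pool vs. application pool split Manik describes would then
live in JGroups itself; the sketch above is the Infinispan-side variant of
the same idea.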
> > On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa <rva...@redhat.com> wrote:
> >
> > > Hi guys,
> > >
> > > After dealing with the large cluster for a while, I find the way we
> > > use OOB threads in a synchronous configuration non-robust.
> > > Imagine a situation where a node which is not an owner of the key
> > > calls PUT. An RPC is then made to the primary owner of that key,
> > > which reroutes the request to all other owners and, after these
> > > reply, replies back.
> > > There are two problems:
> > > 1) If we do X simultaneous requests from non-owners to the primary
> > > owner, where X is the OOB TP size, all the OOB threads end up waiting
> > > for responses and there is no thread left to process an OOB response
> > > and release a thread.
> > > 2) Node A is the primary owner of keyA and a non-primary owner of
> > > keyB, and node B is the primary owner of keyB and a non-primary owner
> > > of keyA. We get many requests for both keyA and keyB from other
> > > nodes; therefore, all OOB threads on both nodes call RPCs to the
> > > non-primary owner, but there is no one left to process the requests.
> > >
> > > While we wait for the requests to time out, the nodes with depleted
> > > OOB thread pools start suspecting all other nodes because they can't
> > > receive heartbeats etc.
> > >
> > > You can say "increase your OOB TP size", but that's not always an
> > > option; I have currently set it to 1000 threads and it's not enough.
> > > In the end, I will always be limited by RAM, and something tells me
> > > that even nodes with a few gigs of RAM should be able to form a huge
> > > cluster. We use 160 HotRod worker threads in JDG, which means that
> > > 160 * clusterSize = 10240 (64 nodes in my cluster) parallel requests
> > > can be executed, and if 10% of them target the same node with 1000
> > > OOB threads, it gets stuck. It's about scaling and robustness.
> > >
> > > Not that I have any good solution, but I'd really like to start a
> > > discussion.
> > > Thinking about it a bit, the problem is that a blocking call (calling
> > > an RPC on the primary owner from the message handler) can block
> > > non-blocking calls (such as an RPC response, or a command that never
> > > sends any more messages). Therefore, a flag on the message saying
> > > "this won't send another message" could let the message be executed
> > > in a different thread pool, which would never deadlock. In fact, the
> > > pools could share their threads, but the non-blocking one would
> > > always have a few threads spare.
> > > It's a bad solution, as keeping track of which messages could block
> > > on the other node is really, really hard (we can be sure only in the
> > > case of RPC responses), especially once locks come into play. I will
> > > welcome anything better.
> > >
> > > Radim
> > >
> > > -----------------------------------------------------------
> > > Radim Vansa
> > > Quality Assurance Engineer
> > > JBoss Datagrid
> > > tel. +420532294559 ext. 62559
> > >
> > > Red Hat Czech, s.r.o.
> > > Brno, Purkyňova 99/71, PSČ 612 45
> > > Czech Republic

> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Platform Architect, JBoss Data Grid
> http://red.ht/data-grid
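PS: for anyone who wants to see Radim's problem 1) outside of a full
cluster, it reproduces in miniature with any fixed-size pool whose threads
block on replies that can only be delivered by the same pool. A
self-contained toy (no JGroups involved; all names are made up):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OobStarvationDemo {

    static final int POOL_SIZE = 4; // stands in for the OOB TP size

    public static void main(String[] args) {
        ExecutorService oobPool = Executors.newFixedThreadPool(POOL_SIZE);
        CountDownLatch responses = new CountDownLatch(POOL_SIZE);

        // POOL_SIZE simultaneous "PUTs from non-owners": each occupies a
        // pool thread and blocks until its "response" is processed.
        for (int i = 0; i < POOL_SIZE; i++) {
            oobPool.execute(() -> {
                try {
                    responses.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The "responses" arrive, but every pool thread is already
        // blocked above, so nobody ever runs them: the pool is wedged.
        for (int i = 0; i < POOL_SIZE; i++) {
            oobPool.execute(responses::countDown);
        }

        oobPool.shutdown();
    }
}

Run it and the JVM never exits: all four "request" threads are parked on
the latch, and the four "responses" sit in the executor's queue forever.
The OOB pool doesn't even have a queue, so there the responses would
simply be discarded, which is exactly Radim's point.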
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev