Very interesting. However, I presume this would be something for Infinispan 6.0? Any thoughts on backward compat?
On 7 Feb 2013, at 04:53, Bela Ban <b...@redhat.com> wrote:

> Hi Pedro,
>
> this is almost exactly what I wanted to implement!
>
> Question:
> - In RequestCorrelator.handleRequest():
>
>     protected void handleRequest(Message req, Header hdr) {
>         Object retval;
>         boolean threwException = false;
>         MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
>         try {
>             retval = request_handler.handle(messageRequest);
>         }
>         catch(Throwable t) {
>             retval = t;
>             threwException = true;
>         }
>         // <-- should be moved up, or called only if threwException == true
>         messageRequest.sendReply(retval, threwException);
>     }
>
> you create a MessageRequestImpl and pass it to the RequestHandler. The
> request handler then dispatches the request (possibly) to a thread pool
> and calls MessageRequestImpl.sendReply() when done.
>
> However, you also call MessageRequest.sendReply() before returning from
> handleRequest(). I think this is an error: MessageRequest.sendReply()
> should be moved up inside the catch clause, or be called only if
> threwException is true, so that we send a reply on behalf of the
> RequestHandler if and only if it threw an exception (e.g. before it
> dispatches the request to a thread pool). Otherwise, we'd send a reply
> *twice*!
>
> A few changes I have in mind (I need to think about it more):
>
> - I want to leave the existing RequestHandler interface in place, so
>   current implementations continue to work
> - There will be a new AsyncRequestHandler interface (possibly extending
>   RequestHandler, so an implementation can decide to implement both).
>   The RequestCorrelator needs to have either request_handler or
>   async_request_handler set. If the former is set, the logic is
>   unchanged.
> If the latter is set, I'll invoke the async dispatching code.
>
> - AsyncRequestHandler will look similar to the following:
>   void handle(Message request, Handback hb, boolean requires_response) throws Throwable;
> - Handback is an interface, and its impl contains header information
>   (e.g. request ID)
> - Handback has a sendReply(Object reply, boolean is_exception) method
>   which sends a response (or exception) back to the caller
> - When requires_response is false, the AsyncRequestHandler doesn't need
>   to invoke sendReply()
>
> - Message batching
> - The above interfaces need to take message batching into account, e.g.
>   the ability to handle multiple requests concurrently (if they don't
>   need to be executed sequentially)
>
> Thoughts?
>
> On 2/6/13 8:29 PM, Pedro Ruivo wrote:
>> Hi all,
>>
>> Recently I came up with a solution that can help with the thread pool
>> problem, motivated by the following:
>>
>> In one of the first implementations of the Total Order based commit
>> protocol (TO), I had the requirement to move the PrepareCommand to
>> another thread pool. In short, the TO protocol delivers the
>> PrepareCommand in a deterministic order on all nodes, via a single
>> deliver thread. To ensure consistency, if it delivers two conflicting
>> transactions, the second transaction must wait until the first
>> transaction finishes. However, blocking the single deliver thread is
>> not a good solution, because while the thread is blocked no more
>> transactions can be validated, even if they don't conflict.
>>
>> So, after creating a dependency graph (i.e. the second transaction
>> knows that it must wait for the first transaction to finish), I move
>> the PrepareCommand to another thread pool. Initially, I implemented a
>> new command, called PrepareResponseCommand, that sends back the reply
>> of the PrepareCommand.
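Bela's proposed AsyncRequestHandler/Handback API could be sketched roughly as follows. Only the two signatures quoted above come from the proposal; the placeholder Message class, the executor, and the demo wiring are assumptions made purely for illustration:

```java
import java.util.concurrent.*;

// Placeholder for org.jgroups.Message; only here to make the sketch self-contained.
class Message {
    final Object payload;
    Message(Object payload) { this.payload = payload; }
}

// Carries the request ID/header info in a real impl; here just a callback.
interface Handback {
    // Sends a response (or an exception) back to the original caller.
    void sendReply(Object reply, boolean is_exception);
}

interface AsyncRequestHandler {
    // May hand the request off to a thread pool and return immediately;
    // the pool thread later calls hb.sendReply(). When requires_response
    // is false, sendReply() need not be invoked at all.
    void handle(Message request, Handback hb, boolean requires_response) throws Throwable;
}

public class AsyncDispatchDemo {
    static Object roundTrip(String payload) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletableFuture<Object> replyFuture = new CompletableFuture<>();

        // A real Handback would send the reply over the wire using the
        // stored request ID; here we just complete a future.
        Handback hb = (reply, isException) -> replyFuture.complete(reply);

        // Handler that frees the delivery thread by moving work to the pool.
        AsyncRequestHandler handler = (request, handback, requiresResponse) ->
            pool.execute(() -> handback.sendReply("echo: " + request.payload, false));

        try {
            handler.handle(new Message(payload), hb, true);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        Object reply = replyFuture.get(5, TimeUnit.SECONDS);
        pool.shutdown();
        return reply;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello")); // prints "echo: hello"
    }
}
```

The point of the design is that handleRequest() never blocks: the delivery (OOB) thread returns as soon as the work is queued, and the reply is correlated later via the Handback.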
>> This solution has one disadvantage: I had to implement an ack
>> collector in ISPN, while JGroups already offers me that with a
>> synchronous communication.
>>
>> Recently (2 or 3 months ago) I implemented a simple modification in
>> JGroups. In a more generic approach, it allows other threads to reply
>> to an RPC request (such as the PrepareCommand). In the previous
>> scenario, I replaced the PrepareResponseCommand and the ack collector
>> implementation with a synchronous RPC invocation. I've used this
>> solution for other issues in the Cloud-TM ISPN fork.
>>
>> This solution is quite simple to implement and may help you to move
>> the commands to ISPN internal thread pools. The modifications I've
>> made are the following:
>>
>> 1) I added a new interface (see [1]) that is passed to the application
>> instead of the Message object (see [4]). This interface contains the
>> Message and has a method that allows the application to send the reply
>> to that particular request.
>> 2) I added a new object in [4] with the meaning: "this return value is
>> not the reply to the RPC request". This is the value I return when I
>> want to release the thread, because ISPN must return some object from
>> the handle() method. Of course, ISPN will invoke sendReply() somewhere
>> else later; otherwise I would get a TimeoutException on the sender
>> side.
>> 3) I also changed the RequestCorrelator implementation to support the
>> previous modifications (see [2] and [3]).
>>
>> In the Cloud-TM ISPN fork I added a reference in the
>> BaseCacheRpcCommand to [1] and added the method sendReply() [5]. In
>> addition, I have the following use cases working perfectly with this:
>>
>> 1) Total Order
>>
>> The scenario described at the beginning. The ParallelTotalOrderManager
>> returns the DO_NOT_REPLY object when it receives a remote
>> PrepareCommand (see [6], line 77).
>> When the PrepareCommand is finally processed by the rest of the
>> interceptor chain, it invokes PrepareCommand.sendReply() (see [6],
>> line 230).
>>
>> 2) GMU remote get
>>
>> GMU ensures the SERIALIZABLE isolation level, and remote gets must
>> ensure that the node processing the request has a minimum version
>> available to guarantee data consistency. The problem in our initial
>> implementation on a large cluster is that the number of remote gets
>> is very high, and all the OOB threads end up blocked on this
>> condition.
>>
>> I did the same thing with the ClusteredRemoteGet, as you can see in
>> [7], lines 93 and 105.
>>
>> 3) GMU CommitCommand
>>
>> In GMU, the CommitCommand cannot be processed in an arbitrary order.
>> If T1 is serialized before T2, the commit command of T1 must be
>> processed before the commit command of T2, even if the transactions
>> do not conflict. This creates the same problem as above, and the same
>> solution was adopted.
>>
>> I know that you have discussed some solutions, and I would like to
>> know your opinion about what I've described.
>>
>> If you have questions, please let me know.
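The handler-side pattern Pedro describes — returning a marker object so the correlator releases the delivery thread, with another thread calling sendReply() later — might look roughly like this. The names MessageRequest, sendReply() and DO_NOT_REPLY follow [1] and [4], but every type below is a self-contained placeholder, not the actual JGroups code:

```java
import java.util.concurrent.*;

// Sketch of Pedro's modification: handle() may return a marker meaning
// "this is not the reply; another thread will call sendReply() later".
public class DoNotReplyDemo {
    // Marker object, per change (2) in the mail above.
    static final Object DO_NOT_REPLY = new Object();

    // Placeholder for the interface in [1]: wraps the Message and lets
    // any thread reply to this particular request.
    interface MessageRequest {
        void sendReply(Object reply, boolean is_exception);
    }

    // Application-side handler: releases the OOB/deliver thread at once;
    // the pool thread sends the real reply when the blocking work is done.
    static Object handle(MessageRequest req, ExecutorService pool) {
        pool.execute(() -> req.sendReply("committed", false));
        return DO_NOT_REPLY; // the correlator must NOT forward this value
    }

    static Object demo() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CompletableFuture<Object> wire = new CompletableFuture<>();
        Object ret = handle((reply, isEx) -> wire.complete(reply), pool);
        // Correlator-side check: only real return values go on the wire.
        Object onWire = (ret == DO_NOT_REPLY) ? wire.get(5, TimeUnit.SECONDS) : ret;
        pool.shutdown();
        return onWire;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "committed"
    }
}
```

As the mail notes, the contract is that whoever returns DO_NOT_REPLY takes on the obligation to call sendReply() eventually; otherwise the sender times out.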
>>
>> Cheers,
>> Pedro
>>
>> [1] https://github.com/pruivo/JGroups/blob/t_cloudtm/src/org/jgroups/blocks/MessageRequest.java
>> [2] https://github.com/pruivo/JGroups/blob/t_cloudtm/src/org/jgroups/blocks/RequestCorrelator.java#L463
>> [3] https://github.com/pruivo/JGroups/blob/t_cloudtm/src/org/jgroups/blocks/RequestCorrelator.java#L495
>> [4] https://github.com/pruivo/JGroups/blob/t_cloudtm/src/org/jgroups/blocks/RequestHandler.java
>> [5] https://github.com/pruivo/infinispan/blob/cloudtm_v1/core/src/main/java/org/infinispan/commands/remote/BaseRpcCommand.java#L75
>> [6] https://github.com/pruivo/infinispan/blob/cloudtm_v1/core/src/main/java/org/infinispan/transaction/totalorder/ParallelTotalOrderManager.java
>> [7] https://github.com/pruivo/infinispan/blob/cloudtm_v1/core/src/main/java/org/infinispan/commands/remote/GMUClusteredGetCommand.java
>>
>> On 2/3/13 11:35 AM, Bela Ban wrote:
>>> If you send me the details, I'll take a look. I'm pretty busy with
>>> message batching, so I can't promise next week, but soon...
>>>
>>> On 2/1/13 11:08 AM, Pedro Ruivo wrote:
>>>> Hi,
>>>>
>>>> I had a similar problem when I tried GMU [1] on a "large" cluster
>>>> (40 VMs), because the remote gets and the commit messages (I'm
>>>> talking about ISPN commands) must wait for some conditions before
>>>> being processed.
>>>>
>>>> I solved this problem by adding a feature in JGroups [2] that
>>>> allows the request to be moved to another thread, releasing the OOB
>>>> thread. The other thread then sends the reply to the JGroups
>>>> request. Of course, I only move commands that I know can block.
>>>>
>>>> I can go into more detail if you want =)
>>>>
>>>> Cheers,
>>>> Pedro
>>>>
>>>> [1] http://www.gsd.inesc-id.pt/~romanop/files/papers/icdcs12.pdf
>>>> [2] I would like to talk with Bela about this, because it makes my
>>>> life easier to support total order in ISPN. I'll try to send an
>>>> email this weekend =)
>>>>
>>>> On 01-02-2013 08:04, Radim Vansa wrote:
>>>>> Hi guys,
>>>>>
>>>>> after dealing with a large cluster for a while, I find the way we
>>>>> use OOB threads in a synchronous configuration non-robust.
>>>>> Imagine a situation where a node which is not an owner of the key
>>>>> calls PUT. An RPC is then made to the primary owner of that key,
>>>>> which reroutes the request to all other owners and, after these
>>>>> reply, replies back.
>>>>> There are two problems:
>>>>> 1) If we make X simultaneous requests from non-owners to the
>>>>> primary owner, where X is the OOB TP size, all the OOB threads are
>>>>> waiting for the responses and there is no thread left to process
>>>>> the OOB response and release a thread.
>>>>> 2) Node A is primary owner of keyA and non-primary owner of keyB,
>>>>> and B is primary owner of keyB and non-primary owner of keyA. We
>>>>> get many requests for both keyA and keyB from other nodes;
>>>>> therefore, all OOB threads on both nodes call RPCs to the
>>>>> non-primary owner, but there is no one left who could process the
>>>>> requests.
>>>>>
>>>>> While we wait for the requests to time out, the nodes with
>>>>> depleted OOB thread pools start suspecting all other nodes because
>>>>> they can't receive heartbeats etc...
>>>>>
>>>>> You can say "increase your OOB TP size", but that's not always an
>>>>> option; I currently have it set to 1000 threads and it's not
>>>>> enough. In the end, I will always be limited by RAM, and something
>>>>> tells me that even nodes with a few gigs of RAM should be able to
>>>>> form a huge cluster. We use 160 HotRod worker threads in JDG; that
>>>>> means 160 * clusterSize = 10240 (64 nodes in my cluster) parallel
>>>>> requests can be executed, and if 10% target the same node with
>>>>> 1000 OOB threads, it gets stuck. It's about scaling and
>>>>> robustness.
>>>>>
>>>>> Not that I have any good solution, but I'd really like to start a
>>>>> discussion.
>>>>> Thinking about it a bit, the problem is that a blocking call
>>>>> (calling an RPC on the primary owner from the message handler) can
>>>>> block non-blocking calls (such as an RPC response or a command
>>>>> that never sends any more messages). Therefore, having a flag on
>>>>> the message saying "this won't send another message" could let the
>>>>> message be executed in a different thread pool, which would never
>>>>> deadlock. In fact, the pools could share the threads, but the
>>>>> non-blocking one would always have a few threads spare.
>>>>> It's a bad solution, as knowing which messages could block on the
>>>>> other node is really, really hard (we can be sure only in the case
>>>>> of RPC responses), especially when locks come into play. I will
>>>>> welcome anything better.
>>>>>
>>>>> Radim
>>>>>
>>>>> -----------------------------------------------------------
>>>>> Radim Vansa
>>>>> Quality Assurance Engineer
>>>>> JBoss Datagrid
>>>>> tel. +420532294559 ext. 62559
>>>>>
>>>>> Red Hat Czech, s.r.o.
>>>>> Brno, Purkyňova 99/71, PSČ 612 45
>>>>> Czech Republic
>>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
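Radim's flag idea from earlier in the thread — routing messages that are guaranteed not to send further messages into their own pool, so responses can always be processed even when the blocking pool is exhausted — could be sketched as follows. This is entirely illustrative: no such flag or dispatcher exists in JGroups, and the pool sizes and names are made up:

```java
import java.util.concurrent.*;

// Hypothetical two-pool dispatcher for Radim's "this won't send another
// message" flag: non-blocking work (e.g. RPC responses) gets dedicated
// threads, so exhausting the blocking pool cannot deadlock responses.
public class TwoPoolDispatcher {
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(8);
    private final ExecutorService nonBlockingPool = Executors.newFixedThreadPool(2);

    void dispatch(Runnable handler, boolean wontSendMoreMessages) {
        // Flagged messages go to the reserved pool; everything else may
        // block on further RPCs, so it runs on the ordinary pool.
        (wontSendMoreMessages ? nonBlockingPool : blockingPool).execute(handler);
    }

    void shutdown() {
        blockingPool.shutdown();
        nonBlockingPool.shutdown();
    }

    static String demo() throws Exception {
        TwoPoolDispatcher d = new TwoPoolDispatcher();
        CompletableFuture<String> done = new CompletableFuture<>();
        // An RPC response never sends more messages, so it is flagged.
        d.dispatch(() -> done.complete("response handled"), true);
        String result = done.get(5, TimeUnit.SECONDS);
        d.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "response handled"
    }
}
```

As Radim notes, the hard part is not the dispatching but knowing at the sender which messages are truly non-blocking; only RPC responses are safe to flag unconditionally.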