Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Mircea Markus

On 7 Feb 2013, at 11:36, Manik Surtani wrote:

 On 7 Feb 2013, at 11:29, Bela Ban b...@redhat.com wrote:
 
 I meant to say that this is up to you guys to decide in which Infinispan 
 release this will be used, but it will be available in JGroups 3.3.
 
 What's the strategy/schedule for 6 and 5.3 anyway ?
 
 5.3 will be relatively quick, just an incremental release mainly targeting 
 reviewing and putting in a few contribs on the queue. E.g., we're hoping for 
 5.3 to be feature-complete in no more than 2 months.
 
 6 is where we get to break API - specifically with the new packaging, etc.
 
 But this seems like a pretty important feature/fix so it may make sense to 
 get this into 5.3.

Pedro - please correct me if I'm wrong.
This feature (or the alternative Pedro has already implemented in his branch) 
is used by the TO code that we'll integrate in 5.3. So in order to have TO in 
ISPN we need these JGroups improvements in. Pedro - are there any other 
modifications in *your branch* of JGroups that require integration into JGroups 
upstream from a TO perspective? 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Pedro Ruivo



On 2/8/13 12:26 PM, Mircea Markus wrote:


On 7 Feb 2013, at 11:36, Manik Surtani wrote:

 On 7 Feb 2013, at 11:29, Bela Ban b...@redhat.com wrote:



I meant to say that this is up to you guys to decide in which Infinispan
release this will be used, but it will be available in JGroups 3.3.

What's the strategy/schedule for 6 and 5.3 anyway ?


5.3 will be relatively quick, just an incremental release mainly 
targeting reviewing and putting in a few contribs on the queue. E.g., 
we're hoping for 5.3 to be feature-complete in no more than 2 months.


6 is where we get to break API - specifically with the new packaging, 
etc.


But this seems like a pretty important feature/fix so it may make 
sense to get this into 5.3.


Pedro - please correct me if I'm wrong.
This feature (or the alternative Pedro has already implemented in his 
branch) is used by the TO code that we'll integrate in 5.3. So in 
order to have TO in ISPN we need these JGroups improvements in.
Yes, I'll need it. Otherwise, I'd have to create another RpcCommand to 
send back the response generated by the other thread.
Pedro - are there any other modifications in *your branch* of JGroups 
that require integration into JGroups upstream from a TO perspective?

No, just this...

Cheers
Pedro


Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)






Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Pedro Ruivo
Hi Bela,

Alpha1 does not exist on GitHub. Instead you have 3.3.0.Alpha2 =)

So, can I start to use it?

Thanks
Pedro

On 2/8/13 2:31 PM, Bela Ban wrote:
 On 2/8/13 3:23 PM, Mircea Markus wrote:
 Hi Bela,

 Do you think we can have this particular improvement in JGroups by 18
 Feb? That week Pedro, Dan and I are going to start integrating TO
 protocols in ISPN and this is a requirement for it.
 It is done and part of the alpha1 release...



 On 8 Feb 2013, at 12:41, Pedro Ruivo wrote:
 On 2/8/13 12:26 PM, Mircea Markus wrote:
 On 7 Feb 2013, at 11:36, Manik Surtani wrote:

 On 7 Feb 2013, at 11:29, Bela Ban b...@redhat.com wrote:

 I meant to say that this is up to you guys to decide in which Infinispan
 release this will be used, but it will be available in JGroups 3.3.

 What's the strategy/schedule for 6 and 5.3 anyway ?
 5.3 will be relatively quick, just an incremental release mainly
 targeting reviewing and putting in a few contribs on the queue.
 E.g., we're hoping for 5.3 to be feature-complete in no more than 2
 months.

 6 is where we get to break API - specifically with the new
 packaging, etc.

 But this seems like a pretty important feature/fix so it may make
 sense to get this into 5.3.
 Pedro - please correct me if I'm wrong.
 This feature (or the alternative Pedro has already implemented in
 his branch) is used by the TO code that we'll integrate in 5.3. So
 in order to have TO in ISPN we need these JGroups improvements in.
 Yes, I'll need it. Otherwise, I'd have to create another RpcCommand to
 send back the response generated by the other thread.
 Cheers,
 -- 
 Mircea Markus
 Infinispan lead (www.infinispan.org)








Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Mircea Markus

On 8 Feb 2013, at 14:31, Bela Ban wrote:

 
 On 2/8/13 3:23 PM, Mircea Markus wrote:
 Hi Bela,
 
 Do you think we can have this particular improvement in JGroups by 18 
 Feb? That week Pedro, Dan and I are going to start integrating TO 
 protocols in ISPN and this is a requirement for it.
 
 It is done and part of the alpha1 release...


 thanks Bela!

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)





Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Mircea Markus
I've created a JIRA in the scope of Infinispan 5.3 to track the thread pool 
improvements discussed in this email thread:
ISPN-2808 - Make Infinispan use its own thread pool for sending OOB messages in 
order to avoid thread deadlocks

On 8 Feb 2013, at 14:44, Mircea Markus wrote:

 
 On 8 Feb 2013, at 14:31, Bela Ban wrote:
 
 
 On 2/8/13 3:23 PM, Mircea Markus wrote:
 Hi Bela,
 
 Do you think we can have this particular improvement in JGroups by 18 
 Feb? That week Pedro, Dan and I are going to start integrating TO 
 protocols in ISPN and this is a requirement for it.
 
 It is done and part of the alpha1 release...
 
 
  thanks Bela!
 
 Cheers,
 -- 
 Mircea Markus
 Infinispan lead (www.infinispan.org)
 
 
 
 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)





Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Bela Ban
yes, I mis-tagged it... :-(
On 2/8/13 3:33 PM, Pedro Ruivo wrote:
 Hi Bela,

 Alpha1 does not exist in GitHub. Instead you have 3.3.0.Alpha2 =)

 So, can I start to use it?

 Thanks
 Pedro

 On 2/8/13 2:31 PM, Bela Ban wrote:
 On 2/8/13 3:23 PM, Mircea Markus wrote:
 Hi Bela,

 Do you think we can have this particular improvement in JGroups by 18
 Feb? That week Pedro, Dan and I are going to start integrating TO
 protocols in ISPN and this is a requirement for it.
 It is done and part of the alpha1 release...



 On 8 Feb 2013, at 12:41, Pedro Ruivo wrote:
 On 2/8/13 12:26 PM, Mircea Markus wrote:
 On 7 Feb 2013, at 11:36, Manik Surtani wrote:

 On 7 Feb 2013, at 11:29, Bela Ban b...@redhat.com wrote:

 I meant to say that this is up to you guys to decide in which Infinispan
 release this will be used, but it will be available in JGroups 3.3.

 What's the strategy/schedule for 6 and 5.3 anyway ?
 5.3 will be relatively quick, just an incremental release mainly
 targeting reviewing and putting in a few contribs on the queue.
 E.g., we're hoping for 5.3 to be feature-complete in no more than 2
 months.

 6 is where we get to break API - specifically with the new
 packaging, etc.

 But this seems like a pretty important feature/fix so it may make
 sense to get this into 5.3.
 Pedro - please correct me if I'm wrong.
 This feature (or the alternative Pedro has already implemented in
 his branch) is used by the TO code that we'll integrate in 5.3. So
 in order to have TO in ISPN we need these JGroups improvements in.
 Yes, I'll need it. Otherwise, I'd have to create another RpcCommand to
 send back the response generated by the other thread.
 Cheers,
 -- 
 Mircea Markus
 Infinispan lead (www.infinispan.org)







-- 
Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Bela Ban
excellent, and we'll update it with our findings from the London meeting
On 2/8/13 5:19 PM, Mircea Markus wrote:
 I've created a JIRA in the scope of Infinispan 5.3 to track the thread 
 pool improvements discussed in this email thread:
 ISPN-2808 - Make Infinispan use its own thread pool for sending OOB 
 messages in order to avoid thread deadlocks

 On 8 Feb 2013, at 14:44, Mircea Markus wrote:


 On 8 Feb 2013, at 14:31, Bela Ban wrote:


 On 2/8/13 3:23 PM, Mircea Markus wrote:
 Hi Bela,

 Do you think we can have this particular improvement in JGroups by 18
 Feb? That week Pedro, Dan and I are going to start integrating TO
 protocols in ISPN and this is a requirement for it.

 It is done and part of the alpha1 release...

  thanks Bela!

 Cheers,
 -- 
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 Cheers,
 -- 
 Mircea Markus
 Infinispan lead (www.infinispan.org)







-- 
Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 6:53 AM, Bela Ban b...@redhat.com wrote:

 Hi Pedro,

 this is almost exactly what I wanted to implement !

 Question:
 - In RequestCorrelator.handleRequest():

 protected void handleRequest(Message req, Header hdr) {
     Object retval;
     boolean threwException = false;
     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
     try {
         retval = request_handler.handle(messageRequest);
     } catch(Throwable t) {
         retval = t;
         threwException = true;
     }
     // -- should be moved up, or called only if threwException == true
     messageRequest.sendReply(retval, threwException);
 }


 , you create a MessageRequestImpl and pass it to the RequestHandler. The
 request handler then dispatches the request (possibly) to a thread pool
 and calls MessageRequestImpl.sendReply() when done.

 However, you also call MessageRequest.sendReply() before returning from
 handleRequest(). I think this is an error, and
 MessageRequest.sendReply() should be moved up inside the catch clause,
 or be called only if threwException is true, so that we send a reply on
 behalf of the RequestHandler if and only if it threw an exception (e.g.
 before it dispatches the request to a thread pool). Otherwise, we'd send
 a reply *twice* !

 A few changes I have in mind (need to think about it more):

 - I want to leave the existing RequestHandler interface in place, so
 current implementations continue to work
 - There will be a new AsyncRequestHandler interface (possibly extending
 RequestHandler, so an implementation can decide to implement both). The
 RequestCorrelator needs to have either request_handler or
 async_request_handler set. If the former is set, the logic is unchanged.
 If the latter is set I'll invoke the async dispatching code

 - AsyncRequestHandler will look similar to the following:
 void handle(Message request, Handback hb, boolean requires_response)
 throws Throwable;
 - Handback is an interface, and its impl contains header information
 (e.g. request ID)
 - Handback has a sendReply(Object reply, boolean is_exception) method
 which sends a response (or exception) back to the caller
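The interfaces Bela sketches above could look roughly like this (a runnable sketch; the Handback and AsyncRequestHandler names come from the email, while the Message and RequestHandler stand-ins and all bodies are illustrative assumptions, not JGroups code):

```java
import java.util.concurrent.*;

// Minimal stand-ins so the sketch is self-contained; the real types live
// in JGroups (org.jgroups.Message, org.jgroups.blocks.RequestHandler).
class Message { final Object payload; Message(Object p) { payload = p; } }

interface RequestHandler {
    Object handle(Message msg) throws Exception;
}

// Handback carries the routing info (e.g. request ID) needed to reply later.
interface Handback {
    void sendReply(Object reply, boolean isException);
}

// The proposed async interface: handle() may return immediately and let
// another thread call sendReply() when the result is ready.
interface AsyncRequestHandler extends RequestHandler {
    void handle(Message request, Handback hb, boolean requiresResponse) throws Throwable;
}

public class AsyncDispatchSketch {
    public static void main(String[] args) throws Throwable {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletableFuture<Object> replySlot = new CompletableFuture<>();

        // A Handback impl that completes a future instead of sending a
        // real JGroups response over the wire.
        Handback hb = (reply, isException) -> replySlot.complete(reply);

        AsyncRequestHandler handler = new AsyncRequestHandler() {
            public Object handle(Message msg) { return msg.payload; } // old sync path
            public void handle(Message req, Handback h, boolean requiresResponse) {
                // Dispatch to a pool; the worker thread replies when done.
                pool.submit(() -> h.sendReply("handled:" + req.payload, false));
            }
        };

        handler.handle(new Message("tx-1"), hb, true);
        System.out.println(replySlot.get(5, TimeUnit.SECONDS)); // handled:tx-1
        pool.shutdown();
    }
}
```

A real implementation would store the request ID inside the Handback and push the reply through the transport rather than completing a future.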


+1 for a new interface. TBH I hadn't read the RequestCorrelator code, so I
had assumed it was already asynchronous, and only RpcDispatcher was
synchronous.

I'm not so sure about the Handback name, how about calling it Response
instead?



 - When requires_response is false, the AsyncRequestHandler doesn't need
 to invoke sendReply()


I think this should be the other way around: when requires_response is
true, the AsyncRequestHandler *can* invoke sendReply(), but is not required
to (the call will just time out on the caller node); when requires_response
is false, invoking sendReply() should throw an exception.
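Dan's suggested contract could be sketched like this (hypothetical; the Response name follows his suggestion, and the exception choice and field names are assumptions):

```java
// Sketch of the proposed reply contract: sendReply() is optional when a
// response is expected, and illegal when it is not.
public class ResponseContractSketch {
    static class Response {
        private final boolean responseExpected;
        private boolean sent;

        Response(boolean responseExpected) { this.responseExpected = responseExpected; }

        void sendReply(Object reply, boolean isException) {
            if (!responseExpected)
                throw new IllegalStateException("caller did not request a response");
            if (sent)
                throw new IllegalStateException("reply already sent");
            sent = true;
            // ... hand the reply to the transport here ...
        }
    }

    public static void main(String[] args) {
        Response r = new Response(false);   // requires_response == false
        try {
            r.sendReply("ignored", false);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```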


- Message batching
 - The above interfaces need to take message batching into account, e.g.
 the ability to handle multiple requests concurrently (if they don't need
 to be executed sequentially)


You mean handle() is still going to be called once for each request, but a
second handle() call won't necessarily wait for the first message's
sendReply() call?

Is this going to apply only to OOB messages, or to regular messages as
well? I think I'd prefer it if it only applied to OOB messages, otherwise
we'd have to implement our own ordering for regular/async commands.




 Thoughts ?




 On 2/6/13 8:29 PM, Pedro Ruivo wrote:
  Hi all,
 
  Recently I came up with a solution that can help with the thread pool
  problem motivated by the following:
 
  In one of the first implementations of the Total Order based commit
  protocol (TO), I had the requirement to move the PrepareCommand to
  another thread pool. In summary, the TO protocol delivers the
  PrepareCommand in a deterministic order on all the nodes, by a single
  deliver thread. To ensure consistency, if it delivers two conflicting
  transactions, the second transaction must wait until the first
  transaction finishes. However, blocking the single deliver thread is
  not a good solution, because no more transactions can be validated,
  even if they don't conflict, while the thread is blocked.
 
  So, after creating a dependency graph (i.e. the second transaction
  knows that it must wait for the first transaction to finish) I moved
  the PrepareCommand to another thread pool. Initially, I implemented a
  new command, called PrepareResponseCommand, that sends back the reply
  of the PrepareCommand. This solution has one disadvantage: I had to
  implement an ack collector in ISPN, while JGroups already offers me
  that with synchronous communication.
 
  Recently (2 or 3 months ago) I implemented a simple modification in
  JGroups. In a more generic approach, it allows other threads to reply
  to an RPC request (such as the PrepareCommand). In the previous
  scenario, I replaced the PrepareResponseCommand and the ack collector
  implementation with a synchronous RPC invocation. I've used this
  solution for other issues in the Cloud-TM ISPN fork.
 
  This solution is quite simple to 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Manik Surtani

On 7 Feb 2013, at 10:30, Bela Ban b...@redhat.com wrote:

 No, this *could* be in Infinispan 5.3 (it will be in JGroups 3.3).
 
 A MessageDispatcher (RpcDispatcher) instance picks which dispatching 
 mechanism it wants to use. I have RequestHandler (the default) and a 
 sub-interface AsyncRequestHandler. MessageDispatcher (and subclass 
 RpcDispatcher) will implement both (they already do implement 
 handle(Message)).
 
 So now a user simply sets an attribute in MessageDispatcher to select 
 async dispatching (sync is the default).
 
 What needs to be done from the Infinispan side is to override 
 handle(Message,Response), and implement handling of requests in a thread 
 pool. The current behavior (inherited from MessageDispatcher) will be to 
 call handle(Message) which CommandAwareDispatcher already implements.
 
 The Infinispan side can be done in 10 minutes. However, the real work 
 will be the dispatching of incoming requests to threads from the 
 Infinispan thread pool, and the impl of the thread pool, which doesn't 
 exist yet. I guess preserving ordering of requests will be the important 
 part.

This isn't particularly hard either ... 
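The "real work" Bela mentions — dispatching into an Infinispan-side pool while preserving per-sender ordering — might be sketched like this (an illustrative ordering layer over a shared pool; class and method names are assumptions, not Infinispan code):

```java
import java.util.Map;
import java.util.concurrent.*;

// Requests from one sender run in arrival order; requests from different
// senders run concurrently on the shared pool.
public class OrderedDispatcher {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    // Chain each task behind the previous task from the same sender.
    public void dispatch(String sender, Runnable task) {
        tails.compute(sender, (s, tail) ->
            (tail == null ? CompletableFuture.completedFuture((Void) null) : tail)
                .thenRunAsync(task, pool));
    }

    public static void main(String[] args) throws Exception {
        OrderedDispatcher d = new OrderedDispatcher();
        StringBuffer log = new StringBuffer();         // thread-safe appends
        for (int i = 0; i < 3; i++) {
            int n = i;
            d.dispatch("nodeA", () -> log.append(n));
        }
        d.tails.get("nodeA").get(5, TimeUnit.SECONDS); // wait for the chain
        System.out.println(log);                       // 012
        d.pool.shutdown();
    }
}
```

OOB requests could bypass the chaining entirely and go straight to the pool, which matches the OOB-only preference discussed later in the thread.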

 
 If you have your own thread pool, sync RPCs can be sent without OOB, but 
 the handle() method in CommandAwareDispatcher can decide, based on the 
 mode (e.g. sync) whether to queue the request behind other requests, or 
 whether to invoke it directly.
 
 I wanted to implement this quickly in JGroups so the hooks are in place 
 for Infinispan to use them later, once a pool has been implemented.
 
 
 On 2/7/13 10:56 AM, Manik Surtani wrote:
 Very interesting.  However I presume this would be something for Infinispan 
 6.0?  Any thoughts on backward compat?
 
 On 7 Feb 2013, at 04:53, Bela Ban b...@redhat.com wrote:
 
 Hi Pedro,
 
 this is almost exactly what I wanted to implement !
 
 Question:
 - In RequestCorrelator.handleRequest():
 
 protected void handleRequest(Message req, Header hdr) {
     Object retval;
     boolean threwException = false;
     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
     try {
         retval = request_handler.handle(messageRequest);
     } catch(Throwable t) {
         retval = t;
         threwException = true;
     }
     // -- should be moved up, or called only if threwException == true
     messageRequest.sendReply(retval, threwException);
 }
 
 
 , you create a MessageRequestImpl and pass it to the RequestHandler. The
 request handler then dispatches the request (possibly) to a thread pool
 and calls MessageRequestImpl.sendReply() when done.
 
 However, you also call MessageRequest.sendReply() before returning from
 handleRequest(). I think this is an error, and
 MessageRequest.sendReply() should be moved up inside the catch clause,
 or be called only if threwException is true, so that we send a reply on
 behalf of the RequestHandler if and only if it threw an exception (e.g.
 before it dispatches the request to a thread pool). Otherwise, we'd send
 a reply *twice* !
 
 A few changes I have in mind (need to think about it more):
 
 - I want to leave the existing RequestHandler interface in place, so
 current implementations continue to work
 - There will be a new AsyncRequestHandler interface (possibly extending
 RequestHandler, so an implementation can decide to implement both). The
 RequestCorrelator needs to have either request_handler or
 async_request_handler set. If the former is set, the logic is unchanged.
 If the latter is set I'll invoke the async dispatching code
 
 - AsyncRequestHandler will look similar to the following:
 void handle(Message request, Handback hb, boolean requires_response)
 throws Throwable;
 - Handback is an interface, and its impl contains header information
 (e.g. request ID)
 - Handback has a sendReply(Object reply, boolean is_exception) method
 which sends a response (or exception) back to the caller
 - When requires_response is false, the AsyncRequestHandler doesn't need
 to invoke sendReply()
 
 - Message batching
 - The above interfaces need to take message batching into account, e.g.
 the ability to handle multiple requests concurrently (if they don't need
 to be executed sequentially)
 
 
 Thoughts ?
 
 
 On 2/6/13 8:29 PM, Pedro Ruivo wrote:
 Hi all,
 
 Recently I came up with a solution that can help with the thread pool
 problem motivated by the following:
 
 In one of the first implementations of the Total Order based commit
 protocol (TO), I had the requirement to move the PrepareCommand to
 another thread pool. In summary, the TO protocol delivers the
 PrepareCommand in a deterministic order on all the nodes, by a single
 deliver thread. To ensure consistency, if it delivers two conflicting
 transactions, the second transaction must wait until the first
 transaction finishes. However, blocking the single deliver thread is not
 a good solution, because no more transactions can be validated, even if
 they don't conflict, while the thread is blocked.
 
 So, after creating a dependency graph (i.e. the second transaction
 knows that 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban
I meant to say that this is up to you guys to decide in which Infinispan 
release this will be used, but it will be available in JGroups 3.3.

What's the strategy/schedule for 6 and 5.3 anyway ?

On 2/7/13 11:30 AM, Bela Ban wrote:
 No, this *could* be in Infinispan 5.3 (it will be in JGroups 3.3).

 A MessageDispatcher (RpcDispatcher) instance picks which dispatching
 mechanism it wants to use. I have RequestHandler (the default) and a
 sub-interface AsyncRequestHandler. MessageDispatcher (and subclass
 RpcDispatcher) will implement both (they already do implement
 handle(Message)).

 So now a user simply sets an attribute in MessageDispatcher to select
 async dispatching (sync is the default).

 What needs to be done from the Infinispan side is to override
 handle(Message,Response), and implement handling of requests in a thread
 pool. The current behavior (inherited from MessageDispatcher) will be to
 call handle(Message) which CommandAwareDispatcher already implements.

 The Infinispan side can be done in 10 minutes. However, the real work
 will be the dispatching of incoming requests to threads from the
 Infinispan thread pool, and the impl of the thread pool, which doesn't
 exist yet. I guess preserving ordering of requests will be the important
 part.

 If you have your own thread pool, sync RPCs can be sent without OOB, but
 the handle() method in CommandAwareDispatcher can decide, based on the
 mode (e.g. sync) whether to queue the request behind other requests, or
 whether to invoke it directly.

 I wanted to implement this quickly in JGroups so the hooks are in place
 for Infinispan to use them later, once a pool has been implemented.


 On 2/7/13 10:56 AM, Manik Surtani wrote:
 Very interesting.  However I presume this would be something for Infinispan 
 6.0?  Any thoughts on backward compat?

 On 7 Feb 2013, at 04:53, Bela Ban b...@redhat.com wrote:

 Hi Pedro,

 this is almost exactly what I wanted to implement !

 Question:
 - In RequestCorrelator.handleRequest():

 protected void handleRequest(Message req, Header hdr) {
     Object retval;
     boolean threwException = false;
     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
     try {
         retval = request_handler.handle(messageRequest);
     } catch(Throwable t) {
         retval = t;
         threwException = true;
     }
     // -- should be moved up, or called only if threwException == true
     messageRequest.sendReply(retval, threwException);
 }


 , you create a MessageRequestImpl and pass it to the RequestHandler. The
 request handler then dispatches the request (possibly) to a thread pool
 and calls MessageRequestImpl.sendReply() when done.

 However, you also call MessageRequest.sendReply() before returning from
 handleRequest(). I think this is an error, and
 MessageRequest.sendReply() should be moved up inside the catch clause,
 or be called only if threwException is true, so that we send a reply on
 behalf of the RequestHandler if and only if it threw an exception (e.g.
 before it dispatches the request to a thread pool). Otherwise, we'd send
 a reply *twice* !

 A few changes I have in mind (need to think about it more):

 - I want to leave the existing RequestHandler interface in place, so
 current implementations continue to work
 - There will be a new AsyncRequestHandler interface (possibly extending
 RequestHandler, so an implementation can decide to implement both). The
 RequestCorrelator needs to have either request_handler or
 async_request_handler set. If the former is set, the logic is unchanged.
 If the latter is set I'll invoke the async dispatching code

 - AsyncRequestHandler will look similar to the following:
 void handle(Message request, Handback hb, boolean requires_response)
 throws Throwable;
 - Handback is an interface, and its impl contains header information
 (e.g. request ID)
 - Handback has a sendReply(Object reply, boolean is_exception) method
 which sends a response (or exception) back to the caller
 - When requires_response is false, the AsyncRequestHandler doesn't need
 to invoke sendReply()

 - Message batching
 - The above interfaces need to take message batching into account, e.g.
 the ability to handle multiple requests concurrently (if they don't need
 to be executed sequentially)


 Thoughts ?


 On 2/6/13 8:29 PM, Pedro Ruivo wrote:
 Hi all,

 Recently I came up with a solution that can help with the thread pool
 problem motivated by the following:

 In one of the first implementations of the Total Order based commit
 protocol (TO), I had the requirement to move the PrepareCommand to
 another thread pool. In summary, the TO protocol delivers the
 PrepareCommand in a deterministic order on all the nodes, by a single
 deliver thread. To ensure consistency, if it delivers two conflicting
 transactions, the second transaction must wait until the first
 transaction finishes. However, blocking the single deliver thread is not
 a good solution, because no more transactions can be validated, even if
 they don't conflict, while the thread is 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Pedro Ruivo
Hi Bela

On 2/7/13 4:53 AM, Bela Ban wrote:
 Hi Pedro,

 this is almost exactly what I wanted to implement !

 Question:
 - In RequestCorrelator.handleRequest():

 protected void handleRequest(Message req, Header hdr) {
     Object retval;
     boolean threwException = false;
     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
     try {
         retval = request_handler.handle(messageRequest);
     } catch(Throwable t) {
         retval = t;
         threwException = true;
     }
     // -- should be moved up, or called only if threwException == true
     messageRequest.sendReply(retval, threwException);
 }


 , you create a MessageRequestImpl and pass it to the RequestHandler. The
 request handler then dispatches the request (possibly) to a thread pool
 and calls MessageRequestImpl.sendReply() when done.

 However, you also call MessageRequest.sendReply() before returning from
 handleRequest(). I think this is an error, and
 MessageRequest.sendReply() should be moved up inside the catch clause,
 or be called only if threwException is true, so that we send a reply on
 behalf of the RequestHandler if and only if it threw an exception (e.g.
 before it dispatches the request to a thread pool). Otherwise, we'd send
 a reply *twice* !
In my defense, I was assuming that if the application uses the sendReply() 
method, it must return a special value, DO_NOT_REPLY (defined in the 
RequestHandler interface). This return value is automatically ignored:

public final void sendReply(Object reply, boolean exceptionThrown) {
    if(!header.rsp_expected || reply == RequestHandler.DO_NOT_REPLY)
        return;
    ...
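That pattern could be sketched like this (the DO_NOT_REPLY sentinel is from the email; the rest is an illustrative assumption, with a future standing in for the transport):

```java
import java.util.concurrent.*;

// The synchronous handle() path returns a sentinel so the correlator sends
// nothing, while a worker thread produces the real reply later.
public class DoNotReplySketch {
    static final Object DO_NOT_REPLY = new Object();   // sentinel value

    // Mimics RequestCorrelator: ignore the sentinel, forward anything else.
    static void sendReply(Object reply, CompletableFuture<Object> wire) {
        if (reply == DO_NOT_REPLY)
            return;                                    // handler will reply itself
        wire.complete(reply);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<Object> wire = new CompletableFuture<>();

        // Schedule the real reply elsewhere, return the sentinel from the
        // synchronous path.
        pool.submit(() -> sendReply("prepare-ok", wire));
        sendReply(DO_NOT_REPLY, wire);                 // correlator ignores this

        System.out.println(wire.get(5, TimeUnit.SECONDS)); // prepare-ok
        pool.shutdown();
    }
}
```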

 A few changes I have in mind (need to think about it more):

 - I want to leave the existing RequestHandler interface in place, so
 current implementations continue to work
 - There will be a new AsyncRequestHandler interface (possibly extending
 RequestHandler, so an implementation can decide to implement both). The
 RequestCorrelator needs to have either request_handler or
 async_request_handler set. If the former is set, the logic is unchanged.
 If the latter is set I'll invoke the async dispatching code
I'm not sure it is a good idea to have the AsyncRequestHandler 
extend the RequestHandler interface. If the application implements 
both methods (Object handle(Message) and void handle(Message, ...)), how 
do you know which one should be invoked?
 - AsyncRequestHandler will look similar to the following:
 void handle(Message request, Handback hb, boolean requires_response)
 throws Throwable;
 - Handback is an interface, and its impl contains header information
 (e.g. request ID)
 - Handback has a sendReply(Object reply, boolean is_exception) method
 which sends a response (or exception) back to the caller
 - When requires_response is false, the AsyncRequestHandler doesn't need
 to invoke sendReply()
My 2 cents: I think the boolean requires_response should live in the 
Handback implementation, to avoid passing an extra argument.
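Pedro's variant might look like this (a sketch; responseExpected() and all bodies are assumptions, only the Handback name and the rsp_expected header flag come from the thread):

```java
// The "response expected" flag lives on the handback object itself rather
// than being an extra handle() argument.
interface Handback {
    boolean responseExpected();
    void sendReply(Object reply, boolean isException);
}

class HandbackImpl implements Handback {
    private final boolean rspExpected;   // taken from the request header
    HandbackImpl(boolean rspExpected) { this.rspExpected = rspExpected; }

    public boolean responseExpected() { return rspExpected; }

    public void sendReply(Object reply, boolean isException) {
        if (!rspExpected) return;        // silently drop, mirroring DO_NOT_REPLY
        // ... send the response over the transport here ...
        System.out.println("reply: " + reply);
    }
}

public class HandbackSketch {
    public static void main(String[] args) {
        new HandbackImpl(false).sendReply("dropped", false); // prints nothing
        new HandbackImpl(true).sendReply("ok", false);       // reply: ok
    }
}
```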
 - Message batching
 - The above interfaces need to take message batching into account, e.g.
 the ability to handle multiple requests concurrently (if they don't need
 to be executed sequentially)


 Thoughts ?


 On 2/6/13 8:29 PM, Pedro Ruivo wrote:
 Hi all,

 Recently I came up with a solution that can help with the thread pool
 problem motivated by the following:

 In one of the first implementations of the Total Order based commit
 protocol (TO), I had the requirement to move the PrepareCommand to
 another thread pool. In summary, the TO protocol delivers the
 PrepareCommand in a deterministic order on all the nodes, by a single
 deliver thread. To ensure consistency, if it delivers two conflicting
 transactions, the second transaction must wait until the first
 transaction finishes. However, blocking the single deliver thread is not
 a good solution, because no more transactions can be validated, even if
 they don't conflict, while the thread is blocked.

 So, after creating a dependency graph (i.e. the second transaction
 knows that it must wait for the first transaction to finish) I moved
 the PrepareCommand to another thread pool. Initially, I implemented a
 new command, called PrepareResponseCommand, that sends back the reply
 of the PrepareCommand. This solution has one disadvantage: I had to
 implement an ack collector in ISPN, while JGroups already offers me
 that with synchronous communication.

 Recently (2 or 3 months ago) I implemented a simple modification in
 JGroups. In a more generic approach, it allows other threads to reply
 to an RPC request (such as the PrepareCommand). In the previous
 scenario, I replaced the PrepareResponseCommand and the ack collector
 implementation with a synchronous RPC invocation. I've used this
 solution for other issues in the Cloud-TM ISPN fork.

 This solution is quite simple to implement and may help you to move
 the commands to ISPN's internal thread pools. The modifications I've
 made are the following:

 1) I added a new interface (see [1]) that is sent to the application
 instead of the Message 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban

On 2/7/13 12:29 PM, Pedro Ruivo wrote:
 Hi Bela

 On 2/7/13 4:53 AM, Bela Ban wrote:
 Hi Pedro,

 this is almost exactly what I wanted to implement !

 Question:
 - In RequestCorrelator.handleRequest():

 protected void handleRequest(Message req, Header hdr) {
     Object retval;
     boolean threwException = false;
     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
     try {
         retval = request_handler.handle(messageRequest);
     } catch(Throwable t) {
         retval = t;
         threwException = true;
     }
     // -- should be moved up, or called only if threwException == true
     messageRequest.sendReply(retval, threwException);
 }


 , you create a MessageRequestImpl and pass it to the RequestHandler. The
 request handler then dispatches the request (possibly) to a thread pool
 and calls MessageRequestImpl.sendReply() when done.

 However, you also call MessageRequest.sendReply() before returning from
 handleRequest(). I think this is an error, and
 MessageRequest.sendReply() should be moved up inside the catch clause,
 or be called only if threwException is true, so that we send a reply on
 behalf of the RequestHandler if and only if it threw an exception (e.g.
 before it dispatches the request to a thread pool). Otherwise, we'd send
 a reply *twice* !
 In my defense, I was assuming that if the application uses the sendReply()
 method, it must return a special return value: DO_NOT_REPLY (in the
 RequestHandler interface).
 This return value is automatically ignored:

 public final void sendReply(Object reply, boolean exceptionThrown) {
     if(!header.rsp_expected || reply == RequestHandler.DO_NOT_REPLY)
         return;

OK



 A few changes I have in mind (need to think about it more):

 - I want to leave the existing RequestHandler interface in place, so
 current implementation continue to work
 - There will be a new AsyncRequestHandler interface (possibly extending
 RequestHandler, so an implementation can decide to implement both). The
 RequestCorrelator needs to have either request_handler or
 async_request_handler set. If the former is set, the logic is unchanged.
 If the latter is set I'll invoke the async dispatching code
 I'm not sure if it is a good idea to have the AsyncRequestHandler
 extending the RequestHandler interface. If the application implements
 both methods (Object handle(Message) and void handle(Message, ...)) how
 do you know which method should be invoked?


The default would be to invoke the old handle(Message) method. The 
dispatching mechanism could be changed to use the async method by 
setting an attribute in MessageDispatcher (which in turn sets it in 
RequestCorrelator).

How would you do this ? Remember, we cannot change or remove 
handle(Message) as subclasses of RpcDispatcher or MessageDispatcher, or 
impls of RequestHandler are out there and any change to handle(Message) 
would break them.

Would you simply provide a separate AsyncRequestHandler interface, not 
extending RequestHandler ? This would require RequestCorrelator and 
MessageDispatcher to have 2 refs instead of 1. With the current approach 
I can do an instanceof on the RequestHandler.

I'd eventually like to merge RequestHandler and AsyncRequestHandler into one 
class, but that can be done in 4.0 at the earliest.
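To make the single-reference idea concrete, here is a minimal sketch. All types below (Message, Response, the two handler interfaces) are simplified stand-ins, not the actual JGroups classes; it only illustrates how a correlator could keep one handler field and use instanceof to pick the dispatch path:

```java
import java.util.concurrent.atomic.AtomicReference;

public class DispatchSketch {
    static class Message { final String payload; Message(String p) { payload = p; } }

    // Existing synchronous contract; must stay untouched for compatibility.
    interface RequestHandler { Object handle(Message msg); }

    // Callback a handler can use to reply later, possibly from another thread.
    interface Response { void send(Object reply, boolean isException); }

    // New async contract extends the old one, so the correlator keeps one field.
    interface AsyncRequestHandler extends RequestHandler {
        void handle(Message msg, Response rsp);
    }

    // Correlator-side dispatch: an instanceof check picks the path.
    static Object dispatch(RequestHandler handler, Message msg) {
        if (handler instanceof AsyncRequestHandler) {
            AtomicReference<Object> reply = new AtomicReference<>();
            // In real code the callback would send the reply on the wire.
            ((AsyncRequestHandler) handler).handle(msg, (r, ex) -> reply.set(r));
            return reply.get();
        }
        return handler.handle(msg);   // unchanged legacy path
    }

    // Demo handler implementing both methods; dispatch() prefers the async one.
    static final AsyncRequestHandler echoAsync = new AsyncRequestHandler() {
        public Object handle(Message msg) { return "sync:" + msg.payload; }
        public void handle(Message msg, Response rsp) { rsp.send("async:" + msg.payload, false); }
    };
}
```

With this shape, existing RequestHandler implementations keep working unchanged, and implementers opt in to the async path simply by implementing the extended interface.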

 - AsyncRequestHandler will look similar to the following:
 void handle(Message request, Handback hb, boolean requires_response)
 throws Throwable;
 - Handback is an interface, and its impl contains header information
 (e.g. request ID)
 - Handback has a sendReply(Object reply, boolean is_exception) method
 which sends a response (or exception) back to the caller
 - When requires_response is false, the AsyncRequestHandler doesn't need
 to invoke sendReply()
 My 2 cents: I think the boolean requires_response should be in the
 Handback implementation, to avoid passing an extra argument.

I actually removed the requires_response parameter. If no response is 
required, rsp in handle(Message req, Response rsp) will simply be null.

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 12:43 PM, Bela Ban b...@redhat.com wrote:


 On 2/7/13 11:09 AM, Dan Berindei wrote:
 
 
  A few changes I have in mind (need to think about it more):
 
  - I want to leave the existing RequestHandler interface in place, so
  current implementation continue to work
  - There will be a new AsyncRequestHandler interface (possibly
  extending
  RequestHandler, so an implementation can decide to implement
  both). The
  RequestCorrelator needs to have either request_handler or
  async_request_handler set. If the former is set, the logic is
  unchanged.
  If the latter is set I'll invoke the async dispatching code
 
  - AsyncRequestHandler will look similar to the following:
  void handle(Message request, Handback hb, boolean requires_response)
  throws Throwable;
  - Handback is an interface, and its impl contains header information
  (e.g. request ID)
  - Handback has a sendReply(Object reply, boolean is_exception) method
  which sends a response (or exception) back to the caller
 
 
  +1 for a new interface. TBH I hadn't read the RequestCorrelator code,
  so I had assumed it was already asynchronous, and only RpcDispatcher
  was synchronous.


 Nope, unfortunately not.




 
  I'm not so sure about the Handback name, how about calling it Response
  instead?


 It *is* actually called Response (can you read my mind?) :-)


Nice :)


 
  - When requires_response is false, the AsyncRequestHandler doesn't
  need to invoke sendReply()
 
 
  I think this should be the other way around: when requires_response is
  true, the AsyncRequestHandler *can* invoke sendReply(), but is not
  required to (the call will just time out on the caller node); when
  requires_response is false, invoking sendReply() should throw an
  exception.


 The way I actually implemented it this morning is to omit the boolean
 parameter altogether:
 void handle(Message request, Response response) throws Exception;

 Response is null for async requests.


Sounds good.






 
 
  - Message batching
  - The above interfaces need to take message batching into account,
  e.g.
  the ability to handle multiple requests concurrently (if they
  don't need
  to be executed sequentially)
 
 
  You mean handle() is still going to be called once for each request,
  but second handle() call won't necessarily wait for the first
  message's sendReply() call?

 Yes. I was thinking of adding a second method to the interface, which
 has a message batch as parameter. However, we'd also have to pass in an
 array of Response objects and it looked a bit clumsy.


Agree, it would look quite clumsy.


 
  Is this going to apply only to OOB messages, or to regular messages as
  well? I think I'd prefer it if it only applied to OOB messages,
  otherwise we'd have to implement our own ordering for regular/async
  commands.

 No, I think it'll apply to all messages. A simple implementation could
 dispatch OOB messages to the thread pool, as they don't need to be
 ordered. Regular messages could be added to a queue where they are
 processed sequentially by a *single* thread. Pedro does implement
 ordering based on transactions (see his prev email), and I think there
 are some other good use cases for regular messages. I think one thing
 that could be done for regular messages is to implement something like
 SCOPE (remember ?) for async RPCs: updates to different web sessions
 could be processed concurrently, only updates to the *same* session
 would have to be ordered.
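The SCOPE-like idea above can be sketched as a per-scope task chain: updates to the same scope (e.g. one web session) run in submission order, while different scopes run concurrently on a shared pool. This is an illustration only, with hypothetical names; it is not JGroups or Infinispan code:

```java
import java.util.*;
import java.util.concurrent.*;

public class ScopeExecutor {
    private final ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r); t.setDaemon(true); return t;
    });
    // Tail of the task chain per scope; same-scope tasks are chained, so they serialize.
    // A real implementation would prune completed tails to avoid unbounded growth.
    private final ConcurrentHashMap<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    public CompletableFuture<Void> submit(String scope, Runnable task) {
        return tails.compute(scope, (k, tail) ->
            (tail == null ? CompletableFuture.<Void>completedFuture(null) : tail)
                .thenRunAsync(task, pool));
    }

    // Tiny demo: three updates to the same session must keep their order.
    static List<Integer> demo() {
        try {
            ScopeExecutor ex = new ScopeExecutor();
            List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
            CompletableFuture<Void> last = null;
            for (int i = 1; i <= 3; i++) {
                final int n = i;
                last = ex.submit("session-42", () -> seen.add(n));
            }
            last.get(1, TimeUnit.SECONDS);
            return seen;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Chaining futures avoids a dedicated thread per scope: ordering comes from the chain, not from thread affinity, so idle scopes cost nothing.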


Yeah, I agree implementing the regular message ordering ourselves would
give us a little more room for optimizations. But it would make our part
more complicated, too. Well, not for Total Ordering, because Pedro already
implemented it, but for our regular async scenarios we'd need to add a
thread pool (we want to allow 2 threads from different sources to access
different keys at the same time).



 This API is not in stone, we can always change it. Once I'm done with
 this and have batching II implemented, plus some other JIRAs, I'll ping
 you guys and we should have a meeting discussing
 - Async invocation API
 - Message batching (also in conjunction with the above)
 - Message bundling and OOB / DONT_BUNDLE; bundling of OOB messages

 --
 Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban

On 2/7/13 1:38 PM, Dan Berindei wrote:





 No, I think it'll apply to all messages. A simple implementation could
 dispatch OOB messages to the thread pool, as they don't need to be
 ordered. Regular messages could be added to a queue where they are
 processed sequentially by a *single* thread. Pedro does implement
 ordering based on transactions (see his prev email), and I think there
 are some other good use cases for regular messages. I think one thing
 that could be done for regular messages is to implement something like
 SCOPE (remember ?) for async RPCs: updates to different web sessions
 could be processed concurrently, only updates to the *same* session
 would have to be ordered.


 Yeah, I agree implementing the regular message ordering ourselves 
 would give us a little more room for optimizations. But it would make 
 our part more complicated, too. Well, not for Total Ordering, because 
 Pedro already implemented it, but for our regular async scenarios we'd 
 need to add a thread pool (we want to allow 2 threads from different 
 sources to access different keys at the same time).

We don't need to implement this *now*, but let's agree on the API 
(feedback from Pedro and you guys), and add this in JGroups 3.3.

You can start out with a very simple implementation, or no 
implementation at all (then you'd have the same behavior as now), but 
the point is that you can do this at your own pace, and don't depend on 
any changes in JGroups (once 3.3 is used by Infinispan).

I agree that optimization of thread/request processing *while still 
preserving ordering where needed* might be a bit tricky, but the 
potential benefits are great. You could for example add request 
priorities by adding a parameter/flag to RPCs and dispatching them to a 
priority queue in your thread pool, without me having to change anything.
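The priority-flag idea might be sketched like this: back the receiver's pool with a priority queue and tag each request with a priority. The names are illustrative, not an actual JGroups API. Note that execute() must be used rather than submit(), since submit() wraps tasks in FutureTasks that are not Comparable:

```java
import java.util.*;
import java.util.concurrent.*;

public class PriorityPoolSketch {
    // A task carrying the priority flag; lower value = more urgent.
    static class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        final int priority;
        final Runnable body;
        PrioritizedTask(int priority, Runnable body) { this.priority = priority; this.body = body; }
        public void run() { body.run(); }
        public int compareTo(PrioritizedTask o) { return Integer.compare(priority, o.priority); }
    }

    // Single worker so the queue ordering is observable in the demo below.
    static final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());

    static List<Integer> demo() {
        try {
            List<Integer> order = Collections.synchronizedList(new ArrayList<>());
            CountDownLatch gate = new CountDownLatch(1), done = new CountDownLatch(3);
            // Occupy the only worker so the next three tasks pile up in the queue.
            pool.execute(new PrioritizedTask(0, () -> {
                try { gate.await(); } catch (InterruptedException ignored) { }
            }));
            for (int p : new int[] {3, 1, 2}) {
                final int fp = p;
                pool.execute(new PrioritizedTask(fp, () -> { order.add(fp); done.countDown(); }));
            }
            gate.countDown();                 // release the worker...
            done.await(1, TimeUnit.SECONDS);  // ...and it drains the queue by priority
            return order;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The demo enqueues priorities 3, 1, 2 while the worker is busy; once released, the worker processes them as 1, 2, 3, which is exactly the behavior a priority flag on RPCs would buy without any JGroups changes.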

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban


On 2/7/13 12:36 PM, Manik Surtani wrote:
 On 7 Feb 2013, at 11:29, Bela Ban b...@redhat.com wrote:

 I meant to say that this is up to you guys to decide in which Infinispan
 release this will be used, but it will be available in JGroups 3.3.

 What's the strategy/schedule for 6 and 5.3 anyway ?
 5.3 will be relatively quick, just an incremental release mainly targeting 
 reviewing and putting in a few contribs on the queue.  E.g., we're hoping for 
 5.3 to be feature-complete in no more than 2 months.

OK, so let's target JGroups 3.3 to be included in Infinispan 5.3. Then 
you have a choice of doing this in 5.3 or waiting until 6.


 6 is where we get to break API - specifically with the new packaging, etc.

 But this seems like a pretty important feature/fix so it may make sense to 
 get this into 5.3.



-- 
Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Mircea Markus

On 1 Feb 2013, at 08:04, Radim Vansa wrote:

 Hi guys,
 
 after dealing with the large cluster for a while I find the way how we use 
 OOB threads in synchronous configuration non-robust.
 Imagine a situation where a node which is not an owner of the key calls PUT. 
 Then an RPC is called to the primary owner of that key, which reroutes the 
 request to all other owners and, after these reply, replies back.

This delegation RPC pattern happens for non-transactional caches only. Do you 
have the same problem with transactional caches as well?

 There are two problems:
 1) If we make X simultaneous requests from non-owners to the primary owner, 
 where X is the OOB TP size, all the OOB threads end up waiting for the 
 responses and there is no thread left to process an OOB response and release a thread.
 2) Node A is primary owner of keyA, non-primary owner of keyB and B is 
 primary of keyB and non-primary of keyA. We got many requests for both keyA 
 and keyB from other nodes, therefore, all OOB threads from both nodes call 
 RPC to the non-primary owner but there's no one who could process the request.
 
 While we wait for the requests to timeout, the nodes with depleted OOB 
 threadpools start suspecting all other nodes because they can't receive 
 heartbeats etc...
 
 You can say "increase your OOB tp size", but that's not always an option; I 
 have currently set it to 1000 threads and it's not enough. In the end, I will 
 always be limited by RAM, and something tells me that even nodes with a few 
 gigs of RAM should be able to form a huge cluster. We use 160 HotRod worker 
 threads in JDG, which means that 160 * clusterSize = 10240 (64 nodes in my 
 cluster) parallel requests can be executed, and if 10% target the same node 
 with its 1000 OOB threads, it gets stuck. It's about scaling and robustness.
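The failure mode described here can be reproduced in miniature: a fixed-size pool wedges as soon as every thread blocks on work that only the same pool can run. A sketch (pool size 1 for brevity; the names are illustrative):

```java
import java.util.concurrent.*;

public class PoolDeadlockDemo {
    static boolean deadlocks() {
        ExecutorService oob = Executors.newFixedThreadPool(1, r -> {
            Thread t = new Thread(r); t.setDaemon(true); return t;
        });
        Future<String> request = oob.submit(() -> {
            // Simulates an OOB thread making a nested sync RPC: it waits for a
            // "response" that itself needs an OOB thread to be processed.
            Future<String> response = oob.submit(() -> "response");
            return response.get();   // blocks forever: no free thread to run it
        });
        try {
            request.get(200, TimeUnit.MILLISECONDS);
            return false;            // would mean the request completed
        } catch (TimeoutException expected) {
            return true;             // the pool is wedged, as in the large-cluster case
        } catch (Exception e) {
            return false;
        } finally {
            oob.shutdownNow();
        }
    }
}
```

Scaling the pool up only raises the threshold: with N threads, N concurrent request-that-waits-for-a-response tasks produce the same wedge, which is why "increase the OOB pool" does not fix the problem structurally.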
 
 Not that I'd have any good solution, but I'd really like to start a 
 discussion.
 Thinking about it a bit, the problem is that a blocking call (calling an RPC 
 on the primary owner from a message handler) can block non-blocking calls 
 (such as an RPC response, or a command that never sends any more messages). 
 Therefore, a flag on the message saying "this won't send another message" 
 could let such messages be executed in a different thread pool, which can 
 never deadlock. In fact, the pools could share their threads, but the 
 non-blocking one would always keep a few threads spare.
 It's a bad solution, as knowing which messages could block on the other 
 node is really, really hard (we can be sure only in the case of RPC 
 responses), especially once locks come into play. I will welcome anything better.
 
 Radim
 
 
 ---
 Radim Vansa
 Quality Assurance Engineer
 JBoss Datagrid
 tel. +420532294559 ext. 62559
 
 Red Hat Czech, s.r.o.
 Brno, Purkyňova 99/71, PSČ 612 45
 Czech Republic
 
 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)





Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Mircea Markus

On 1 Feb 2013, at 09:54, Dan Berindei wrote:

 Yeah, I wouldn't call this a simple solution...
 
 The distribution/replication interceptors are quite high in the interceptor 
 stack, so we'd have to save the state of the interceptor stack (basically the 
 thread's stack) somehow and resume processing it on the thread receiving the 
 responses. In a language that supports continuations that would be a piece of 
 cake, but since we're in Java we'd have to completely change the way the 
 interceptor stack works.
 
 Actually we do hold the lock on modified keys while the command is replicated 
 to the other owners. But I think locking wouldn't be a problem: we already 
 allow locks to be owned by transactions instead of threads, so it would just 
 be a matter of creating a lite transaction for non-transactional caches. 
 Obviously the TransactionSynchronizerInterceptor would have to go, but I see 
 that as a positive thing ;)
The TransactionSynchronizerInterceptor protects the CacheTransaction objects 
from concurrent writes; we'd still need that because of the NBST forwarding.
 
 So yeah, it could work, but it would take a huge amount of effort and it's 
 going to obfuscate the code. Plus, I'm not at all convinced that it's going 
 to improve performance that much compared to a new thread pool.
+1
 
 Cheers
 Dan
 
 
 On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa rva...@redhat.com wrote:
 Yeah, that would work if it is possible to break the execution path into the 
 FutureListener from the middle of the interceptor stack - I am really not sure 
 about that, but as in the current design no locks should be held when an RPC 
 is called, it may be possible.
 
 Let's see what someone more informed (Dan?) would think about that.
 
 Thanks, Bela
 
 Radim
 
 - Original Message -
 | From: Bela Ban b...@redhat.com
 | To: infinispan-dev@lists.jboss.org
 | Sent: Friday, February 1, 2013 9:39:43 AM
 | Subject: Re: [infinispan-dev] Threadpools in a large cluster
 |
 | It looks like the core problem is an incoming RPC-1 which triggers
 | another blocking RPC-2: the thread delivering RPC-1 is blocked
 | waiting
 | for the response from RPC-2, and can therefore not be used to serve
 | other requests for the duration of RPC-2. If RPC-2 takes a while,
 | e.g.
 | waiting to acquire a lock in the remote node, then it is clear that
 | the
 | thread pool will quickly exceed its max size.
 |
 | A simple solution would be to prevent invoking blocking RPCs *from
 | within* a received RPC. Let's take a look at an example:
 | - A invokes a blocking PUT-1 on B
 | - B forwards the request as blocking PUT-2 to C and D
 | - When PUT-2 returns and B gets the responses from C and D (or the
 | first
 | one to respond, don't know exactly how this is implemented), it sends
 | the response back to A (PUT-1 terminates now at A)
 |
 | We could change this to the following:
 | - A invokes a blocking PUT-1 on B
 | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
 | it
 | does the following:
 |   - B invokes PUT-2 and gets a future
 |   - B adds itself as a FutureListener, and it also stores the
 | address of the original sender (A)
 |   - When the FutureListener is invoked, B sends back the result
 |   as a
 | response to A
 | - Whenever a member leaves the cluster, the corresponding futures are
 | cancelled and removed from the hashmaps
 |
 | This could probably be done differently (e.g. by sending asynchronous
 | messages and implementing a finite state machine), but the core of
 | the
 | solution is the same; namely to avoid having an incoming thread block
 | on
 | a sync RPC.
 |
 | Thoughts ?
 |
 |
 |
 |
 | On 2/1/13 9:04 AM, Radim Vansa wrote:
 |  Hi guys,
 | 
 |  after dealing with the large cluster for a while I find the way how
 |  we use OOB threads in synchronous configuration non-robust.
 |  Imagine a situation where node which is not an owner of the key
 |  calls PUT. Then an RPC is called to the primary owner of that
 |  key, which reroutes the request to all other owners and after
 |  these reply, it replies back.
 |  There are two problems:
 |  1) If we do X simultaneous requests from non-owners to the primary
 |  owner where X is OOB TP size, all the OOB threads are waiting for
 |  the responses and there is no thread to process the OOB response
 |  and release the thread.
 |  2) Node A is primary owner of keyA, non-primary owner of keyB and B
 |  is primary of keyB and non-primary of keyA. We got many requests
 |  for both keyA and keyB from other nodes, therefore, all OOB
 |  threads from both nodes call RPC to the non-primary owner but
 |  there's no one who could process the request.
 | 
 |  While we wait for the requests to timeout, the nodes with depleted
 |  OOB threadpools start suspecting all other nodes because they
 |  can't receive heartbeats etc...
 | 
 |  You can say increase your OOB tp size, but that's not always an
 |  option, I have currently set it to 1000 threads and it's

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 8:05 PM, Mircea Markus mmar...@redhat.com wrote:


 On 1 Feb 2013, at 09:54, Dan Berindei wrote:

 Yeah, I wouldn't call this a simple solution...

 The distribution/replication interceptors are quite high in the
 interceptor stack, so we'd have to save the state of the interceptor stack
 (basically the thread's stack) somehow and resume processing it on the
 thread receiving the responses. In a language that supports continuations
 that would be a piece of cake, but since we're in Java we'd have to
 completely change the way the interceptor stack works.

 Actually we do hold the lock on modified keys while the command is
  replicated to the other owners. But I think locking wouldn't be a problem: we
 already allow locks to be owned by transactions instead of threads, so it
 would just be a matter of creating a lite transaction for
 non-transactional caches. Obviously the TransactionSynchronizerInterceptor
 would have to go, but I see that as a positive thing ;)

 The TransactionSynchronizerInterceptor protected the CacheTransaction
 objects from multiple writes, we'd still need that because of the NBST
 forwarding.


We wouldn't need it if access to the Collection members in CacheTransaction
was properly synchronized. Perhaps hack is too strong a word, let's just
say I'm seeing TransactionSynchronizerInterceptor as a temporary solution :)


 So yeah, it could work, but it would take a huge amount of effort and it's
 going to obfuscate the code. Plus, I'm not at all convinced that it's going
 to improve performance that much compared to a new thread pool.

 +1


 Cheers
 Dan


 On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa rva...@redhat.com wrote:

  Yeah, that would work if it is possible to break the execution path into the
  FutureListener from the middle of the interceptor stack - I am really not sure
  about that, but as in the current design no locks should be held when an RPC
  is called, it may be possible.

 Let's see what someone more informed (Dan?) would think about that.

 Thanks, Bela

 Radim

 - Original Message -
 | From: Bela Ban b...@redhat.com
 | To: infinispan-dev@lists.jboss.org
 | Sent: Friday, February 1, 2013 9:39:43 AM
 | Subject: Re: [infinispan-dev] Threadpools in a large cluster
 |
 | It looks like the core problem is an incoming RPC-1 which triggers
 | another blocking RPC-2: the thread delivering RPC-1 is blocked
 | waiting
 | for the response from RPC-2, and can therefore not be used to serve
 | other requests for the duration of RPC-2. If RPC-2 takes a while,
 | e.g.
 | waiting to acquire a lock in the remote node, then it is clear that
 | the
 | thread pool will quickly exceed its max size.
 |
 | A simple solution would be to prevent invoking blocking RPCs *from
 | within* a received RPC. Let's take a look at an example:
 | - A invokes a blocking PUT-1 on B
 | - B forwards the request as blocking PUT-2 to C and D
 | - When PUT-2 returns and B gets the responses from C and D (or the
 | first
 | one to respond, don't know exactly how this is implemented), it sends
 | the response back to A (PUT-1 terminates now at A)
 |
 | We could change this to the following:
 | - A invokes a blocking PUT-1 on B
 | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
 | it
 | does the following:
 |   - B invokes PUT-2 and gets a future
 |   - B adds itself as a FutureListener, and it also stores the
 | address of the original sender (A)
 |   - When the FutureListener is invoked, B sends back the result
 |   as a
 | response to A
 | - Whenever a member leaves the cluster, the corresponding futures are
 | cancelled and removed from the hashmaps
 |
 | This could probably be done differently (e.g. by sending asynchronous
 | messages and implementing a finite state machine), but the core of
 | the
 | solution is the same; namely to avoid having an incoming thread block
 | on
 | a sync RPC.
 |
 | Thoughts ?
 |
 |
 |
 |
 | On 2/1/13 9:04 AM, Radim Vansa wrote:
 |  Hi guys,
 | 
 |  after dealing with the large cluster for a while I find the way how
 |  we use OOB threads in synchronous configuration non-robust.
 |  Imagine a situation where node which is not an owner of the key
 |  calls PUT. Then an RPC is called to the primary owner of that
 |  key, which reroutes the request to all other owners and after
 |  these reply, it replies back.
 |  There are two problems:
 |  1) If we do X simultaneous requests from non-owners to the primary
 |  owner where X is OOB TP size, all the OOB threads are waiting for
 |  the responses and there is no thread to process the OOB response
 |  and release the thread.
 |  2) Node A is primary owner of keyA, non-primary owner of keyB and B
 |  is primary of keyB and non-primary of keyA. We got many requests
 |  for both keyA and keyB from other nodes, therefore, all OOB
 |  threads from both nodes call RPC to the non-primary owner but
 |  there's no one who could process the request.
 | 
 |  While we wait for the requests

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-06 Thread Bela Ban
Hi Pedro,

this is almost exactly what I wanted to implement !

Question:
- In RequestCorrelator.handleRequest():

protected void handleRequest(Message req, Header hdr) {
    Object retval;
    boolean threwException = false;
    MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
    try {
        retval = request_handler.handle(messageRequest);
    } catch(Throwable t) {
        retval = t;
        threwException = true;
    }
    messageRequest.sendReply(retval, threwException); // -- should be moved up, or called only if threwException == true
}


, you create a MessageRequestImpl and pass it to the RequestHandler. The 
request handler then dispatches the request (possibly) to a thread pool 
and calls MessageRequestImpl.sendReply() when done.

However, you also call MessageRequest.sendReply() before returning from 
handleRequest(). I think this is an error, and 
MessageRequest.sendReply() should be moved up inside the catch clause, 
or be called only if threwException is true, so that we send a reply on 
behalf of the RequestHandler if and only if it threw an exception (e.g. 
before it dispatches the request to a thread pool). Otherwise, we'd send 
a reply *twice* !

A few changes I have in mind (need to think about it more):

- I want to leave the existing RequestHandler interface in place, so 
current implementation continue to work
- There will be a new AsyncRequestHandler interface (possibly extending 
RequestHandler, so an implementation can decide to implement both). The 
RequestCorrelator needs to have either request_handler or 
async_request_handler set. If the former is set, the logic is unchanged. 
If the latter is set I'll invoke the async dispatching code

- AsyncRequestHandler will look similar to the following:
void handle(Message request, Handback hb, boolean requires_response) 
throws Throwable;
- Handback is an interface, and its impl contains header information 
(e.g. request ID)
- Handback has a sendReply(Object reply, boolean is_exception) method 
which sends a response (or exception) back to the caller
- When requires_response is false, the AsyncRequestHandler doesn't need 
to invoke sendReply()

- Message batching
- The above interfaces need to take message batching into account, e.g. 
the ability to handle multiple requests concurrently (if they don't need 
to be executed sequentially)
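A rough sketch of how an implementation of this interface could look on the Infinispan side, with Handback as a stand-in for the interface described above. The signatures and names here are assumptions for illustration, not the final JGroups API:

```java
import java.util.concurrent.*;

public class HandbackSketch {
    // Stand-in for the proposed Handback interface.
    interface Handback { void sendReply(Object reply, boolean isException); }

    static final ExecutorService ispnPool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r); t.setDaemon(true); return t;
    });

    // AsyncRequestHandler-style method: returns immediately; the reply is
    // sent later from the handler's own pool via the handback.
    static void handle(String request, Handback hb, boolean requiresResponse) {
        ispnPool.submit(() -> {
            try {
                Object result = "ok:" + request;          // stand-in for command execution
                if (requiresResponse) hb.sendReply(result, false);
            } catch (RuntimeException t) {
                if (requiresResponse) hb.sendReply(t, true);
            }
        });
    }

    // Caller-side helper simulating a sync RPC built on the async handler.
    static Object call(String request) {
        CompletableFuture<Object> reply = new CompletableFuture<>();
        handle(request, (r, ex) -> reply.complete(r), true);
        try { return reply.get(1, TimeUnit.SECONDS); }
        catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

The key property is that the incoming (JGroups) thread never blocks: handle() only enqueues work, and the reply travels back through the handback from whichever thread finishes the command.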


Thoughts ?


On 2/6/13 8:29 PM, Pedro Ruivo wrote:
 Hi all,

 Recently I came up with a solution that can help with the thread pool 
 problem motivated by the following:

 In one of the first implementations of the Total Order based commit 
 protocol (TO), I had the requirement to move the PrepareCommand to 
 another thread pool. In summary, the TO protocol delivers the 
 PrepareCommand in a deterministic order on all the nodes, via a single 
 deliver thread. To ensure consistency, if it delivers two conflicting 
 transactions, the second transaction must wait until the first 
 transaction finishes. However, blocking the single deliver thread is not 
 a good solution, because no more transactions can be validated, even if 
 they don't conflict, while the thread is blocked.

 So, after creating a dependency graph (i.e. the second transaction 
 knows that it must wait for the first transaction to finish) I move 
 the PrepareCommand to another thread pool. Initially, I implemented a 
 new command, called PrepareResponseCommand, that sends back the reply 
 of the PrepareCommand. This solution has one disadvantage: I had to 
 implement an ack collector in ISPN, while JGroups already offers me 
 that with synchronous communication.

 Recently (2 or 3 months ago) I implemented a simple modification in 
 JGroups. In a more generic approach, it allows other threads to reply 
 to an RPC request (such as the PrepareCommand). In the previous 
 scenario, I replaced the PrepareResponseCommand and the ack collector 
 implementation with a synchronous RPC invocation. I've used this 
 solution in other issues in the Cloud-TM's ISPN fork.

 This solution is quite simple to implement and may help you to move 
 the commands to ISPN internal thread pools. The modifications I've 
 made are the following:

 1) I added a new interface (see [1]) that is sent to the application 
 instead of the Message object (see [4]). This interface contains the 
 Message and has a method that allows the application to send the reply 
 to that particular request.
 2) I added a new object in [4] with the meaning "this return value is 
 not the reply to the RPC request". I return this value when I want to 
 release the thread, because ISPN must return some object from the 
 handle() method. Of course, ISPN is then expected to invoke sendReply() 
 somewhere else; otherwise, I will get a TimeoutException on the sender 
 side.
 3) I also changed the RequestCorrelator implementation to support 
 the previous modifications (see [2] and [3]).
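Pedro's three modifications could be sketched roughly as follows. MessageRequest and DO_NOT_REPLY mirror the names he mentions, but everything here is a simplified illustration of the idea in his branch, not upstream JGroups code:

```java
import java.util.concurrent.*;

public class DeferredReplySketch {
    static final Object DO_NOT_REPLY = new Object();   // marker return value (item 2)

    // Item 1: the wrapper handed to the application instead of the raw Message.
    interface MessageRequest { void sendReply(Object reply, boolean isException); }

    static final ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r); t.setDaemon(true); return t;
    });

    // Deliver-thread side: release the thread immediately, reply elsewhere.
    static Object handle(String command, MessageRequest req) {
        pool.submit(() -> req.sendReply("committed:" + command, false));
        return DO_NOT_REPLY;   // the correlator must ignore this value
    }

    // Item 3: correlator side; only forward real return values to the sender.
    static Object correlate(String command) {
        CompletableFuture<Object> wire = new CompletableFuture<>();
        Object retval = handle(command, (r, ex) -> wire.complete(r));
        if (retval != DO_NOT_REPLY)
            wire.complete(retval);   // legacy synchronous reply path
        try { return wire.get(1, TimeUnit.SECONDS); }
        catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

If sendReply() is never invoked after DO_NOT_REPLY is returned, the caller eventually times out, which matches the TimeoutException behavior Pedro notes above.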

 In the Cloud-TM's ISPN fork I added a reference in the 
 BaseCacheRpcCommand to [1] and added the method sendReply() [5]. In 
 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-04 Thread Manik Surtani
I agree that an application thread pool just pushes the issue of the OOB pool 
running out of threads elsewhere, but that is only one of the two problems 
Radim has.

The other is that nodes get suspected and kicked out because heartbeat messages 
get blocked as well.  Same thing with FC credit messages.  By having a separate 
application pool, at least we guarantee that the cluster service messages get 
handled… 


On 3 Feb 2013, at 11:23, Bela Ban b...@redhat.com wrote:

 A new thread pool owned by Infinispan is certainly something desirable, 
 as discussed in Palma, but I think it wouldn't solve the issue Radim ran 
 into, namely threads being used despite the fact that they only wait for 
 another blocking RPC to finish.
 
 If we made the JGroups thread return immediately by transferring control 
 to an Infinispan thread, then we'd simply move the issue from the former 
 to the latter pool. Eventually, the Infinispan pool would run out of 
 threads.
 
 Coming back to the specific problem Radim ran into: the forwarding of a 
 PUT doesn't hold any locks, so your argument below wouldn't hold. 
 However, of course this is only one specific scenario, and you're 
 probably right that we'd have to consider the more general case of a 
 thread holding locks...
 
 All said, I believe it would still be worthwhile looking into a more 
 non-blocking way of invoking RPCs, that doesn't occupy threads which 
 essentially only wait on IO (network traffic)... A simple state machine 
 approach could be the solution to this...
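The non-blocking forwarding Bela outlines might look roughly like this, with CompletableFuture standing in for JGroups' future/listener machinery. All names are illustrative:

```java
import java.util.concurrent.*;

public class NonBlockingForward {
    static final ExecutorService backupOwners = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r); t.setDaemon(true); return t;
    });

    // Simulated async RPC from B to the backup owners (C and D).
    static CompletableFuture<String> forwardPut(String key) {
        return CompletableFuture.supplyAsync(() -> "stored:" + key, backupOwners);
    }

    // B's handler for PUT-1: returns at once; the reply to A is sent from the
    // completion callback, so no incoming thread is parked waiting on PUT-2.
    static CompletableFuture<String> onPut(String key, String originalSender) {
        CompletableFuture<String> replyToA = new CompletableFuture<>();
        forwardPut(key).whenComplete((result, error) -> {
            // A real implementation would send a response message addressed
            // to originalSender here (and cancel the future if A leaves).
            if (error != null) replyToA.completeExceptionally(error);
            else replyToA.complete(result + "->" + originalSender);
        });
        return replyToA;
    }

    static String demo() {
        try { return onPut("k1", "A").get(1, TimeUnit.SECONDS); }
        catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

The thread that delivered PUT-1 on B is free as soon as onPut() returns, so B's OOB pool cannot be exhausted by threads merely waiting on forwarded RPCs.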
 
 
 On 2/1/13 10:54 AM, Dan Berindei wrote:
 Yeah, I wouldn't call this a simple solution...
 
 The distribution/replication interceptors are quite high in the 
 interceptor stack, so we'd have to save the state of the interceptor 
 stack (basically the thread's stack) somehow and resume processing it 
 on the thread receiving the responses. In a language that supports 
 continuations that would be a piece of cake, but since we're in Java 
 we'd have to completely change the way the interceptor stack works.
 
 Actually we do hold the lock on modified keys while the command is 
 replicated to the other owners. But I think locking wouldn't be a 
 problem: we already allow locks to be owned by transactions instead of 
 threads, so it would just be a matter of creating a lite transaction 
 for non-transactional caches. Obviously the 
 TransactionSynchronizerInterceptor would have to go, but I see that as 
 a positive thing ;)
 
 So yeah, it could work, but it would take a huge amount of effort and 
 it's going to obfuscate the code. Plus, I'm not at all convinced that 
 it's going to improve performance that much compared to a new thread pool.
 
 Cheers
 Dan
 
 
On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa rva...@redhat.com wrote:
 
Yeah, that would work if it is possible to break the execution path
into the FutureListener from the middle of the interceptor stack - I
am really not sure about that, but as in the current design no locks
should be held when an RPC is called, it may be possible.
 
Let's see what someone more informed (Dan?) would think about that.
 
Thanks, Bela
 
Radim
 
- Original Message -
| From: Bela Ban b...@redhat.com
| To: infinispan-dev@lists.jboss.org
| Sent: Friday, February 1, 2013 9:39:43 AM
| Subject: Re: [infinispan-dev] Threadpools in a large cluster
|
| It looks like the core problem is an incoming RPC-1 which triggers
| another blocking RPC-2: the thread delivering RPC-1 is blocked
| waiting
| for the response from RPC-2, and can therefore not be used to serve
| other requests for the duration of RPC-2. If RPC-2 takes a while,
| e.g.
| waiting to acquire a lock in the remote node, then it is clear that
| the
| thread pool will quickly exceed its max size.
|
| A simple solution would be to prevent invoking blocking RPCs *from
| within* a received RPC. Let's take a look at an example:
| - A invokes a blocking PUT-1 on B
| - B forwards the request as blocking PUT-2 to C and D
| - When PUT-2 returns and B gets the responses from C and D (or the
| first
| one to respond, don't know exactly how this is implemented), it
sends
| the response back to A (PUT-1 terminates now at A)
|
| We could change this to the following:
| - A invokes a blocking PUT-1 on B
| - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
| it
| does the following:
|   - B invokes PUT-2 and gets a future
|   - B adds itself as a FutureListener, and it also stores the
| address of the original sender (A)
|   - When the FutureListener is invoked, B sends back the result
|   as a
| response to A
| - Whenever a member leaves the cluster, the corresponding
futures are
| cancelled and removed from the hashmaps

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-04 Thread Manik Surtani
On 4 Feb 2013, at 07:46, Dan Berindei dan.berin...@gmail.com wrote:

 Switching to a state machine approach would require rethinking and rewriting 
 all our interceptors, and I'm pretty sure the code would get more complex and 
 harder to debug (to say nothing about interpreting the logs). Are you sure 
 it's going to have that many benefits to make it worthwhile?

I think it could be worthwhile, but I agree it is not at all trivial.  So 
that's actually a hard call as a result - not so easy to prototype and 
benchmark.


--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-03 Thread Bela Ban
If you send me the details, I'll take a look. I'm pretty busy with 
message batching, so I can't promise next week, but soon...

On 2/1/13 11:08 AM, Pedro Ruivo wrote:
 Hi,

 I had a similar problem when I tried GMU[1] in a large cluster (40 VMs),
 because the remote gets and the commit messages (I'm talking about ISPN
 commands) must wait for some conditions before being processed.

 I solved this problem by adding a feature in JGroups[2] that allows the
 request to be moved to another thread, releasing the OOB thread. The
 other thread then sends the reply to the JGroups request. Of course, I'm 
 only moving commands that I know can block.

 I can enter in some detail if you want =)

 Cheers,
 Pedro

 [1] http://www.gsd.inesc-id.pt/~romanop/files/papers/icdcs12.pdf
 [2] I would like to talk with Bela about this, because it makes my life
 easier to support total order in ISPN. I'll try to send an email this
 weekend =)

 On 01-02-2013 08:04, Radim Vansa wrote:
 Hi guys,

 after dealing with a large cluster for a while, I find the way we use 
 OOB threads in synchronous configurations non-robust.
 Imagine a situation where a node which is not an owner of the key calls PUT. 
 An RPC is then made to the primary owner of that key, which reroutes 
 the request to all other owners and, after these reply, replies back.
 There are two problems:
 1) If we make X simultaneous requests from non-owners to the primary owner, 
 where X is the OOB TP size, all the OOB threads are waiting for the responses 
 and there is no thread to process the OOB response and release the thread.
 2) Node A is the primary owner of keyA and a non-primary owner of keyB, and B is 
 the primary owner of keyB and a non-primary owner of keyA. We get many requests for both keyA 
 and keyB from other nodes; therefore, all OOB threads from both nodes call 
 an RPC to the non-primary owner, but there's no one who could process the request.

 While we wait for the requests to timeout, the nodes with depleted OOB 
 threadpools start suspecting all other nodes because they can't receive 
 heartbeats etc...

 You can say "increase your OOB TP size", but that's not always an option; I 
 have currently set it to 1000 threads and it's not enough. In the end, I 
 will always be limited by RAM, and something tells me that even nodes with 
 a few gigs of RAM should be able to form a huge cluster. We use 160 HotRod 
 worker threads in JDG; that means that 160 * clusterSize = 10240 (64 nodes 
 in my cluster) parallel requests can be executed, and if 10% target the 
 same node with 1000 OOB threads, it gets stuck. It's about scaling and 
 robustness.

 Not that I'd have any good solution, but I'd really like to start a 
 discussion.
 Thinking about it a bit, the problem is that a blocking call (calling an RPC on 
 the primary owner from the message handler) can block non-blocking calls (such as 
 an RPC response or a command that never sends any more messages). Therefore, 
 having a flag on the message saying "this won't send another message" could let 
 the message be executed in a different threadpool, which could never deadlock. 
 In fact, the pools could share threads, but the non-blocking pool would 
 always have a few threads spare.
 It's a bad solution, as maintaining which messages could block on the other 
 node is really, really hard (we can be sure only in the case of RPC responses), 
 especially when locks come into play. I will welcome anything better.

 Radim


 ---
 Radim Vansa
 Quality Assurance Engineer
 JBoss Datagrid
 tel. +420532294559 ext. 62559

 Red Hat Czech, s.r.o.
 Brno, Purkyňova 99/71, PSČ 612 45
 Czech Republic



-- 
Bela Ban, JGroups lead (http://www.jgroups.org)


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Bela Ban
It looks like the core problem is an incoming RPC-1 which triggers 
another blocking RPC-2: the thread delivering RPC-1 is blocked waiting 
for the response from RPC-2, and can therefore not be used to serve 
other requests for the duration of RPC-2. If RPC-2 takes a while, e.g. 
waiting to acquire a lock in the remote node, then it is clear that the 
thread pool will quickly exceed its max size.

A simple solution would be to prevent invoking blocking RPCs *from 
within* a received RPC. Let's take a look at an example:
- A invokes a blocking PUT-1 on B
- B forwards the request as blocking PUT-2 to C and D
- When PUT-2 returns and B gets the responses from C and D (or the first 
one to respond, don't know exactly how this is implemented), it sends 
the response back to A (PUT-1 terminates now at A)

We could change this to the following:
- A invokes a blocking PUT-1 on B
- B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D, it 
does the following:
  - B invokes PUT-2 and gets a future
  - B adds itself as a FutureListener, and it also stores the 
address of the original sender (A)
  - When the FutureListener is invoked, B sends back the result as a 
response to A
- Whenever a member leaves the cluster, the corresponding futures are 
cancelled and removed from the hashmaps

This could probably be done differently (e.g. by sending asynchronous 
messages and implementing a finite state machine), but the core of the 
solution is the same: namely, to avoid having an incoming thread block on 
a sync RPC.

Thoughts ?
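(For illustration only: a rough sketch of this callback style in plain Java. CompletableFuture and the forwardPut/handlePut names are stand-ins, not the actual JGroups or Infinispan API - the point is that B's receiving thread returns immediately, and the response to A is sent from the callback when PUT-2 completes.)

```java
import java.util.concurrent.CompletableFuture;

public class NonBlockingForward {

    // Stand-in for an async RPC to the backup owners; a real impl would use JGroups.
    static CompletableFuture<String> forwardPut(String key, String value) {
        return CompletableFuture.supplyAsync(() -> "OK:" + key + "=" + value);
    }

    // B's handler for PUT-1 from A: no blocking wait on PUT-2.
    static CompletableFuture<String> handlePut(String sender, String key, String value) {
        return forwardPut(key, value)
            // runs on the thread completing PUT-2, not the thread that received PUT-1
            .thenApply(result -> {
                // in the real system: send 'result' back to 'sender' here
                return "to " + sender + ": " + result;
            });
    }

    public static void main(String[] args) {
        System.out.println(handlePut("A", "k", "v").join());
    }
}
```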




On 2/1/13 9:04 AM, Radim Vansa wrote:
 Hi guys,

 after dealing with a large cluster for a while, I find the way we use 
 OOB threads in synchronous configurations non-robust.
 Imagine a situation where a node which is not an owner of the key calls PUT. 
 An RPC is then made to the primary owner of that key, which reroutes the 
 request to all other owners and, after these reply, replies back.
 There are two problems:
 1) If we make X simultaneous requests from non-owners to the primary owner, 
 where X is the OOB TP size, all the OOB threads are waiting for the responses and 
 there is no thread to process the OOB response and release the thread.
 2) Node A is the primary owner of keyA and a non-primary owner of keyB, and B is 
 the primary owner of keyB and a non-primary owner of keyA. We get many requests for both keyA 
 and keyB from other nodes; therefore, all OOB threads from both nodes call 
 an RPC to the non-primary owner, but there's no one who could process the request.

 While we wait for the requests to timeout, the nodes with depleted OOB 
 threadpools start suspecting all other nodes because they can't receive 
 heartbeats etc...

 You can say "increase your OOB TP size", but that's not always an option; I 
 have currently set it to 1000 threads and it's not enough. In the end, I will 
 always be limited by RAM, and something tells me that even nodes with a few gigs 
 of RAM should be able to form a huge cluster. We use 160 HotRod worker 
 threads in JDG; that means that 160 * clusterSize = 10240 (64 nodes in my 
 cluster) parallel requests can be executed, and if 10% target the same node 
 with 1000 OOB threads, it gets stuck. It's about scaling and robustness.

 Not that I'd have any good solution, but I'd really like to start a 
 discussion.
 Thinking about it a bit, the problem is that a blocking call (calling an RPC on 
 the primary owner from the message handler) can block non-blocking calls (such as an RPC 
 response or a command that never sends any more messages). Therefore, having a 
 flag on the message saying "this won't send another message" could let the message be 
 executed in a different threadpool, which could never deadlock. In fact, 
 the pools could share threads, but the non-blocking pool would always have a 
 few threads spare.
 It's a bad solution, as maintaining which messages could block on the other 
 node is really, really hard (we can be sure only in the case of RPC responses), 
 especially when locks come into play. I will welcome anything better.
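(Problem 1 above can be reproduced with nothing but a plain JDK fixed-size pool - a toy model, not Infinispan code: every pool thread blocks on a sub-task that can only run on the same pool, so all requests time out.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class PoolStarvation {

    static int timedOutRequests(int poolSize) throws Exception {
        ExecutorService oob = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allBusy = new CountDownLatch(poolSize);
        List<Future<Integer>> outer = new ArrayList<>();
        for (int i = 0; i < poolSize; i++) {
            outer.add(oob.submit(() -> {
                allBusy.countDown();
                allBusy.await();                          // all pool threads now occupied
                Future<?> reply = oob.submit(() -> { });  // the "response" needing a thread
                try {
                    reply.get(200, TimeUnit.MILLISECONDS); // blocks: no free thread to run it
                    return 0;
                } catch (TimeoutException e) {
                    return 1;                              // request "timed out"
                }
            }));
        }
        int timeouts = 0;
        for (Future<Integer> f : outer) timeouts += f.get();
        oob.shutdownNow();
        return timeouts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(timedOutRequests(8) + " of 8 requests timed out");
    }
}
```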

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Radim Vansa
Yeah, that would work if it is possible to break the execution path into the 
FutureListener from the middle of the interceptor stack - I am really not sure 
about that, but as in the current design no locks should be held when an RPC is 
called, it may be possible.

Let's see what someone more informed (Dan?) would think about that.

Thanks, Bela

Radim

- Original Message -
| From: Bela Ban b...@redhat.com
| To: infinispan-dev@lists.jboss.org
| Sent: Friday, February 1, 2013 9:39:43 AM
| Subject: Re: [infinispan-dev] Threadpools in a large cluster
| 
| It looks like the core problem is an incoming RPC-1 which triggers
| another blocking RPC-2: the thread delivering RPC-1 is blocked
| waiting
| for the response from RPC-2, and can therefore not be used to serve
| other requests for the duration of RPC-2. If RPC-2 takes a while,
| e.g.
| waiting to acquire a lock in the remote node, then it is clear that
| the
| thread pool will quickly exceed its max size.
| 
| A simple solution would be to prevent invoking blocking RPCs *from
| within* a received RPC. Let's take a look at an example:
| - A invokes a blocking PUT-1 on B
| - B forwards the request as blocking PUT-2 to C and D
| - When PUT-2 returns and B gets the responses from C and D (or the
| first
| one to respond, don't know exactly how this is implemented), it sends
| the response back to A (PUT-1 terminates now at A)
| 
| We could change this to the following:
| - A invokes a blocking PUT-1 on B
| - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
| it
| does the following:
|   - B invokes PUT-2 and gets a future
|   - B adds itself as a FutureListener, and it also stores the
| address of the original sender (A)
|   - When the FutureListener is invoked, B sends back the result
|   as a
| response to A
| - Whenever a member leaves the cluster, the corresponding futures are
| cancelled and removed from the hashmaps
| 
| This could probably be done differently (e.g. by sending asynchronous
| messages and implementing a finite state machine), but the core of
| the
| solution is the same; namely to avoid having an incoming thread block
| on
| a sync RPC.
| 
| Thoughts ?
| 
| 
| 
| 
| On 2/1/13 9:04 AM, Radim Vansa wrote:
|  Hi guys,
| 
|  after dealing with a large cluster for a while, I find the way
|  we use OOB threads in synchronous configurations non-robust.
|  Imagine a situation where a node which is not an owner of the key
|  calls PUT. An RPC is then made to the primary owner of that
|  key, which reroutes the request to all other owners and, after
|  these reply, replies back.
|  There are two problems:
|  1) If we make X simultaneous requests from non-owners to the primary
|  owner, where X is the OOB TP size, all the OOB threads are waiting for
|  the responses and there is no thread to process the OOB response
|  and release the thread.
|  2) Node A is the primary owner of keyA and a non-primary owner of keyB, and B
|  is the primary owner of keyB and a non-primary owner of keyA. We get many requests
|  for both keyA and keyB from other nodes; therefore, all OOB
|  threads from both nodes call an RPC to the non-primary owner, but
|  there's no one who could process the request.
| 
|  While we wait for the requests to timeout, the nodes with depleted
|  OOB threadpools start suspecting all other nodes because they
|  can't receive heartbeats etc...
| 
|  You can say "increase your OOB TP size", but that's not always an
|  option; I have currently set it to 1000 threads and it's not
|  enough. In the end, I will always be limited by RAM, and something
|  tells me that even nodes with a few gigs of RAM should be able to
|  form a huge cluster. We use 160 HotRod worker threads in JDG; that
|  means that 160 * clusterSize = 10240 (64 nodes in my cluster)
|  parallel requests can be executed, and if 10% target the same
|  node with 1000 OOB threads, it gets stuck. It's about scaling and
|  robustness.
| 
|  Not that I'd have any good solution, but I'd really like to start a
|  discussion.
|  Thinking about it a bit, the problem is that a blocking call (calling
|  an RPC on the primary owner from the message handler) can block non-blocking
|  calls (such as an RPC response or a command that never sends any more
|  messages). Therefore, having a flag on the message saying "this won't send
|  another message" could let the message be executed in a different
|  threadpool, which could never deadlock. In fact, the pools
|  could share threads, but the non-blocking pool would always have a
|  few threads spare.
|  It's a bad solution, as maintaining which messages could block on the
|  other node is really, really hard (we can be sure only in the case of
|  RPC responses), especially when locks come into play. I will welcome
|  anything better.
| 
| --
| Bela Ban, JGroups lead (http://www.jgroups.org)
| 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
Radim, do these problems happen with the HotRod server, or only with
memcached?

HotRod requests handled by non-owners should be very rare, instead the vast
majority should be handled by the primary owner directly. So if this
happens with HotRod, we should focus on fixing the HotRod routing instead
of focusing on how to handle a large number of requests from non-owners.

That being said, even if a HotRod put request is handled by the primary
owner, it generates (numOwners - 1) extra OOB requests. So if you have
160 HotRod worker threads per node, you can expect 4 * 160 OOB messages per
node. Multiply that by 2, because responses are OOB as well, and you can
get 1280 OOB messages before you even start reusing any HotRod worker
thread. Have you tried decreasing the number of HotRod workers?
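(A back-of-the-envelope check of the figures above; numOwners = 5 is an assumption chosen so that numOwners - 1 = 4 matches the "4 * 160" in the text.)

```java
public class OobLoad {

    // OOB messages in flight before any HotRod worker thread is reused:
    // each worker triggers (numOwners - 1) forwarded requests, and every
    // request has an OOB response as well, hence the factor of 2.
    static int oobMessagesPerNode(int hotrodWorkers, int numOwners) {
        int requests = hotrodWorkers * (numOwners - 1);
        return requests * 2;
    }

    public static void main(String[] args) {
        System.out.println(oobMessagesPerNode(160, 5)); // 1280
    }
}
```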

The thing is, our OOB thread pool can't use queueing because we'd get a
queue full of commit commands while all the OOB threads are waiting on keys
that those commit commands would unlock. As the OOB thread pool is full, we
discard messages, which I suspect slows things down quite a bit (especially
if it's a credit request/response message). So it may well be that a lower
number of HotRod working threads would perform better.

On the other hand, why is increasing the number of OOB threads a solution?
With -Xss 512k, you can get 2000 threads with only 1 GB of virtual memory
(the actual used memory is probably even less, unless you're using huge
pages). AFAIK the Linux kernel doesn't break a sweat with 10k threads
running, so having 2000 threads just hanging around, waiting for a
response, shouldn't be such a problem.
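(The virtual-memory estimate works out as follows - simple arithmetic, nothing Infinispan-specific.)

```java
public class StackBudget {

    // Virtual memory reserved for thread stacks, in MB, at a given -Xss in KB.
    static long stackReservationMb(int threads, int xssKb) {
        return (long) threads * xssKb / 1024;
    }

    public static void main(String[] args) {
        // 2000 threads at -Xss512k -> 1000 MB, i.e. roughly 1 GB
        System.out.println(stackReservationMb(2000, 512) + " MB");
    }
}
```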

I did chat with Bela (or was it a break-out session?) about moving
Infinispan's request processing to another thread pool during the team
meeting in Palma. That would leave the OOB thread pool free to receive
response messages, FD heartbeats, credit requests/responses etc. The
downside, I guess, is that each request would have to be passed to another
thread, and the context switch may slow things down a bit. But since the
new thread pool would be in Infinispan, we could even do tricks like
executing a commit/rollback directly on the OOB thread.

In the end, I just didn't feel that working on this was justified,
considering the number of critical bugs we had. But maybe now's the time to
start experimenting...



On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa rva...@redhat.com wrote:

 Hi guys,

 after dealing with a large cluster for a while, I find the way we use
 OOB threads in synchronous configurations non-robust.
 Imagine a situation where a node which is not an owner of the key calls PUT.
 An RPC is then made to the primary owner of that key, which reroutes
 the request to all other owners and, after these reply, replies back.
 There are two problems:
 1) If we make X simultaneous requests from non-owners to the primary owner,
 where X is the OOB TP size, all the OOB threads are waiting for the responses
 and there is no thread to process the OOB response and release the thread.
 2) Node A is the primary owner of keyA and a non-primary owner of keyB, and B is
 the primary owner of keyB and a non-primary owner of keyA. We get many requests for both keyA
 and keyB from other nodes; therefore, all OOB threads from both nodes call
 an RPC to the non-primary owner, but there's no one who could process the
 request.

 While we wait for the requests to timeout, the nodes with depleted OOB
 threadpools start suspecting all other nodes because they can't receive
 heartbeats etc...

 You can say "increase your OOB TP size", but that's not always an option;
 I have currently set it to 1000 threads and it's not enough. In the end, I
 will always be limited by RAM, and something tells me that even nodes with
 a few gigs of RAM should be able to form a huge cluster. We use 160 HotRod
 worker threads in JDG; that means that 160 * clusterSize = 10240 (64 nodes
 in my cluster) parallel requests can be executed, and if 10% target the
 same node with 1000 OOB threads, it gets stuck. It's about scaling and
 robustness.

 Not that I'd have any good solution, but I'd really like to start a
 discussion.
 Thinking about it a bit, the problem is that a blocking call (calling an RPC on
 the primary owner from the message handler) can block non-blocking calls (such as
 an RPC response or a command that never sends any more messages). Therefore,
 having a flag on the message saying "this won't send another message" could let the
 message be executed in a different threadpool, which could never
 deadlock. In fact, the pools could share threads, but the non-blocking pool
 would always have a few threads spare.
 It's a bad solution, as maintaining which messages could block on the other
 node is really, really hard (we can be sure only in the case of RPC responses),
 especially when locks come into play. I will welcome anything better.

 Radim


 ---
 Radim Vansa
 Quality Assurance Engineer
 JBoss Datagrid
 tel. +420532294559 ext. 62559

 Red Hat Czech, 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Manik Surtani

On 1 Feb 2013, at 09:39, Dan Berindei dan.berin...@gmail.com wrote:

 Radim, do these problems happen with the HotRod server, or only with 
 memcached?
 
 HotRod requests handled by non-owners should be very rare, instead the vast 
 majority should be handled by the primary owner directly. So if this happens 
 with HotRod, we should focus on fixing the HotRod routing instead of focusing 
 on how to handle a large number of requests from non-owners.

Well, even Hot Rod only optionally uses smart routing.  Some client libraries 
don't have this capability.

 
 That being said, even if a HotRod put request is handled by the primary 
 owner, it generates (numOwners - 1) extra OOB requests. So if you have 160 
 HotRod worker threads per node, you can expect 4 * 160 OOB messages per node. 
 Multiply that by 2, because responses are OOB as well, and you can get 1280 
 OOB messages before you even start reusing any HotRod worker thread. Have you 
 tried decreasing the number of HotRod workers?
 
 The thing is, our OOB thread pool can't use queueing because we'd get a queue 
 full of commit commands while all the OOB threads are waiting on keys that 
 those commit commands would unlock. As the OOB thread pool is full, we 
 discard messages, which I suspect slows things down quite a bit (especially 
 if it's a credit request/response message). So it may well be that a lower 
 number of HotRod working threads would perform better.
 
 On the other hand, why is increasing the number of OOB threads a solution? 
 With -Xss 512k, you can get 2000 threads with only 1 GB of virtual memory 
 (the actual used memory is probably even less, unless you're using huge 
 pages). AFAIK the Linux kernel doesn't break a sweat with 10k threads 
 running, so having 2000 threads just hanging around, waiting for a response, 
 shouldn't be such a problem.
 
 I did chat with Bela (or was it a break-out session?) about moving 
 Infinispan's request processing to another thread pool during the team 
 meeting in Palma. That would leave the OOB thread pool free to receive 
 response messages, FD heartbeats, credit requests/responses etc. The 
 downside, I guess, is that each request would have to be passed to another 
 thread, and the context switch may slow things down a bit. But since the new 
 thread pool would be in Infinispan, we could even do tricks like executing a 
 commit/rollback directly on the OOB thread.

Right.  I always got the impression we were abusing the OOB pool.  But in the 
end, I think it makes sense (in JGroups) to separate a service thread pool (for 
heartbeats, credits, etc) and an application thread pool (what we'd use instead 
of OOB).  This way you could even tune your service thread pool to just have, 
say, 2 threads, and the application thread pool to 1000 or whatever.

 In the end, I just didn't feel that working on this was justified, 
 considering the number of critical bugs we had. But maybe now's the time to 
 start experimenting…
 
 
 
 On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa rva...@redhat.com wrote:
 Hi guys,
 
 after dealing with a large cluster for a while, I find the way we use 
 OOB threads in synchronous configurations non-robust.
 Imagine a situation where a node which is not an owner of the key calls PUT. 
 An RPC is then made to the primary owner of that key, which reroutes the 
 request to all other owners and, after these reply, replies back.
 There are two problems:
 1) If we make X simultaneous requests from non-owners to the primary owner 
 where X is the OOB TP size, all the OOB threads are waiting for the responses and 
 there is no thread to process the OOB response and release the thread.
 2) Node A is the primary owner of keyA and a non-primary owner of keyB, and B is 
 the primary owner of keyB and a non-primary owner of keyA. We get many requests for both keyA 
 and keyB from other nodes; therefore, all OOB threads from both nodes call 
 an RPC to the non-primary owner, but there's no one who could process the request.
 
 While we wait for the requests to timeout, the nodes with depleted OOB 
 threadpools start suspecting all other nodes because they can't receive 
 heartbeats etc...
 
 You can say "increase your OOB TP size", but that's not always an option; I 
 have currently set it to 1000 threads and it's not enough. In the end, I will 
 always be limited by RAM, and something tells me that even nodes with a few gigs 
 of RAM should be able to form a huge cluster. We use 160 HotRod worker 
 threads in JDG; that means that 160 * clusterSize = 10240 (64 nodes in my 
 cluster) parallel requests can be executed, and if 10% target the same node 
 with 1000 OOB threads, it gets stuck. It's about scaling and robustness.
 
 Not that I'd have any good solution, but I'd really like to start a 
 discussion.
 Thinking about it a bit, the problem is that a blocking call (calling an RPC on 
 the primary owner from the message handler) can block non-blocking calls (such as an RPC 
 response or a command that never sends any more messages). 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Pedro Ruivo
Hi,

I had a similar problem when I tried GMU[1] in a large cluster (40 VMs), 
because the remote gets and the commit messages (I'm talking about ISPN 
commands) must wait for some conditions before being processed.

I solved this problem by adding a feature in JGroups[2] that allows the 
request to be moved to another thread, releasing the OOB thread. The 
other thread then sends the reply to the JGroups request. Of course, I'm 
only moving commands that I know can block.

I can enter in some detail if you want =)
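(A minimal sketch of the idea - hypothetical names, not the actual JGroups patch: the OOB thread hands known-blocking commands to a separate pool and returns immediately, and the reply is sent from that pool once the command completes.)

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadingHandler {

    interface Response { void send(Object result); }

    private final ExecutorService blockingPool = Executors.newFixedThreadPool(16);

    // Called on the OOB thread for each incoming request.
    void handle(Runnable command, boolean mayBlock, Response response) {
        if (mayBlock) {
            blockingPool.execute(() -> {   // OOB thread is released right away
                command.run();             // may wait on locks, versions, etc.
                response.send("done");     // reply sent from the pool thread
            });
        } else {
            command.run();                 // fast path stays on the OOB thread
            response.send("done");
        }
    }
}
```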

Cheers,
Pedro

[1] http://www.gsd.inesc-id.pt/~romanop/files/papers/icdcs12.pdf
[2] I would like to talk with Bela about this, because it makes my life 
easier to support total order in ISPN. I'll try to send an email this 
weekend =)

On 01-02-2013 08:04, Radim Vansa wrote:
 Hi guys,

 after dealing with a large cluster for a while, I find the way we use 
 OOB threads in synchronous configurations non-robust.
 Imagine a situation where a node which is not an owner of the key calls PUT. 
 An RPC is then made to the primary owner of that key, which reroutes the 
 request to all other owners and, after these reply, replies back.
 There are two problems:
 1) If we make X simultaneous requests from non-owners to the primary owner 
 where X is the OOB TP size, all the OOB threads are waiting for the responses and 
 there is no thread to process the OOB response and release the thread.
 2) Node A is the primary owner of keyA and a non-primary owner of keyB, and B is 
 the primary owner of keyB and a non-primary owner of keyA. We get many requests for both keyA 
 and keyB from other nodes; therefore, all OOB threads from both nodes call 
 an RPC to the non-primary owner, but there's no one who could process the request.

 While we wait for the requests to timeout, the nodes with depleted OOB 
 threadpools start suspecting all other nodes because they can't receive 
 heartbeats etc...

 You can say "increase your OOB TP size", but that's not always an option; I 
 have currently set it to 1000 threads and it's not enough. In the end, I will 
 always be limited by RAM, and something tells me that even nodes with a few gigs 
 of RAM should be able to form a huge cluster. We use 160 HotRod worker 
 threads in JDG; that means that 160 * clusterSize = 10240 (64 nodes in my 
 cluster) parallel requests can be executed, and if 10% target the same node 
 with 1000 OOB threads, it gets stuck. It's about scaling and robustness.

 Not that I'd have any good solution, but I'd really like to start a 
 discussion.
 Thinking about it a bit, the problem is that a blocking call (calling an RPC on 
 the primary owner from the message handler) can block non-blocking calls (such as an RPC 
 response or a command that never sends any more messages). Therefore, having a 
 flag on the message saying "this won't send another message" could let the message be 
 executed in a different threadpool, which could never deadlock. In fact, 
 the pools could share threads, but the non-blocking pool would always have a 
 few threads spare.
 It's a bad solution, as maintaining which messages could block on the other 
 node is really, really hard (we can be sure only in the case of RPC responses), 
 especially when locks come into play. I will welcome anything better.

 Radim


 ---
 Radim Vansa
 Quality Assurance Engineer
 JBoss Datagrid
 tel. +420532294559 ext. 62559

 Red Hat Czech, s.r.o.
 Brno, Purkyňova 99/71, PSČ 612 45
 Czech Republic



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
On Fri, Feb 1, 2013 at 12:13 PM, Manik Surtani msurt...@redhat.com wrote:


 On 1 Feb 2013, at 09:39, Dan Berindei dan.berin...@gmail.com wrote:

  Radim, do these problems happen with the HotRod server, or only with
 memcached?
 
  HotRod requests handled by non-owners should be very rare, instead the
 vast majority should be handled by the primary owner directly. So if this
 happens with HotRod, we should focus on fixing the HotRod routing instead
 of focusing on how to handle a large number of requests from non-owners.

 Well, even Hot Rod only optionally uses smart routing.  Some client
 libraries don't have this capability.


True, and I meant to say that with memcached it should be much worse, but
at least in Radim's tests I hope smart routing is enabled.



 
  That being said, even if a HotRod put request is handled by the primary
 owner, it generates (numOwners - 1) extra OOB requests. So if you have
 160 HotRod worker threads per node, you can expect 4 * 160 OOB messages per
 node. Multiply that by 2, because responses are OOB as well, and you can
 get 1280 OOB messages before you even start reusing any HotRod worker
 thread. Have you tried decreasing the number of HotRod workers?
 
  The thing is, our OOB thread pool can't use queueing because we'd get a
 queue full of commit commands while all the OOB threads are waiting on keys
 that those commit commands would unlock. As the OOB thread pool is full, we
 discard messages, which I suspect slows things down quite a bit (especially
 if it's a credit request/response message). So it may well be that a lower
 number of HotRod working threads would perform better.
 
  On the other hand, why isn't increasing the number of OOB threads a
 solution? With -Xss 512k, you can get 2000 threads with only 1 GB of
 virtual memory (the actual memory used is probably even less, unless you're
 using huge pages). AFAIK the Linux kernel doesn't break a sweat with 10,000
 threads running, so having 2000 threads just hanging around, waiting for a
 response, shouldn't be such a problem.
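The stack-memory arithmetic, as a one-line sketch:

```java
public class StackBudget {
    // Virtual memory reserved for thread stacks = thread count * stack size.
    static long stackBytes(int threads, long stackSizeBytes) {
        return threads * stackSizeBytes;
    }

    public static void main(String[] args) {
        long bytes = StackBudget.stackBytes(2000, 512L * 1024); // -Xss512k
        System.out.println(bytes / (1024 * 1024) + " MB");      // 1000 MB, ~1 GB
    }
}
```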
 
  I did chat with Bela (or was it a break-out session?) about moving
 Infinispan's request processing to another thread pool during the team
 meeting in Palma. That would leave the OOB thread pool free to receive
 response messages, FD heartbeats, credit requests/responses etc. The
 downside, I guess, is that each request would have to be passed to another
 thread, and the context switch may slow things down a bit. But since the
 new thread pool would be in Infinispan, we could even do tricks like
 executing a commit/rollback directly on the OOB thread.

 Right.  I always got the impression we were abusing the OOB pool.  But in
 the end, I think it makes sense (in JGroups) to separate a service thread
 pool (for heartbeats, credits, etc) and an application thread pool (what
 we'd use instead of OOB).  This way you could even tune your service thread
 pool to just have, say, 2 threads, and the application thread pool to 1000
 or whatever.
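The split described here could look roughly like the following. All names, the Kind enum, and the pool sizes are illustrative, not actual JGroups API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical dispatcher: a tiny service pool for protocol-internal
// traffic (heartbeats, credits) and a large application pool for request
// processing, so the former can never be starved by the latter.
public class SplitPools {
    enum Kind { HEARTBEAT, CREDIT, REQUEST }

    static String poolFor(Kind kind) {
        return kind == Kind.REQUEST ? "application" : "service";
    }

    private final ExecutorService servicePool = Executors.newFixedThreadPool(2);
    private final ExecutorService applicationPool = Executors.newFixedThreadPool(1000);

    void dispatch(Kind kind, Runnable handler) {
        // Routing decision: only application requests land on the big pool.
        ExecutorService target =
                poolFor(kind).equals("service") ? servicePool : applicationPool;
        target.execute(handler);
    }

    public static void main(String[] args) {
        System.out.println(SplitPools.poolFor(Kind.HEARTBEAT)); // service
        System.out.println(SplitPools.poolFor(Kind.REQUEST));   // application
    }
}
```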


A separate service pool would be good, but I think we could go further and
treat ClusteredGet/Commit/Rollback commands the same way, because they
can't block waiting for other commands to be processed.



  In the end, I just didn't feel that working on this was justified,
 considering the number of critical bugs we had. But maybe now's the time to
 start experimenting…
 
 
 
  On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa rva...@redhat.com wrote:
  Hi guys,
 
  after dealing with the large cluster for a while, I find the way we use
 OOB threads in synchronous configurations non-robust.
  Imagine a situation where a node which is not an owner of the key calls
 PUT. Then an RPC is made to the primary owner of that key, which reroutes
 the request to all other owners and, after these reply, replies back.
  There are two problems:
  1) If we simultaneously issue X requests from non-owners to the primary
 owner, where X is the OOB TP size, all the OOB threads end up waiting for
 responses and there is no thread left to process an OOB response and
 release its thread.
  2) Node A is the primary owner of keyA and a non-primary owner of keyB,
 and B is the primary owner of keyB and a non-primary owner of keyA. We get
 many requests for both keyA and keyB from other nodes; therefore, all OOB
 threads on both nodes make RPCs to the non-primary owner, but there is no
 one left to process the requests.
 
  While we wait for the requests to time out, the nodes with depleted OOB
 thread pools start suspecting all other nodes because they can't process
 heartbeats, etc.
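The second scenario can be reproduced in miniature with plain java.util.concurrent. This is a toy simulation assuming a single "OOB" thread per node; nothing here is JGroups code:

```java
import java.util.concurrent.*;

// Toy model of mutual pool exhaustion: each node's only "OOB" thread
// blocks on an RPC to the other node, so neither node has a thread left
// to serve the incoming request, and both requests time out.
public class OobDeadlock {
    static boolean deadlocks() throws Exception {
        ExecutorService nodeA = Executors.newFixedThreadPool(1);
        ExecutorService nodeB = Executors.newFixedThreadPool(1);
        CyclicBarrier bothBusy = new CyclicBarrier(2);

        // A's only thread waits until B's is busy too, then blocks on B...
        Future<String> fromA = nodeA.submit(() -> {
            bothBusy.await();
            return nodeB.submit(() -> "reply from B").get();
        });
        // ...while B's only thread does the same towards A.
        Future<String> fromB = nodeB.submit(() -> {
            bothBusy.await();
            return nodeA.submit(() -> "reply from A").get();
        });

        try {
            fromA.get(1, TimeUnit.SECONDS); // can never complete
            return false;
        } catch (TimeoutException expected) {
            return true; // both pools are stuck waiting on each other
        } finally {
            nodeA.shutdownNow();
            nodeB.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deadlocks() ? "deadlocked" : "completed");
    }
}
```

The CyclicBarrier only guarantees that both outer tasks occupy their pool's single thread before either issues its "RPC", making the deadlock deterministic rather than a race.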
 
  You can say "increase your OOB TP size", but that's not always an
 option: I have currently set it to 1000 threads and it's not enough. In the
 end, I will always be limited by RAM, and something tells me that even nodes
 with a few gigs of RAM should be able to form a huge cluster. We use 160
 HotRod worker threads in JDG, which means that 160 * clusterSize = 10240 (64
 nodes in my cluster) parallel requests can be executed, and if 10%