Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Bela Ban
excellent, and we'll update it with our findings from the London meeting
On 2/8/13 5:19 PM, Mircea Markus wrote:
> I've created a JIRA in the scope of Infinispan 5.3 to track the thread 
> pool improvements discussed in this email thread:
> ISPN-2808 - Make Infinispan use its own thread pool for sending OOB 
> messages in order to avoid thread deadlocks
>
> On 8 Feb 2013, at 14:44, Mircea Markus wrote:
>
>>
>> On 8 Feb 2013, at 14:31, Bela Ban wrote:
>>
>>>
>>> On 2/8/13 3:23 PM, Mircea Markus wrote:
>>>> Hi Bela,
>>>>
>>>> Do you think we can have this particular improvement in JGroups by 18
>>>> Feb? That week Pedro, Dan and I are going to start integrating TO
>>>> protocols in ISPN and this is a requirement for it.
>>>
>>> It is done and part of the alpha1 release...
>>
>>  thanks Bela!
>>
>> Cheers,
>> -- 
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>>
>>
>>
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org 
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> Cheers,
> -- 
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
>

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Bela Ban
yes, I mis-tagged it... :-(
On 2/8/13 3:33 PM, Pedro Ruivo wrote:
> Hi Bela,
>
> Alpha1 does not exist in GitHub. Instead you have 3.3.0.Alpha2 =)
>
> So, can I start to use it?
>
> Thanks
> Pedro
>
> On 2/8/13 2:31 PM, Bela Ban wrote:
>> On 2/8/13 3:23 PM, Mircea Markus wrote:
>>> Hi Bela,
>>>
>>> Do you think we can have this particular improvement in JGroups by 18
>>> Feb? That week Pedro, Dan and I are going to start integrating TO
>>> protocols in ISPN and this is a requirement for it.
>> It is done and part of the alpha1 release...
>>
>>
>>
>>> On 8 Feb 2013, at 12:41, Pedro Ruivo wrote:
 On 2/8/13 12:26 PM, Mircea Markus wrote:
> On 7 Feb 2013, at 11:36, Manik Surtani wrote:
>
>> On 7 Feb 2013, at 11:29, Bela Ban wrote:
>>
>>> I meant to say that this is up to you guys to decide in which
>>> Infinispan
>>> release this will be used, but it will be available in JGroups 3.3.
>>>
>>> What's the strategy/schedule for 6 and 5.3 anyway ?
>> 5.3 will be relatively quick, just an incremental release mainly
>> targeting reviewing and putting in a few contribs on the queue.
>> E.g., we're hoping for 5.3 to be feature-complete in no more than 2
>> months.
>>
>> 6 is where we get to break API - specifically with the new
>> packaging, etc.
>>
>> But this seems like a pretty important feature/fix so it may make
>> sense to get this into 5.3.
> Pedro - please correct me if I'm wrong.
> This feature (or the alternative Pedro has already implemented in
> his branch) is used by the TO code that we'll integrate in 5.3. So
> in order to have TO in ISPN we need these jgroups improvements in.
 yes, I'll need it. Otherwise, I have to create another RpcCommand to
 send back the response generated by the other thread
>>> Cheers,
>>> -- 
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>>
>>>
>>>
>>>
>>>
>>>

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Mircea Markus
I've created a JIRA in the scope of Infinispan 5.3 to track the thread pool 
improvements discussed in this email thread:
ISPN-2808 - Make Infinispan use its own thread pool for sending OOB messages in 
order to avoid thread deadlocks

On 8 Feb 2013, at 14:44, Mircea Markus wrote:

> 
> On 8 Feb 2013, at 14:31, Bela Ban wrote:
> 
>> 
>> On 2/8/13 3:23 PM, Mircea Markus wrote:
>>> Hi Bela,
>>> 
>>> Do you think we can have this particular improvement in JGroups by 18 
>>> Feb? That week Pedro, Dan and I are going to start integrating TO 
>>> protocols in ISPN and this is a requirement for it.
>> 
>> It is done and part of the alpha1 release...
> 
> 
>  thanks Bela!
> 
> Cheers,
> -- 
> Mircea Markus
> Infinispan lead (www.infinispan.org)
> 
> 
> 
> 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Mircea Markus

On 8 Feb 2013, at 14:31, Bela Ban wrote:

> 
> On 2/8/13 3:23 PM, Mircea Markus wrote:
>> Hi Bela,
>> 
>> Do you think we can have this particular improvement in JGroups by 18 
>> Feb? That week Pedro, Dan and I are going to start integrating TO 
>> protocols in ISPN and this is a requirement for it.
> 
> It is done and part of the alpha1 release...


 thanks Bela!

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Pedro Ruivo
Hi Bela,

Alpha1 does not exist in GitHub. Instead you have 3.3.0.Alpha2 =)

So, can I start to use it?

Thanks
Pedro

On 2/8/13 2:31 PM, Bela Ban wrote:
> On 2/8/13 3:23 PM, Mircea Markus wrote:
>> Hi Bela,
>>
>> Do you think we can have this particular improvement in JGroups by 18
>> Feb? That week Pedro, Dan and I are going to start integrating TO
>> protocols in ISPN and this is a requirement for it.
> It is done and part of the alpha1 release...
>
>
>
>> On 8 Feb 2013, at 12:41, Pedro Ruivo wrote:
>>> On 2/8/13 12:26 PM, Mircea Markus wrote:
 On 7 Feb 2013, at 11:36, Manik Surtani wrote:

> On 7 Feb 2013, at 11:29, Bela Ban wrote:
>
>> I meant to say that this is up to you guys to decide in which
>> Infinispan
>> release this will be used, but it will be available in JGroups 3.3.
>>
>> What's the strategy/schedule for 6 and 5.3 anyway ?
> 5.3 will be relatively quick, just an incremental release mainly
> targeting reviewing and putting in a few contribs on the queue.
> E.g., we're hoping for 5.3 to be feature-complete in no more than 2
> months.
>
> 6 is where we get to break API - specifically with the new
> packaging, etc.
>
> But this seems like a pretty important feature/fix so it may make
> sense to get this into 5.3.
 Pedro - please correct me if I'm wrong.
 This feature (or the alternative Pedro has already implemented in
 his branch) is used by the TO code that we'll integrate in 5.3. So
 in order to have TO in ISPN we need these jgroups improvements in.
>>> yes, I'll need it. Otherwise, I have to create another RpcCommand to
>>> send back the response generated by the other thread
>> Cheers,
>> -- 
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>>
>>
>>
>>
>>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Bela Ban

On 2/8/13 3:23 PM, Mircea Markus wrote:
> Hi Bela,
>
> Do you think we can have this particular improvement in JGroups by 18 
> Feb? That week Pedro, Dan and I are going to start integrating TO 
> protocols in ISPN and this is a requirement for it.

It is done and part of the alpha1 release...



>
> On 8 Feb 2013, at 12:41, Pedro Ruivo wrote:
>> On 2/8/13 12:26 PM, Mircea Markus wrote:
>>>
>>> On 7 Feb 2013, at 11:36, Manik Surtani wrote:
>>>
 On 7 Feb 2013, at 11:29, Bela Ban wrote:

> I meant to say that this is up to you guys to decide in which 
> Infinispan
> release this will be used, but it will be available in JGroups 3.3.
>
> What's the strategy/schedule for 6 and 5.3 anyway ?

 5.3 will be relatively quick, just an incremental release mainly 
 targeting reviewing and putting in a few contribs on the queue. 
 E.g., we're hoping for 5.3 to be feature-complete in no more than 2 
 months.

 6 is where we get to break API - specifically with the new 
 packaging, etc.

 But this seems like a pretty important feature/fix so it may make 
 sense to get this into 5.3.
>>>
>>> Pedro - please correct me if I'm wrong.
>>> This feature (or the alternative Pedro has already implemented in 
>>> his branch) is used by the TO code that we'll integrate in 5.3. So 
>>> in order to have TO in ISPN we need these jgroups improvements in.
>> yes, I'll need it. Otherwise, I have to create another RpcCommand to 
>> send back the response generated by the other thread
>>>
>
> Cheers,
> -- 
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
>

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Mircea Markus
Hi Bela,

Do you think we can have this particular improvement in JGroups by 18 Feb? That 
week Pedro, Dan and I are going to start integrating TO protocols in ISPN and 
this is a requirement for it.

On 8 Feb 2013, at 12:41, Pedro Ruivo wrote:
> On 2/8/13 12:26 PM, Mircea Markus wrote:
>> 
>> 
>> On 7 Feb 2013, at 11:36, Manik Surtani wrote:
>> 
>>> On 7 Feb 2013, at 11:29, Bela Ban  wrote:
>>> 
 I meant to say that this is up to you guys to decide in which Infinispan 
 release this will be used, but it will be available in JGroups 3.3.
 
 What's the strategy/schedule for 6 and 5.3 anyway ?
>>> 
>>> 5.3 will be relatively quick, just an incremental release mainly targeting 
>>> reviewing and putting in a few contribs on the queue. E.g., we're hoping 
>>> for 5.3 to be feature-complete in no more than 2 months.
>>> 
>>> 6 is where we get to break API - specifically with the new packaging, etc.
>>> 
>>> But this seems like a pretty important feature/fix so it may make sense to 
>>> get this into 5.3.
>> 
>> Pedro - please correct me if I'm wrong.
>> This feature (or the alternative Pedro has already implemented in his 
>> branch) is used by the TO code that we'll integrate in 5.3. So in order to 
>> have TO in ISPN we need these jgroups improvements in.
> yes, I'll need it. Otherwise, I have to create another RpcCommand to send 
> back the response generated by the other thread
>> 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Pedro Ruivo



On 2/8/13 12:26 PM, Mircea Markus wrote:


On 7 Feb 2013, at 11:36, Manik Surtani wrote:

On 7 Feb 2013, at 11:29, Bela Ban wrote:



I meant to say that this is up to you guys to decide in which Infinispan
release this will be used, but it will be available in JGroups 3.3.

What's the strategy/schedule for 6 and 5.3 anyway ?


5.3 will be relatively quick, just an incremental release mainly 
targeting reviewing and putting in a few contribs on the queue. E.g., 
we're hoping for 5.3 to be feature-complete in no more than 2 months.


6 is where we get to break API - specifically with the new packaging, 
etc.


But this seems like a pretty important feature/fix so it may make 
sense to get this into 5.3.


Pedro - please correct me if I'm wrong.
This feature (or the alternative Pedro has already implemented in his 
branch) is used by the TO code that we'll integrate in 5.3. So in 
order to have TO in ISPN we need these jgroups improvements in.
yes, I'll need it. Otherwise, I have to create another RpcCommand to 
send back the response generated by the other thread
Pedro - are there any other modifications in *your branch* of jgroups 
that require integration into jgroups upstream from a TO perspective?

No, just this...

Cheers
Pedro


Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)





___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-08 Thread Mircea Markus

On 7 Feb 2013, at 11:36, Manik Surtani wrote:

> On 7 Feb 2013, at 11:29, Bela Ban  wrote:
> 
>> I meant to say that this is up to you guys to decide in which Infinispan 
>> release this will be used, but it will be available in JGroups 3.3.
>> 
>> What's the strategy/schedule for 6 and 5.3 anyway ?
> 
> 5.3 will be relatively quick, just an incremental release mainly targeting 
> reviewing and putting in a few contribs on the queue. E.g., we're hoping for 
> 5.3 to be feature-complete in no more than 2 months.
> 
> 6 is where we get to break API - specifically with the new packaging, etc.
> 
> But this seems like a pretty important feature/fix so it may make sense to 
> get this into 5.3.

Pedro - please correct me if I'm wrong.
This feature (or the alternative Pedro has already implemented in his branch) 
is used by the TO code that we'll integrate in 5.3. So in order to have TO in 
ISPN we need these jgroups improvements in. Pedro - are there any other 
modifications in *your branch* of jgroups that require integration into jgroups 
upstream from a TO perspective? 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Radim Vansa
|  
| after dealing with the large cluster for a while I find the way how
| we use OOB threads in synchronous configuration non-robust.
| Imagine a situation where a node which is not an owner of the key calls
| PUT. Then an RPC is called to the primary owner of that key,
| which reroutes the request to all other owners and after these
| reply, it replies back.
| 
| 
| 
| This delegation RPC pattern happens for non-transactional caches
| only. Do you have the same problem with transactional caches as
| well?
| 

I started testing with non-tx cache as this has great performance now. No, I 
have not tested transactional caches yet. I'll definitely try it in the future, but 
as we have some other big cluster tests scheduled and our big lab won't be 
available next week, don't expect it very soon.

Radim
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 8:05 PM, Mircea Markus  wrote:

>
> On 1 Feb 2013, at 09:54, Dan Berindei wrote:
>
> Yeah, I wouldn't call this a "simple" solution...
>
> The distribution/replication interceptors are quite high in the
> interceptor stack, so we'd have to save the state of the interceptor stack
> (basically the thread's stack) somehow and resume processing it on the
> thread receiving the responses. In a language that supports continuations
> that would be a piece of cake, but since we're in Java we'd have to
> completely change the way the interceptor stack works.
>
> Actually we do hold the lock on modified keys while the command is
> replicated to the other owners. But think locking wouldn't be a problem: we
> already allow locks to be owned by transactions instead of threads, so it
> would just be a matter of creating a "lite transaction" for
> non-transactional caches. Obviously the TransactionSynchronizerInterceptor
> would have to go, but I see that as a positive thing ;)
>
> The TransactionSynchronizerInterceptor protected the CacheTransaction
> objects from multiple writes, we'd still need that because of the NBST
> forwarding.
>

We wouldn't need it if access to the Collection members in CacheTransaction
was properly synchronized. Perhaps hack is too strong a word, let's just
say I'm seeing TransactionSynchronizerInterceptor as a temporary solution :)


> So yeah, it could work, but it would take a huge amount of effort and it's
> going to obfuscate the code. Plus, I'm not at all convinced that it's going
> to improve performance that much compared to a new thread pool.
>
> +1
>
>
> Cheers
> Dan
>
>
> On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa  wrote:
>
>> Yeah, that would work if it is possible to break execution path into the
>> FutureListener from the middle of interceptor stack - I am really not sure
>> about that but as in current design no locks should be held when a RPC is
>> called, it may be possible.
>>
>> Let's see what someone more informed (Dan?) would think about that.
>>
>> Thanks, Bela
>>
>> Radim
>>
>> - Original Message -
>> | From: "Bela Ban" 
>> | To: infinispan-dev@lists.jboss.org
>> | Sent: Friday, February 1, 2013 9:39:43 AM
>> | Subject: Re: [infinispan-dev] Threadpools in a large cluster
>> |
>> | It looks like the core problem is an incoming RPC-1 which triggers
>> | another blocking RPC-2: the thread delivering RPC-1 is blocked
>> | waiting
>> | for the response from RPC-2, and can therefore not be used to serve
>> | other requests for the duration of RPC-2. If RPC-2 takes a while,
>> | e.g.
>> | waiting to acquire a lock in the remote node, then it is clear that
>> | the
>> | thread pool will quickly exceed its max size.
>> |
>> | A simple solution would be to prevent invoking blocking RPCs *from
>> | within* a received RPC. Let's take a look at an example:
>> | - A invokes a blocking PUT-1 on B
>> | - B forwards the request as blocking PUT-2 to C and D
>> | - When PUT-2 returns and B gets the responses from C and D (or the
>> | first
>> | one to respond, don't know exactly how this is implemented), it sends
>> | the response back to A (PUT-1 terminates now at A)
>> |
>> | We could change this to the following:
>> | - A invokes a blocking PUT-1 on B
>> | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
>> | it
>> | does the following:
>> |   - B invokes PUT-2 and gets a future
>> |   - B adds itself as a FutureListener, and it also stores the
>> | address of the original sender (A)
>> |   - When the FutureListener is invoked, B sends back the result
>> |   as a
>> | response to A
>> | - Whenever a member leaves the cluster, the corresponding futures are
>> | cancelled and removed from the hashmaps
>> |
>> | This could probably be done differently (e.g. by sending asynchronous
>> | messages and implementing a finite state machine), but the core of
>> | the
>> | solution is the same; namely to avoid having an incoming thread block
>> | on
>> | a sync RPC.
>> |
>> | Thoughts ?
>> |
>> |
>> |
>> |
>> | On 2/1/13 9:04 AM, Radim Vansa wrote:
>> | > Hi guys,
>> | >
>> | > after dealing with the large cluster for a while I find the way how
>> | > we use OOB threads in synchronous configuration non-robust.
>> | > Imagine a situation where a node which is not an owner of the key
>> | &

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Mircea Markus

On 1 Feb 2013, at 09:54, Dan Berindei wrote:

> Yeah, I wouldn't call this a "simple" solution...
> 
> The distribution/replication interceptors are quite high in the interceptor 
> stack, so we'd have to save the state of the interceptor stack (basically the 
> thread's stack) somehow and resume processing it on the thread receiving the 
> responses. In a language that supports continuations that would be a piece of 
> cake, but since we're in Java we'd have to completely change the way the 
> interceptor stack works.
> 
> Actually we do hold the lock on modified keys while the command is replicated 
> to the other owners. But I think locking wouldn't be a problem: we already 
> allow locks to be owned by transactions instead of threads, so it would just 
> be a matter of creating a "lite transaction" for non-transactional caches. 
> Obviously the TransactionSynchronizerInterceptor would have to go, but I see 
> that as a positive thing ;)
The TransactionSynchronizerInterceptor protected the CacheTransaction objects 
from multiple writes, we'd still need that because of the NBST forwarding.
> 
> So yeah, it could work, but it would take a huge amount of effort and it's 
> going to obfuscate the code. Plus, I'm not at all convinced that it's going 
> to improve performance that much compared to a new thread pool.
+1
> 
> Cheers
> Dan
> 
> 
> On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa  wrote:
> Yeah, that would work if it is possible to break execution path into the 
> FutureListener from the middle of interceptor stack - I am really not sure 
> about that but as in current design no locks should be held when a RPC is 
> called, it may be possible.
> 
> Let's see what someone more informed (Dan?) would think about that.
> 
> Thanks, Bela
> 
> Radim
> 
> - Original Message -
> | From: "Bela Ban" 
> | To: infinispan-dev@lists.jboss.org
> | Sent: Friday, February 1, 2013 9:39:43 AM
> | Subject: Re: [infinispan-dev] Threadpools in a large cluster
> |
> | It looks like the core problem is an incoming RPC-1 which triggers
> | another blocking RPC-2: the thread delivering RPC-1 is blocked
> | waiting
> | for the response from RPC-2, and can therefore not be used to serve
> | other requests for the duration of RPC-2. If RPC-2 takes a while,
> | e.g.
> | waiting to acquire a lock in the remote node, then it is clear that
> | the
> | thread pool will quickly exceed its max size.
> |
> | A simple solution would be to prevent invoking blocking RPCs *from
> | within* a received RPC. Let's take a look at an example:
> | - A invokes a blocking PUT-1 on B
> | - B forwards the request as blocking PUT-2 to C and D
> | - When PUT-2 returns and B gets the responses from C and D (or the
> | first
> | one to respond, don't know exactly how this is implemented), it sends
> | the response back to A (PUT-1 terminates now at A)
> |
> | We could change this to the following:
> | - A invokes a blocking PUT-1 on B
> | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
> | it
> | does the following:
> |   - B invokes PUT-2 and gets a future
> |   - B adds itself as a FutureListener, and it also stores the
> | address of the original sender (A)
> |   - When the FutureListener is invoked, B sends back the result
> |   as a
> | response to A
> | - Whenever a member leaves the cluster, the corresponding futures are
> | cancelled and removed from the hashmaps
> |
> | This could probably be done differently (e.g. by sending asynchronous
> | messages and implementing a finite state machine), but the core of
> | the
> | solution is the same; namely to avoid having an incoming thread block
> | on
> | a sync RPC.
> |
> | Thoughts ?
> |
> |
> |
> |
> | On 2/1/13 9:04 AM, Radim Vansa wrote:
> | > Hi guys,
> | >
> | > after dealing with the large cluster for a while I find the way how
> | > we use OOB threads in synchronous configuration non-robust.
> | > Imagine a situation where a node which is not an owner of the key
> | > calls PUT. Then an RPC is called to the primary owner of that
> | > key, which reroutes the request to all other owners and after
> | > these reply, it replies back.
> | > There are two problems:
> | > 1) If we do simultaneously X requests from non-owners to the primary
> | > owner where X is OOB TP size, all the OOB threads are waiting for
> | > the responses and there is no thread to process the OOB response
> | > and release the thread.
> | > 2) Node A is primary owner of keyA, non-primary owner of keyB and B
> | > is primary of keyB and n

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Mircea Markus

On 1 Feb 2013, at 08:04, Radim Vansa wrote:

> Hi guys,
> 
> after dealing with the large cluster for a while I find the way how we use 
> OOB threads in synchronous configuration non-robust.
> Imagine a situation where a node which is not an owner of the key calls PUT. 
> Then an RPC is called to the primary owner of that key, which reroutes the 
> request to all other owners and after these reply, it replies back.

This delegation RPC pattern happens for non-transactional caches only. Do you 
have the same problem with transactional caches as well?

> There are two problems:
> 1) If we do simultaneously X requests from non-owners to the primary owner 
> where X is OOB TP size, all the OOB threads are waiting for the responses and 
> there is no thread to process the OOB response and release the thread.
> 2) Node A is primary owner of keyA, non-primary owner of keyB and B is 
> primary of keyB and non-primary of keyA. We got many requests for both keyA 
> and keyB from other nodes, therefore, all OOB threads from both nodes call 
> RPC to the non-primary owner but there's no one who could process the request.
> 
> While we wait for the requests to timeout, the nodes with depleted OOB 
> threadpools start suspecting all other nodes because they can't receive 
> heartbeats etc...
> 
> You can say "increase your OOB tp size", but that's not always an option, I 
> have currently set it to 1000 threads and it's not enough. In the end, I will 
> be always limited by RAM and something tells me that even nodes with few gigs 
> of RAM should be able to form a huge cluster. We use 160 hotrod worker 
> threads in JDG, that means that 160 * clusterSize = 10240 (64 nodes in my 
> cluster) parallel requests can be executed, and if 10% targets the same node 
> with 1000 OOB threads, it gets stuck. It's about scaling and robustness.
> 
> Not that I'd have any good solution, but I'd really like to start a 
> discussion.
> Thinking about it a bit, the problem is that blocking call (calling RPC on 
> primary owner from message handler) can block non-blocking calls (such as RPC 
> response or command that never sends any more messages). Therefore, having a 
> flag on message "this won't send another message" could let the message be 
> executed in a different threadpool, which will never be deadlocked. In fact, 
> the pools could share the threads but the non-blocking would have always a 
> few threads spare.
> It's a bad solution as maintaining which message could block in the other 
> node is really, really hard (we can be sure only in case of RPC responses), 
> especially when some locks come. I will welcome anything better.
> 
> Radim
> 
> 
> ---
> Radim Vansa
> Quality Assurance Engineer
> JBoss Datagrid
> tel. +420532294559 ext. 62559
> 
> Red Hat Czech, s.r.o.
> Brno, Purkyňova 99/71, PSČ 612 45
> Czech Republic
> 
> 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
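Radim's idea of flagging messages with "this won't send another message" maps naturally onto routing work to two executors, so replies can always make progress even when every thread in the blocking pool is parked on a sync RPC. The sketch below is a rough illustration of that idea under stated assumptions, not Infinispan/JGroups code; pool sizes and the TwoPoolDispatcher name are arbitrary:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Messages that may issue further blocking RPCs go to one pool; messages
// guaranteed not to block (e.g. RPC responses) go to a reserved pool that
// blocked requests can never exhaust.
class TwoPoolDispatcher {
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(100);
    private final ExecutorService nonBlockingPool = Executors.newFixedThreadPool(8);

    // mayBlock would be derived from a flag carried on the message.
    void dispatch(Runnable work, boolean mayBlock) {
        (mayBlock ? blockingPool : nonBlockingPool).execute(work);
    }

    void shutdown() {
        blockingPool.shutdown();
        nonBlockingPool.shutdown();
    }
}
```

As Radim notes, the hard part is setting the flag correctly on the sender; RPC responses are the only messages that are trivially safe to mark as non-blocking.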

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban

On 2/7/13 2:55 PM, Pedro Ruivo wrote:
>
>> The default would be to invoke the old handle(Message) method. The
>> dispatching mechanism could be changed to use the async method by
>> setting an attribute in MessageDispatcher (which in turn sets it in
>> RequestCorrelator).
>>
>> How would you do this ? Remember, we cannot change or remove
>> handle(Message) as subclasses of RpcDispatcher or MessageDispatcher, or
>> impls of RequestHandler are out there and any change to handle(Message)
>> would break them.
>>
>> Would you simply provide a separate AsyncRequestHandler interface, not
>> extending RequestHandler ? This would require RequestCorrelator and
>> MessageDispatcher to have 2 refs instead of 1. With the current approach
>> I can do an instanceof on the RequestHandler.
>>
>> I eventually like to merge RequestHandler and AsyncRequestHandler into 1
>> class, but this can be done in 4.0 at the earliest time.
>>
> In my opinion, I would use a separate interface and 2 references (while
> RequestHandler is still active).

That's not good because that would be an API change and would break code 
in MuxRpcDispatcher/MuxRequestCorrelator.
I've pushed my branch JGRP-1587 to the repo, so you can look at the 
code. This isn't final yet of course.

>   However, I will change the
> AsyncRequestHandler interface to:
>
> Object handle(Message message, Response response) throws Exception

OK, that's what it is currently


>
> and the Response interface would be as follow:
>
> void sendResponse(Object reply, boolean isException) //or sendReply(...);


Yes


>
> void setAsyncResponse(boolean value); //or setAsyncReply(...);
>
> boolean isAsyncResponse(); //or isAsyncReply();


I don't like this. Intuitively, we create a response or not, depending 
on information in the request. Also, 2 more methods to implement.


>
> And with this, it can supports both behaviors, with a minor addition:
>
> 1) To work in async mode:
>   the application invokes setAsyncResponse(true) and eventually it
> will invoke the sendResponse(...)
>
> 2) To work in sync mode:
>   the application invokes setAsyncResponse(false). I think this
> should be the default value
>
> in the RequestCorrelator, it checks the isAsyncResponse() value. If true,
> it does not send the response (it only sends it if it is an exception
> caught). If the value is false, it sends the return value as a response
> (the current behavior). In pseudo code:
>
> try {
>   ret = asyncRequestHandler.handle(message, response);
>   if (response expected && !response.isAsyncResponse())
> response.sendResponse(ret, false);
> } catch (Throwable t) {
>   response.sendResponse(t, true);
> }
>
> With this suggestion, when you remove the RequestHandler interface (in
> 4.0 or 5.0), the user only needs to add the Response parameter and
> everything will work as usual.
>
> For Infinispan 5.3, it can be adopted as soon as it is implemented in
> JGroups without using the async method.
>
> To avoid the problem of sending twice a response, I would put an
> AtomicBoolean or a similar synchronization mechanism in the Response
> implementation.
>
> opinions?
>
> //more below
 - AsyncRequestHandler will look similar to the following:
 void handle(Message request, Handback hb, boolean requires_response)
 throws Throwable;
 - Handback is an interface, and its impl contains header information
 (e.g. request ID)
 - Handback has a sendReply(Object reply, boolean is_exception) method
 which sends a response (or exception) back to the caller
 - When requires_response is false, the AsyncRequestHandler doesn't need
 to invoke sendReply()
>>> My 2 cents, I think that the boolean requires_response should be in
>>> Handback implementation to avoid passing an extra argument
>> I actually removed the requires_response parameter. If no response is
>> required, rsp in handle(Message req, Response rsp) will simply be null.
> I would avoid setting rsp to null and I would do an empty implementation
> of the Response interface, in which the methods do nothing.

Can't do that as I want to do stuff like:

void handle(Message req, Response rsp) {
 try {
 // invoke the request
 }
 catch(Throwable t) {
  if(rsp != null)
   rsp.send(t, true); // send the exception back to the caller
  else
   logError();
 }
}



-- 
Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 3:55 PM, Pedro Ruivo  wrote:

> Hi, see inline
>
> Cheers,
> Pedro
>
> On 2/7/13 11:42 AM, Bela Ban wrote:
> > On 2/7/13 12:29 PM, Pedro Ruivo wrote:
> >> Hi Bela
> >>
> >> On 2/7/13 4:53 AM, Bela Ban wrote:
> >>> Hi Pedro,
> >>>
> >>> this is almost exactly what I wanted to implement !
> >>>
> >>> Question:
> >>> - In RequestCorrelator.handleRequest():
> >>>
> >>> protected void handleRequest(Message req, Header hdr) {
> >>> Object retval;
> >>> boolean threwException = false;
> >>> MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
> >>> try {
> >>> retval=request_handler.handle(messageRequest);
> >>> } catch(Throwable t) {
> >>> retval=t;
> >>> threwException = true;
> >>> }
> >>> messageRequest.sendReply(retval, threwException);//<-- should be moved
> >>> up, or called only if threwException == true
> >>> }
> >>>
> >>>
> >>> , you create a MessageRequestImpl and pass it to the RequestHandler.
> The
> >>> request handler then dispatches the request (possibly) to a thread pool
> >>> and calls MessageRequestImpl.sendReply() when done.
> >>>
> >>> However, you also call MessageRequest.sendReply() before returning from
> >>> handleRequest(). I think this is an error, and
> >>> MessageRequest.sendReply() should be moved up inside the catch clause,
> >>> or be called only if threwException is true, so that we send a reply on
> >>> behalf of the RequestHandler if and only if it threw an exception (e.g.
> >>> before it dispatches the request to a thread pool). Otherwise, we'd
> send
> >>> a reply *twice* !
>> In my defense, I was assuming that if the application uses the sendReply()
> >> method, it must return a special return value: DO_NOT_REPLY (in
> >> RequestHandler interface).
> >> This return value is automatically ignored:
> >>
> >> public final void sendReply(Object reply, boolean exceptionThrown) {
> >> if(!header.rsp_expected || reply == RequestHandler.DO_NOT_REPLY)
> >> return;
> > OK
> >
> >
> >>> A few changes I have in mind (need to think about it more):
> >>>
> >>> - I want to leave the existing RequestHandler interface in place, so
>>> current implementations continue to work
> >>> - There will be a new AsyncRequestHandler interface (possibly extending
> >>> RequestHandler, so an implementation can decide to implement both). The
> >>> RequestCorrelator needs to have either request_handler or
> >>> async_request_handler set. If the former is set, the logic is
> unchanged.
> >>> If the latter is set I'll invoke the async dispatching code
> >> I'm not sure if it is a good idea to have the AsyncRequestHandler
> >> extending the RequestHandler interface. If the application implements
> >> both methods (Object handle(Message) and void handle(Message, ...)) how
> >> do you know which method should be invoked?
> >
> > The default would be to invoke the old handle(Message) method. The
> > dispatching mechanism could be changed to use the async method by
> > setting an attribute in MessageDispatcher (which in turn sets it in
> > RequestCorrelator).
> >
> > How would you do this ? Remember, we cannot change or remove
> > handle(Message) as subclasses of RpcDispatcher or MessageDispatcher, or
> > impls of RequestHandler are out there and any change to handle(Message)
> > would break them.
> >
> > Would you simply provide a separate AsyncRequestHandler interface, not
> > extending RequestHandler ? This would require RequestCorrelator and
> > MessageDispatcher to have 2 refs instead of 1. With the current approach
> > I can do an instanceof on the RequestHandler.
> >
> > I eventually like to merge RequestHandler and AsyncRequestHandler into 1
> > class, but this can be done in 4.0 at the earliest time.
> >
> My opinion, I would use a separate interface and 2 references (while
>

I'd go for 2 references as well, I think the null check is easier for the
JIT/CPU to optimize.

I also think the two interfaces represent two different use cases, so it
should be clear to a reader that a class that implements
AsyncRequestHandler really is using async delivery.


> RequestHandler is still active). However, I will change the
> AsyncRequestHandler interface to:
>
> Object handle(Message message, Response response) throws Exception
>
> and the Response interface would be as follow:
>
> void sendResponse(Object reply, boolean isException) //or sendReply(...);
>
> void setAsyncResponse(boolean value); //or setAsyncReply(...);
>
> boolean isAsyncResponse(); //or isAsyncReply();
>
> And with this, it can support both behaviors, with a minor addition:
>
> 1) To work in async mode:
>  the application invokes setAsyncResponse(true) and eventually it
> will invoke the sendResponse(...)
>
> 2) To work in sync mode:
>  the application invokes setAsyncResponse(false). I think this
> should be the default value
>
> in the RequestCorrelator, it checks the isAsyncResponse() value. If true,
> it does not send the response (it only sends it if it is an exception
> caught). If the value is false, it sends the return value as a response
> (the current behavior).

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Pedro Ruivo
Hi, see inline

Cheers,
Pedro

On 2/7/13 11:42 AM, Bela Ban wrote:
> On 2/7/13 12:29 PM, Pedro Ruivo wrote:
>> Hi Bela
>>
>> On 2/7/13 4:53 AM, Bela Ban wrote:
>>> Hi Pedro,
>>>
>>> this is almost exactly what I wanted to implement !
>>>
>>> Question:
>>> - In RequestCorrelator.handleRequest():
>>>
>>> protected void handleRequest(Message req, Header hdr) {
>>> Object retval;
>>> boolean threwException = false;
>>> MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
>>> try {
>>> retval=request_handler.handle(messageRequest);
>>> } catch(Throwable t) {
>>> retval=t;
>>> threwException = true;
>>> }
>>> messageRequest.sendReply(retval, threwException);//<-- should be moved
>>> up, or called only if threwException == true
>>> }
>>>
>>>
>>> , you create a MessageRequestImpl and pass it to the RequestHandler. The
>>> request handler then dispatches the request (possibly) to a thread pool
>>> and calls MessageRequestImpl.sendReply() when done.
>>>
>>> However, you also call MessageRequest.sendReply() before returning from
>>> handleRequest(). I think this is an error, and
>>> MessageRequest.sendReply() should be moved up inside the catch clause,
>>> or be called only if threwException is true, so that we send a reply on
>>> behalf of the RequestHandler if and only if it threw an exception (e.g.
>>> before it dispatches the request to a thread pool). Otherwise, we'd send
>>> a reply *twice* !
>> In my defense, I was assuming that if the application uses the sendReply()
>> method, it must return a special return value: DO_NOT_REPLY (in
>> RequestHandler interface).
>> This return value is automatically ignored:
>>
>> public final void sendReply(Object reply, boolean exceptionThrown) {
>> if(!header.rsp_expected || reply == RequestHandler.DO_NOT_REPLY)
>> return;
> OK
>
>
>>> A few changes I have in mind (need to think about it more):
>>>
>>> - I want to leave the existing RequestHandler interface in place, so
>>> current implementations continue to work
>>> - There will be a new AsyncRequestHandler interface (possibly extending
>>> RequestHandler, so an implementation can decide to implement both). The
>>> RequestCorrelator needs to have either request_handler or
>>> async_request_handler set. If the former is set, the logic is unchanged.
>>> If the latter is set I'll invoke the async dispatching code
>> I'm not sure if it is a good idea to have the AsyncRequestHandler
>> extending the RequestHandler interface. If the application implements
>> both methods (Object handle(Message) and void handle(Message, ...)) how
>> do you know which method should be invoked?
>
> The default would be to invoke the old handle(Message) method. The
> dispatching mechanism could be changed to use the async method by
> setting an attribute in MessageDispatcher (which in turn sets it in
> RequestCorrelator).
>
> How would you do this ? Remember, we cannot change or remove
> handle(Message) as subclasses of RpcDispatcher or MessageDispatcher, or
> impls of RequestHandler are out there and any change to handle(Message)
> would break them.
>
> Would you simply provide a separate AsyncRequestHandler interface, not
> extending RequestHandler ? This would require RequestCorrelator and
> MessageDispatcher to have 2 refs instead of 1. With the current approach
> I can do an instanceof on the RequestHandler.
>
> I eventually like to merge RequestHandler and AsyncRequestHandler into 1
> class, but this can be done in 4.0 at the earliest time.
>
My opinion, I would use a separate interface and 2 references (while 
RequestHandler is still active). However, I will change the 
AsyncRequestHandler interface to:

Object handle(Message message, Response response) throws Exception

and the Response interface would be as follow:

void sendResponse(Object reply, boolean isException) //or sendReply(...);

void setAsyncResponse(boolean value); //or setAsyncReply(...);

boolean isAsyncResponse(); //or isAsyncReply();

And with this, it can support both behaviors, with a minor addition:

1) To work in async mode:
 the application invokes setAsyncResponse(true) and eventually it 
will invoke the sendResponse(...)

2) To work in sync mode:
 the application invokes setAsyncResponse(false). I think this 
should be the default value
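The two modes could be sketched like this. The `Response` and `AsyncRequestHandler` names are taken from the proposal in this thread and stubbed here; the final JGroups API may differ.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Stubs of the interfaces proposed in this mail (not the real JGroups API).
interface Response {
    void sendResponse(Object reply, boolean isException);
    void setAsyncResponse(boolean value);
    boolean isAsyncResponse();
}

interface AsyncRequestHandler {
    Object handle(String message, Response response) throws Exception;
}

// Async mode: flag the response as async and reply later from an internal
// pool; the correlator then ignores the return value.
class AsyncModeHandler implements AsyncRequestHandler {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    @Override
    public Object handle(String message, Response response) {
        response.setAsyncResponse(true); // tell the correlator not to auto-reply
        pool.execute(() -> response.sendResponse("async: " + message, false));
        return null; // ignored in async mode
    }
}

// Sync mode (the proposed default): just return; the correlator sends the
// return value as the response because isAsyncResponse() is false.
class SyncModeHandler implements AsyncRequestHandler {
    @Override
    public Object handle(String message, Response response) {
        return "sync: " + message;
    }
}
```

The point of the design is that the same handler signature covers both cases; only the flag decides who sends the reply.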

in the RequestCorrelator, it checks the isAsyncResponse() value. If true, 
it does not send the response (it only sends it if it is an exception 
caught). If the value is false, it sends the return value as a response 
(the current behavior). In pseudo code:

try {
    ret = asyncRequestHandler.handle(message, response);
    if (responseExpected && !response.isAsyncResponse())
        response.sendResponse(ret, false);
} catch (Throwable t) {
    response.sendResponse(t, true);
}

With this suggestion, when you remove the RequestHandler interface (in 
4.0 or 5.0), the user only needs to add the Response parameter and 
everything will work as usual.

For Infinispan 5.3, it can be adopted as soon as it is implemented in 
JGroups without using the async method.

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban


On 2/7/13 12:36 PM, Manik Surtani wrote:
> On 7 Feb 2013, at 11:29, Bela Ban  wrote:
>
>> I meant to say that this is up to you guys to decide in which Infinispan
>> release this will be used, but it will be available in JGroups 3.3.
>>
>> What's the strategy/schedule for 6 and 5.3 anyway ?
> 5.3 will be relatively quick, just an incremental release mainly targeting 
> reviewing and putting in a few contribs on the queue.  E.g., we're hoping for 
> 5.3 to be feature-complete in no more than 2 months.

OK, so let's target JGroups 3.3 to be included in Infinispan 5.3. Then 
you have a choice of doing this in 5.3 or waiting until 6.

>
> 6 is where we get to break API - specifically with the new packaging, etc.
>
> But this seems like a pretty important feature/fix so it may make sense to 
> get this into 5.3.
>
>

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban

On 2/7/13 1:38 PM, Dan Berindei wrote:
>
>
>
>
>
> No, I think it'll apply to all messages. A simple implementation could
> dispatch OOB messages to the thread pool, as they don't need to be
> ordered. Regular messages could be added to a queue where they are
> processed sequentially by a *single* thread. Pedro does implement
> ordering based on transactions (see his prev email), and I think there
> are some other good use cases for regular messages. I think one thing
> that could be done for regular messages is to implement something like
> SCOPE (remember ?) for async RPCs: updates to different web sessions
> could be processed concurrently, only updates to the *same* session
> would have to be ordered.
>
>
> Yeah, I agree implementing the regular message ordering ourselves 
> would give us a little more room for optimizations. But it would make 
> our part more complicated, too. Well, not for Total Ordering, because 
> Pedro already implemented it, but for our regular async scenarios we'd 
> need to add a thread pool (we want to allow 2 threads from different 
> sources to access different keys at the same time).

We don't need to implement this *now*, but let's agree on the API 
(feedback from Pedro and you guys), and add this in JGroups 3.3.

You can start out with a very simple implementation, or no 
implementation at all (then you'd have the same behavior as now), but 
the point is that you can do this at your own pace, and don't depend on 
any changes in JGroups (once 3.3 is used by Infinispan).

I agree that optimization of thread/request processing *while still 
preserving ordering where needed* might be a bit tricky, but the 
potential benefits are great. You could for example add request 
priorities by adding a parameter/flag to RPCs and dispatching them to a 
priority queue in your thread pool, without me having to change anything.
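Bela's priority idea could be sketched with a ThreadPoolExecutor backed by a PriorityBlockingQueue. The class names here are illustrative, not part of any JGroups or Infinispan API.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A request carrying a priority flag; the pool's queue orders by it.
class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
    final int priority; // lower value = served first
    final Runnable work;

    PrioritizedTask(int priority, Runnable work) {
        this.priority = priority;
        this.work = work;
    }

    @Override public int compareTo(PrioritizedTask other) {
        return Integer.compare(priority, other.priority);
    }

    @Override public void run() { work.run(); }
}

class PriorityRequestPool {
    // Single worker keeps the example deterministic; size it for real use.
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());

    // Must use execute(): submit() would wrap tasks in non-Comparable futures.
    void dispatch(int priority, Runnable work) {
        executor.execute(new PrioritizedTask(priority, work));
    }
}
```

Note that only *queued* tasks are reordered; a task handed directly to an idle worker runs regardless of priority, which is usually what you want.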

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 12:43 PM, Bela Ban  wrote:

>
> On 2/7/13 11:09 AM, Dan Berindei wrote:
> >
> >
> > A few changes I have in mind (need to think about it more):
> >
> > - I want to leave the existing RequestHandler interface in place, so
> > current implementations continue to work
> > - There will be a new AsyncRequestHandler interface (possibly
> > extending
> > RequestHandler, so an implementation can decide to implement
> > both). The
> > RequestCorrelator needs to have either request_handler or
> > async_request_handler set. If the former is set, the logic is
> > unchanged.
> > If the latter is set I'll invoke the async dispatching code
> >
> > - AsyncRequestHandler will look similar to the following:
> > void handle(Message request, Handback hb, boolean requires_response)
> > throws Throwable;
> > - Handback is an interface, and its impl contains header information
> > (e.g. request ID)
> > - Handback has a sendReply(Object reply, boolean is_exception) method
> > which sends a response (or exception) back to the caller
> >
> >
> > +1 for a new interface. TBH I hadn't read the RequestCorrelator code,
> > so I had assumed it was already asynchronous, and only RpcDispatcher
> > was synchronous.
>
>
> Nope, unfortunately not.
>
>
>

> >
> > I'm not so sure about the Handback name, how about calling it Response
> > instead?
>
>
> It *is* actually called Response (can you read my mind?) :-)
>
>
Nice :)


> >
> > - When requires_response is false, the AsyncRequestHandler doesn't
> > need to invoke sendReply()
> >
> >
> > I think this should be the other way around: when requires_response is
> > true, the AsyncRequestHandler *can* invoke sendReply(), but is not
> > required to (the call will just time out on the caller node); when
> > requires_response is false, invoking sendReply() should throw an
> > exception.
>
>
> The way I actually implemented it this morning is to omit the boolean
> parameter altogether:
> void handle(Message request, Response response) throws Exception;
>
> Response is null for async requests.
>
>
Sounds good.


>
>
>
>
> >
> >
> > - Message batching
> > - The above interfaces need to take message batching into account,
> > e.g.
> > the ability to handle multiple requests concurrently (if they
> > don't need
> > to be executed sequentially)
> >
> >
> > You mean handle() is still going to be called once for each request,
> > but second handle() call won't necessarily wait for the first
> > message's sendReply() call?
>
> Yes. I was thinking of adding a second method to the interface, which
> has a message batch as parameter. However, we'd also have to pass in an
> array of Response objects and it looked a bit clumsy.
>
>
Agree, it would look quite clumsy.


> >
> > Is this going to apply only to OOB messages, or to regular messages as
> > well? I think I'd prefer it if it only applied to OOB messages,
> > otherwise we'd have to implement our own ordering for regular/async
> > commands.
>
> No, I think it'll apply to all messages. A simple implementation could
> dispatch OOB messages to the thread pool, as they don't need to be
> ordered. Regular messages could be added to a queue where they are
> processed sequentially by a *single* thread. Pedro does implement
> ordering based on transactions (see his prev email), and I think there
> are some other good use cases for regular messages. I think one thing
> that could be done for regular messages is to implement something like
> SCOPE (remember ?) for async RPCs: updates to different web sessions
> could be processed concurrently, only updates to the *same* session
> would have to be ordered.
>
>
Yeah, I agree implementing the regular message ordering ourselves would
give us a little more room for optimizations. But it would make our part
more complicated, too. Well, not for Total Ordering, because Pedro already
implemented it, but for our regular async scenarios we'd need to add a
thread pool (we want to allow 2 threads from different sources to access
different keys at the same time).
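The per-key (or per-session) ordering described above is commonly implemented by chaining tasks for the same key while letting different keys run concurrently. A minimal sketch, with the caveat that the tail map grows with the number of distinct keys (a real implementation would prune completed tails):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

// Tasks for the same key run strictly one after another; tasks for
// different keys run concurrently on the shared pool.
class KeyOrderedExecutor {
    private final ConcurrentHashMap<Object, CompletableFuture<Void>> tails =
            new ConcurrentHashMap<>();
    private final ExecutorService pool;

    KeyOrderedExecutor(ExecutorService pool) { this.pool = pool; }

    void execute(Object key, Runnable task) {
        // compute() is atomic per key, so appending to the chain cannot race.
        tails.compute(key, (k, tail) ->
                (tail == null ? CompletableFuture.<Void>completedFuture(null) : tail)
                        .thenRunAsync(task, pool));
    }
}
```

One caveat of this sketch: if a task throws, its chain completes exceptionally and later tasks for that key are skipped, so a production version would also need per-task exception handling.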



> This API is not in stone, we can always change it. Once I'm done with
> this and have batching II implemented, plus some other JIRAs, I'll ping
> you guys and we should have a meeting discussing
> - Async invocation API
> - Message batching (also in conjunction with the above)
> - Message bundling and OOB / DONT_BUNDLE; bundling of OOB messages
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
>
>

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban

On 2/7/13 12:29 PM, Pedro Ruivo wrote:
> Hi Bela
>
> On 2/7/13 4:53 AM, Bela Ban wrote:
>> Hi Pedro,
>>
>> this is almost exactly what I wanted to implement !
>>
>> Question:
>> - In RequestCorrelator.handleRequest():
>>
>> protected void handleRequest(Message req, Header hdr) {
>> Object retval;
>> boolean threwException = false;
>> MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
>> try {
>> retval=request_handler.handle(messageRequest);
>> } catch(Throwable t) {
>> retval=t;
>> threwException = true;
>> }
>> messageRequest.sendReply(retval, threwException);//<-- should be moved
>> up, or called only if threwException == true
>> }
>>
>>
>> , you create a MessageRequestImpl and pass it to the RequestHandler. The
>> request handler then dispatches the request (possibly) to a thread pool
>> and calls MessageRequestImpl.sendReply() when done.
>>
>> However, you also call MessageRequest.sendReply() before returning from
>> handleRequest(). I think this is an error, and
>> MessageRequest.sendReply() should be moved up inside the catch clause,
>> or be called only if threwException is true, so that we send a reply on
>> behalf of the RequestHandler if and only if it threw an exception (e.g.
>> before it dispatches the request to a thread pool). Otherwise, we'd send
>> a reply *twice* !
> In my defense, I was assuming that if the application uses the sendReply()
> method, it must return a special return value: DO_NOT_REPLY (in
> RequestHandler interface).
> This return value is automatically ignored:
>
> public final void sendReply(Object reply, boolean exceptionThrown) {
> if(!header.rsp_expected || reply == RequestHandler.DO_NOT_REPLY)
> return;

OK


>
>> A few changes I have in mind (need to think about it more):
>>
>> - I want to leave the existing RequestHandler interface in place, so
>> current implementations continue to work
>> - There will be a new AsyncRequestHandler interface (possibly extending
>> RequestHandler, so an implementation can decide to implement both). The
>> RequestCorrelator needs to have either request_handler or
>> async_request_handler set. If the former is set, the logic is unchanged.
>> If the latter is set I'll invoke the async dispatching code
> I'm not sure if it is a good idea to have the AsyncRequestHandler
> extending the RequestHandler interface. If the application implements
> both methods (Object handle(Message) and void handle(Message, ...)) how
> do you know which method should be invoked?


The default would be to invoke the old handle(Message) method. The 
dispatching mechanism could be changed to use the async method by 
setting an attribute in MessageDispatcher (which in turn sets it in 
RequestCorrelator).

How would you do this ? Remember, we cannot change or remove 
handle(Message) as subclasses of RpcDispatcher or MessageDispatcher, or 
impls of RequestHandler are out there and any change to handle(Message) 
would break them.

Would you simply provide a separate AsyncRequestHandler interface, not 
extending RequestHandler ? This would require RequestCorrelator and 
MessageDispatcher to have 2 refs instead of 1. With the current approach 
I can do an instanceof on the RequestHandler.

I eventually like to merge RequestHandler and AsyncRequestHandler into 1 
class, but this can be done in 4.0 at the earliest time.

>> - AsyncRequestHandler will look similar to the following:
>> void handle(Message request, Handback hb, boolean requires_response)
>> throws Throwable;
>> - Handback is an interface, and its impl contains header information
>> (e.g. request ID)
>> - Handback has a sendReply(Object reply, boolean is_exception) method
>> which sends a response (or exception) back to the caller
>> - When requires_response is false, the AsyncRequestHandler doesn't need
>> to invoke sendReply()
> My 2 cents, I think that the boolean requires_response should be in
> Handback implementation to avoid passing an extra argument

I actually removed the requires_response parameter. If no response is 
required, rsp in handle(Message req, Response rsp) will simply be null.

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)



Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Manik Surtani

On 7 Feb 2013, at 11:29, Bela Ban  wrote:

> I meant to say that this is up to you guys to decide in which Infinispan 
> release this will be used, but it will be available in JGroups 3.3.
> 
> What's the strategy/schedule for 6 and 5.3 anyway ?

5.3 will be relatively quick, just an incremental release mainly targeting 
reviewing and putting in a few contribs on the queue.  E.g., we're hoping for 
5.3 to be feature-complete in no more than 2 months.

6 is where we get to break API - specifically with the new packaging, etc.

But this seems like a pretty important feature/fix so it may make sense to get 
this into 5.3.

- M

> 
> On 2/7/13 11:30 AM, Bela Ban wrote:
>> No, this *could* be in Infinispan 5.3 (it will be in JGroups 3.3).
>> 
>> A MessageDispatcher (RpcDispatcher) instance picks which dispatching
>> mechanism it wants to use. I have RequestHandler (the default) and a
>> sub-interface AsyncRequestHandler. MessageDispatcher (and subclass
>> RpcDispatcher) will implement both (they already do implement
>> handle(Message)).
>> 
>> So now a user simply sets an attribute in MessageDispatcher to select
>> async dispatching (sync is the default).
>> 
>> What needs to be done from the Infinispan side is to override
>> handle(Message,Response), and implement handling of requests in a thread
>> pool. The current behavior (inherited from MessageDispatcher) will be to
>> call handle(Message) which CommandAwareDispatcher already implements.
>> 
>> The Infinispan side can be done in 10 minutes. However, the real work
>> will be the dispatching of incoming requests to threads from the
>> Infinispan thread pool, and the impl of the thread pool, which doesn't
>> exist yet. I guess preserving ordering of requests will be the important
>> part.
>> 
>> If you have your own thread pool, sync RPCs can be sent without OOB, but
>> the handle() method in CommandAwareDispatcher can decide, based on the
>> mode (e.g. sync) whether to queue the request behind other requests, or
>> whether to invoke it directly.
>> 
>> I wanted to implement this quickly in JGroups so the hooks are in place
>> for Infinispan to use them later, once a pool has been implemented.
>> 
>> 
>> On 2/7/13 10:56 AM, Manik Surtani wrote:
>>> Very interesting.  However I presume this would be something for Infinispan 
>>> 6.0?  Any thoughts on backward compat?
>>> 
>>> On 7 Feb 2013, at 04:53, Bela Ban  wrote:
>>> 
 Hi Pedro,
 
 this is almost exactly what I wanted to implement !
 
 Question:
 - In RequestCorrelator.handleRequest():
 
 protected void handleRequest(Message req, Header hdr) {
 Object retval;
 boolean threwException = false;
 MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
 try {
 retval=request_handler.handle(messageRequest);
 } catch(Throwable t) {
 retval=t;
 threwException = true;
 }
 messageRequest.sendReply(retval, threwException);// <-- should be moved
 up, or called only if threwException == true
 }
 
 
 , you create a MessageRequestImpl and pass it to the RequestHandler. The
 request handler then dispatches the request (possibly) to a thread pool
 and calls MessageRequestImpl.sendReply() when done.
 
 However, you also call MessageRequest.sendReply() before returning from
 handleRequest(). I think this is an error, and
 MessageRequest.sendReply() should be moved up inside the catch clause,
 or be called only if threwException is true, so that we send a reply on
 behalf of the RequestHandler if and only if it threw an exception (e.g.
 before it dispatches the request to a thread pool). Otherwise, we'd send
 a reply *twice* !
 
 A few changes I have in mind (need to think about it more):
 
 - I want to leave the existing RequestHandler interface in place, so
 current implementations continue to work
 - There will be a new AsyncRequestHandler interface (possibly extending
 RequestHandler, so an implementation can decide to implement both). The
 RequestCorrelator needs to have either request_handler or
 async_request_handler set. If the former is set, the logic is unchanged.
 If the latter is set I'll invoke the async dispatching code
 
 - AsyncRequestHandler will look similar to the following:
 void handle(Message request, Handback hb, boolean requires_response)
 throws Throwable;
 - Handback is an interface, and its impl contains header information
 (e.g. request ID)
 - Handback has a sendReply(Object reply, boolean is_exception) method
 which sends a response (or exception) back to the caller
 - When requires_response is false, the AsyncRequestHandler doesn't need
 to invoke sendReply()
 
 - Message batching
 - The above interfaces need to take message batching into account, e.g.
 the ability to handle multiple requests concurrently (if they don't need
 to be executed sequentially)

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Pedro Ruivo
Hi Bela

On 2/7/13 4:53 AM, Bela Ban wrote:
> Hi Pedro,
>
> this is almost exactly what I wanted to implement !
>
> Question:
> - In RequestCorrelator.handleRequest():
>
> protected void handleRequest(Message req, Header hdr) {
> Object retval;
> boolean threwException = false;
> MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
> try {
> retval=request_handler.handle(messageRequest);
> } catch(Throwable t) {
> retval=t;
> threwException = true;
> }
> messageRequest.sendReply(retval, threwException);//<-- should be moved
> up, or called only if threwException == true
> }
>
>
> , you create a MessageRequestImpl and pass it to the RequestHandler. The
> request handler then dispatches the request (possibly) to a thread pool
> and calls MessageRequestImpl.sendReply() when done.
>
> However, you also call MessageRequest.sendReply() before returning from
> handleRequest(). I think this is an error, and
> MessageRequest.sendReply() should be moved up inside the catch clause,
> or be called only if threwException is true, so that we send a reply on
> behalf of the RequestHandler if and only if it threw an exception (e.g.
> before it dispatches the request to a thread pool). Otherwise, we'd send
> a reply *twice* !
In my defense, I was assuming that if the application uses the sendReply() 
method, it must return a special value: DO_NOT_REPLY (defined in the 
RequestHandler interface).
This return value is automatically ignored:

public final void sendReply(Object reply, boolean exceptionThrown) {
if(!header.rsp_expected || reply == RequestHandler.DO_NOT_REPLY)
return;
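The DO_NOT_REPLY convention could be used by a handler like the following sketch. `MessageRequest` and the marker object are stand-ins for the patch's API described in this mail, not actual JGroups types.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Stand-in for the MessageRequest of Pedro's patch.
interface MessageRequest {
    String payload();
    void sendReply(Object reply, boolean exceptionThrown);
}

// The handler moves the work to its own pool, replies via sendReply() when
// done, and returns the marker so the correlator skips the automatic reply
// (avoiding the double-reply problem discussed above).
class OffloadingHandler {
    static final Object DO_NOT_REPLY = new Object(); // marker the correlator ignores
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    Object handle(MessageRequest req) {
        pool.execute(() -> {
            try {
                req.sendReply("handled: " + req.payload(), false);
            } catch (Throwable t) {
                req.sendReply(t, true); // send the exception back instead
            }
        });
        return DO_NOT_REPLY;
    }
}
```

This is exactly the pattern that let the Total Order prototype move PrepareCommand validation off the single deliver thread without a separate ack collector.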

> A few changes I have in mind (need to think about it more):
>
> - I want to leave the existing RequestHandler interface in place, so
> current implementations continue to work
> - There will be a new AsyncRequestHandler interface (possibly extending
> RequestHandler, so an implementation can decide to implement both). The
> RequestCorrelator needs to have either request_handler or
> async_request_handler set. If the former is set, the logic is unchanged.
> If the latter is set I'll invoke the async dispatching code
I'm not sure if it is a good idea to have the AsyncRequestHandler 
extending the RequestHandler interface. If the application implements 
both methods (Object handle(Message) and void handle(Message, ...)) how 
do you know which method should be invoked?
> - AsyncRequestHandler will look similar to the following:
> void handle(Message request, Handback hb, boolean requires_response)
> throws Throwable;
> - Handback is an interface, and its impl contains header information
> (e.g. request ID)
> - Handback has a sendReply(Object reply, boolean is_exception) method
> which sends a response (or exception) back to the caller
> - When requires_response is false, the AsyncRequestHandler doesn't need
> to invoke sendReply()
My 2 cents, I think that the boolean requires_response should be in the 
Handback implementation, to avoid passing an extra argument.
> - Message batching
> - The above interfaces need to take message batching into account, e.g.
> the ability to handle multiple requests concurrently (if they don't need
> to be executed sequentially)
>
>
> Thoughts ?
>
>
> On 2/6/13 8:29 PM, Pedro Ruivo wrote:
>> Hi all,
>>
>> Recently I came up with a solution that can help with the thread pool
>> problem motivated by the following:
>>
>> In one of the first implementations of the Total Order based commit
>> protocol (TO), I had the requirement to move the PrepareCommand to
>> another thread pool. In summary, the TO protocol delivers the
>> PrepareCommand in a deterministic order in all the nodes, by a single
>> deliver thread. To ensure consistency, if it delivers two conflicting
>> transactions, the second transaction must wait until the first
>> transaction finishes. However, blocking the single deliver thread is not a
>> good solution, because no more transactions can be validated, even if
>> they don't conflict, while the thread is blocked.
>>
>> So, after creating a dependency graph (i.e. the second transaction
>> knows that it must wait for the first transaction to finish) I move
>> the PrepareCommand to another thread pool. Initially, I implemented a
>> new command, called PrepareResponseCommand, that sends back the reply
>> of the PrepareCommand. This solution has one disadvantage: I had to
>> implement an ack collector in ISPN, while JGroups already offers me
>> that with a synchronous communication.
>>
>> Recently (2 or 3 months ago) I implemented a simple modification in
>> JGroups. In a more generic approach, it allows other threads to reply
>> to a RPC request (such as the PrepareCommand). In the previous
>> scenario, I replaced the PrepareResponseCommand and the ack collector
>> implementation with a synchronous RPC invocation. I've used this
>> solution in other issues in the Cloud-TM's ISPN fork.
>>
>> This solution is quite simple to implement and may help you to move
>> the commands to ISPN internal thread pools. The modifications I've

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban
I meant to say that this is up to you guys to decide in which Infinispan 
release this will be used, but it will be available in JGroups 3.3.

What's the strategy/schedule for 6 and 5.3 anyway ?

On 2/7/13 11:30 AM, Bela Ban wrote:
> No, this *could* be in Infinispan 5.3 (it will be in JGroups 3.3).
>
> A MessageDispatcher (RpcDispatcher) instance picks which dispatching
> mechanism it wants to use. I have RequestHandler (the default) and a
> sub-interface AsyncRequestHandler. MessageDispatcher (and subclass
> RpcDispatcher) will implement both (they already do implement
> handle(Message)).
>
> So now a user simply sets an attribute in MessageDispatcher to select
> async dispatching (sync is the default).
>
> What needs to be done from the Infinispan side is to override
> handle(Message,Response), and implement handling of requests in a thread
> pool. The current behavior (inherited from MessageDispatcher) will be to
> call handle(Message) which CommandAwareDispatcher already implements.
>
> The Infinispan side can be done in 10 minutes. However, the real work
> will be the dispatching of incoming requests to threads from the
> Infinispan thread pool, and the impl of the thread pool, which doesn't
> exist yet. I guess preserving ordering of requests will be the important
> part.
>
> If you have your own thread pool, sync RPCs can be sent without OOB, but
> the handle() method in CommandAwareDispatcher can decide, based on the
> mode (e.g. sync) whether to queue the request behind other requests, or
> whether to invoke it directly.
>
> I wanted to implement this quickly in JGroups so the hooks are in place
> for Infinispan to use them later, once a pool has been implemented.
>
>
> On 2/7/13 10:56 AM, Manik Surtani wrote:
>> Very interesting.  However I presume this would be something for Infinispan 
>> 6.0?  Any thoughts on backward compat?
>>
>> On 7 Feb 2013, at 04:53, Bela Ban  wrote:
>>
>>> Hi Pedro,
>>>
>>> this is almost exactly what I wanted to implement !
>>>
>>> Question:
>>> - In RequestCorrelator.handleRequest():
>>>
>>> protected void handleRequest(Message req, Header hdr) {
>>>     Object retval;
>>>     boolean threwException = false;
>>>     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
>>>     try {
>>>         retval = request_handler.handle(messageRequest);
>>>     } catch(Throwable t) {
>>>         retval = t;
>>>         threwException = true;
>>>     }
>>>     messageRequest.sendReply(retval, threwException); // <-- should be moved up, or called only if threwException == true
>>> }
>>>
>>>
>>> , you create a MessageRequestImpl and pass it to the RequestHandler. The
>>> request handler then dispatches the request (possibly) to a thread pool
>>> and calls MessageRequestImpl.sendReply() when done.
>>>
>>> However, you also call MessageRequest.sendReply() before returning from
>>> handleRequest(). I think this is an error, and
>>> MessageRequest.sendReply() should be moved up inside the catch clause,
>>> or be called only if threwException is true, so that we send a reply on
>>> behalf of the RequestHandler if and only if it threw an exception (e.g.
>>> before it dispatches the request to a thread pool). Otherwise, we'd send
>>> a reply *twice* !
>>>
>>> A few changes I have in mind (need to think about it more):
>>>
>>> - I want to leave the existing RequestHandler interface in place, so
>>> current implementations continue to work
>>> - There will be a new AsyncRequestHandler interface (possibly extending
>>> RequestHandler, so an implementation can decide to implement both). The
>>> RequestCorrelator needs to have either request_handler or
>>> async_request_handler set. If the former is set, the logic is unchanged.
>>> If the latter is set I'll invoke the async dispatching code
>>>
>>> - AsyncRequestHandler will look similar to the following:
>>> void handle(Message request, Handback hb, boolean requires_response)
>>> throws Throwable;
>>> - Handback is an interface, and its impl contains header information
>>> (e.g. request ID)
>>> - Handback has a sendReply(Object reply, boolean is_exception) method
>>> which sends a response (or exception) back to the caller
>>> - When requires_response is false, the AsyncRequestHandler doesn't need
>>> to invoke sendReply()
>>>
>>> - Message batching
>>> - The above interfaces need to take message batching into account, e.g.
>>> the ability to handle multiple requests concurrently (if they don't need
>>> to be executed sequentially)
>>>
>>>
>>> Thoughts ?
>>>
>>>
>>> On 2/6/13 8:29 PM, Pedro Ruivo wrote:
 Hi all,

 Recently I came up with a solution that can help with the thread pool
 problem motivated by the following:

 In one of the first implementations of the Total Order based commit
 protocol (TO), I had the requirement to move the PrepareCommand to
 another thread pool. In summary, the TO protocol delivers the
 PrepareCommand in a deterministic order on all the nodes, by a single
 deliver thread. To ensure consistency, if it delivers two confli

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Manik Surtani

On 7 Feb 2013, at 10:30, Bela Ban  wrote:

> No, this *could* be in Infinispan 5.3 (it will be in JGroups 3.3).
> 
> A MessageDispatcher (RpcDispatcher) instance picks which dispatching 
> mechanism it wants to use. I have RequestHandler (the default) and a 
> sub-interface AsyncRequestHandler. MessageDispatcher (and subclass 
> RpcDispatcher) will implement both (they already do implement 
> handle(Message)).
> 
> So now a user simply sets an attribute in MessageDispatcher to select 
> async dispatching (sync is the default).
> 
> What needs to be done from the Infinispan side is to override 
> handle(Message,Response), and implement handling of requests in a thread 
> pool. The current behavior (inherited from MessageDispatcher) will be to 
> call handle(Message) which CommandAwareDispatcher already implements.
> 
> The Infinispan side can be done in 10 minutes. However, the real work 
> will be the dispatching of incoming requests to threads from the 
> Infinispan thread pool, and the impl of the thread pool, which doesn't 
> exist yet. I guess preserving ordering of requests will be the important 
> part.

This isn't particularly hard either ... 

> 
> If you have your own thread pool, sync RPCs can be sent without OOB, but 
> the handle() method in CommandAwareDispatcher can decide, based on the 
> mode (e.g. sync) whether to queue the request behind other requests, or 
> whether to invoke it directly.
> 
> I wanted to implement this quickly in JGroups so the hooks are in place 
> for Infinispan to use them later, once a pool has been implemented.
> 
> 
> On 2/7/13 10:56 AM, Manik Surtani wrote:
>> Very interesting.  However I presume this would be something for Infinispan 
>> 6.0?  Any thoughts on backward compat?
>> 
>> On 7 Feb 2013, at 04:53, Bela Ban  wrote:
>> 
>>> Hi Pedro,
>>> 
>>> this is almost exactly what I wanted to implement !
>>> 
>>> Question:
>>> - In RequestCorrelator.handleRequest():
>>> 
>>> protected void handleRequest(Message req, Header hdr) {
>>>     Object retval;
>>>     boolean threwException = false;
>>>     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
>>>     try {
>>>         retval = request_handler.handle(messageRequest);
>>>     } catch(Throwable t) {
>>>         retval = t;
>>>         threwException = true;
>>>     }
>>>     messageRequest.sendReply(retval, threwException); // <-- should be moved up, or called only if threwException == true
>>> }
>>> 
>>> 
>>> , you create a MessageRequestImpl and pass it to the RequestHandler. The
>>> request handler then dispatches the request (possibly) to a thread pool
>>> and calls MessageRequestImpl.sendReply() when done.
>>> 
>>> However, you also call MessageRequest.sendReply() before returning from
>>> handleRequest(). I think this is an error, and
>>> MessageRequest.sendReply() should be moved up inside the catch clause,
>>> or be called only if threwException is true, so that we send a reply on
>>> behalf of the RequestHandler if and only if it threw an exception (e.g.
>>> before it dispatches the request to a thread pool). Otherwise, we'd send
>>> a reply *twice* !
>>> 
>>> A few changes I have in mind (need to think about it more):
>>> 
>>> - I want to leave the existing RequestHandler interface in place, so
>>> current implementations continue to work
>>> - There will be a new AsyncRequestHandler interface (possibly extending
>>> RequestHandler, so an implementation can decide to implement both). The
>>> RequestCorrelator needs to have either request_handler or
>>> async_request_handler set. If the former is set, the logic is unchanged.
>>> If the latter is set I'll invoke the async dispatching code
>>> 
>>> - AsyncRequestHandler will look similar to the following:
>>> void handle(Message request, Handback hb, boolean requires_response)
>>> throws Throwable;
>>> - Handback is an interface, and its impl contains header information
>>> (e.g. request ID)
>>> - Handback has a sendReply(Object reply, boolean is_exception) method
>>> which sends a response (or exception) back to the caller
>>> - When requires_response is false, the AsyncRequestHandler doesn't need
>>> to invoke sendReply()
>>> 
>>> - Message batching
>>> - The above interfaces need to take message batching into account, e.g.
>>> the ability to handle multiple requests concurrently (if they don't need
>>> to be executed sequentially)
>>> 
>>> 
>>> Thoughts ?
>>> 
>>> 
>>> On 2/6/13 8:29 PM, Pedro Ruivo wrote:
 Hi all,
 
 Recently I came up with a solution that can help with the thread pool
 problem motivated by the following:
 
 In one of the first implementations of the Total Order based commit
 protocol (TO), I had the requirement to move the PrepareCommand to
 another thread pool. In summary, the TO protocol delivers the
 PrepareCommand in a deterministic order on all the nodes, by a single
 deliver thread. To ensure consistency, if it delivers two conflicting
 transactions, the second transaction must wait until the first
 transaction finishes. H

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban
No, this *could* be in Infinispan 5.3 (it will be in JGroups 3.3).

A MessageDispatcher (RpcDispatcher) instance picks which dispatching 
mechanism it wants to use. I have RequestHandler (the default) and a 
sub-interface AsyncRequestHandler. MessageDispatcher (and subclass 
RpcDispatcher) will implement both (they already do implement 
handle(Message)).

So now a user simply sets an attribute in MessageDispatcher to select 
async dispatching (sync is the default).

What needs to be done from the Infinispan side is to override 
handle(Message,Response), and implement handling of requests in a thread 
pool. The current behavior (inherited from MessageDispatcher) will be to 
call handle(Message) which CommandAwareDispatcher already implements.

The Infinispan side can be done in 10 minutes. However, the real work 
will be the dispatching of incoming requests to threads from the 
Infinispan thread pool, and the impl of the thread pool, which doesn't 
exist yet. I guess preserving ordering of requests will be the important 
part.

If you have your own thread pool, sync RPCs can be sent without OOB, but 
the handle() method in CommandAwareDispatcher can decide, based on the 
mode (e.g. sync) whether to queue the request behind other requests, or 
whether to invoke it directly.

I wanted to implement this quickly in JGroups so the hooks are in place 
for Infinispan to use them later, once a pool has been implemented.


On 2/7/13 10:56 AM, Manik Surtani wrote:
> Very interesting.  However I presume this would be something for Infinispan 
> 6.0?  Any thoughts on backward compat?
>
> On 7 Feb 2013, at 04:53, Bela Ban  wrote:
>
>> Hi Pedro,
>>
>> this is almost exactly what I wanted to implement !
>>
>> Question:
>> - In RequestCorrelator.handleRequest():
>>
>> protected void handleRequest(Message req, Header hdr) {
>>     Object retval;
>>     boolean threwException = false;
>>     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
>>     try {
>>         retval = request_handler.handle(messageRequest);
>>     } catch(Throwable t) {
>>         retval = t;
>>         threwException = true;
>>     }
>>     messageRequest.sendReply(retval, threwException); // <-- should be moved up, or called only if threwException == true
>> }
>>
>>
>> , you create a MessageRequestImpl and pass it to the RequestHandler. The
>> request handler then dispatches the request (possibly) to a thread pool
>> and calls MessageRequestImpl.sendReply() when done.
>>
>> However, you also call MessageRequest.sendReply() before returning from
>> handleRequest(). I think this is an error, and
>> MessageRequest.sendReply() should be moved up inside the catch clause,
>> or be called only if threwException is true, so that we send a reply on
>> behalf of the RequestHandler if and only if it threw an exception (e.g.
>> before it dispatches the request to a thread pool). Otherwise, we'd send
>> a reply *twice* !
>>
>> A few changes I have in mind (need to think about it more):
>>
>> - I want to leave the existing RequestHandler interface in place, so
>> current implementations continue to work
>> - There will be a new AsyncRequestHandler interface (possibly extending
>> RequestHandler, so an implementation can decide to implement both). The
>> RequestCorrelator needs to have either request_handler or
>> async_request_handler set. If the former is set, the logic is unchanged.
>> If the latter is set I'll invoke the async dispatching code
>>
>> - AsyncRequestHandler will look similar to the following:
>> void handle(Message request, Handback hb, boolean requires_response)
>> throws Throwable;
>> - Handback is an interface, and its impl contains header information
>> (e.g. request ID)
>> - Handback has a sendReply(Object reply, boolean is_exception) method
>> which sends a response (or exception) back to the caller
>> - When requires_response is false, the AsyncRequestHandler doesn't need
>> to invoke sendReply()
>>
>> - Message batching
>> - The above interfaces need to take message batching into account, e.g.
>> the ability to handle multiple requests concurrently (if they don't need
>> to be executed sequentially)
>>
>>
>> Thoughts ?
>>
>>
>> On 2/6/13 8:29 PM, Pedro Ruivo wrote:
>>> Hi all,
>>>
>>> Recently I came up with a solution that can help with the thread pool
>>> problem motivated by the following:
>>>
>>> In one of the first implementations of the Total Order based commit
>>> protocol (TO), I had the requirement to move the PrepareCommand to
>>> another thread pool. In summary, the TO protocol delivers the
>>> PrepareCommand in a deterministic order on all the nodes, by a single
>>> deliver thread. To ensure consistency, if it delivers two conflicting
>>> transactions, the second transaction must wait until the first
>>> transaction finishes. However, blocking the single deliver thread is not
>>> a good solution, because no more transactions can be validated, even if
>>> they don't conflict, while the thread is blocked.
>>>
>>> So, after creating a dependency graph (i.e. the second transaction

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Bela Ban

On 2/7/13 11:09 AM, Dan Berindei wrote:
>
>
> A few changes I have in mind (need to think about it more):
>
> - I want to leave the existing RequestHandler interface in place, so
> current implementations continue to work
> - There will be a new AsyncRequestHandler interface (possibly
> extending
> RequestHandler, so an implementation can decide to implement
> both). The
> RequestCorrelator needs to have either request_handler or
> async_request_handler set. If the former is set, the logic is
> unchanged.
> If the latter is set I'll invoke the async dispatching code
>
> - AsyncRequestHandler will look similar to the following:
> void handle(Message request, Handback hb, boolean requires_response)
> throws Throwable;
> - Handback is an interface, and its impl contains header information
> (e.g. request ID)
> - Handback has a sendReply(Object reply, boolean is_exception) method
> which sends a response (or exception) back to the caller
>
>
> +1 for a new interface. TBH I hadn't read the RequestCorrelator code, 
> so I had assumed it was already asynchronous, and only RpcDispatcher 
> was synchronous.


Nope, unfortunately not.


>
> I'm not so sure about the Handback name, how about calling it Response 
> instead?


It *is* actually called Response (can you read my mind?) :-)

>
> - When requires_response is false, the AsyncRequestHandler doesn't
> need to invoke sendReply()
>
>
> I think this should be the other way around: when requires_response is 
> true, the AsyncRequestHandler *can* invoke sendReply(), but is not 
> required to (the call will just time out on the caller node); when 
> requires_response is false, invoking sendReply() should throw an 
> exception.


The way I actually implemented it this morning is to omit the boolean 
parameter altogether:
void handle(Message request, Response response) throws Exception;

Response is null for async requests.
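
A minimal sketch of that shape (EchoHandler and the Response stand-in here are illustrative; only the signature and the null-for-async convention come from the proposal): the handler checks for a null Response to detect async requests.

```java
// Sketch of the signature described above: a single handle() method where
// a null Response marks an async request. Message/Response are simplified
// stand-ins for the JGroups types, not the real API.
interface Response {
    void send(Object reply, boolean isException);
}

class EchoHandler {
    // Mirrors: void handle(Message request, Response response) throws Exception
    void handle(Object request, Response response) {
        Object result = "echo:" + request;   // application work
        if (response != null) {
            response.send(result, false);    // sync request: caller is waiting
        }
        // response == null: async request, nothing to send back
    }
}
```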





>
>
> - Message batching
> - The above interfaces need to take message batching into account,
> e.g.
> the ability to handle multiple requests concurrently (if they
> don't need
> to be executed sequentially)
>
>
> You mean handle() is still going to be called once for each request, 
> but a second handle() call won't necessarily wait for the first 
> message's sendReply() call?

Yes. I was thinking of adding a second method to the interface, which 
has a message batch as parameter. However, we'd also have to pass in an 
array of Response objects and it looked a bit clumsy.

>
> Is this going to apply only to OOB messages, or to regular messages as 
> well? I think I'd prefer it if it only applied to OOB messages, 
> otherwise we'd have to implement our own ordering for regular/async 
> commands.

No, I think it'll apply to all messages. A simple implementation could 
dispatch OOB messages to the thread pool, as they don't need to be 
ordered. Regular messages could be added to a queue where they are 
processed sequentially by a *single* thread. Pedro does implement 
ordering based on transactions (see his prev email), and I think there 
are some other good use cases for regular messages. I think one thing 
that could be done for regular messages is to implement something like 
SCOPE (remember ?) for async RPCs: updates to different web sessions 
could be processed concurrently, only updates to the *same* session 
would have to be ordered.
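
The scheme above could be sketched as follows (an assumed illustration, not JGroups code): OOB work goes straight to a pool, while regular work is serialized through a single-threaded executor that preserves FIFO order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the ordering scheme described above: OOB requests don't need
// ordering and are dispatched to a pool; regular requests go through a
// single-threaded executor, which keeps their FIFO delivery order.
class OrderedDispatcher {
    private final ExecutorService oobPool = Executors.newFixedThreadPool(8);
    private final ExecutorService regular = Executors.newSingleThreadExecutor();

    void dispatch(Runnable work, boolean oob) {
        (oob ? oobPool : regular).execute(work);
    }

    void shutdown() {
        oobPool.shutdown();
        regular.shutdown();
    }

    boolean awaitTermination(long seconds) throws InterruptedException {
        return oobPool.awaitTermination(seconds, TimeUnit.SECONDS)
             & regular.awaitTermination(seconds, TimeUnit.SECONDS);
    }
}
```

A SCOPE-like refinement would replace the single regular executor with one serial queue per scope (e.g. per web session), so independent scopes still run concurrently.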

This API is not in stone, we can always change it. Once I'm done with 
this and have batching II implemented, plus some other JIRAs, I'll ping 
you guys and we should have a meeting discussing
- Async invocation API
- Message batching (also in conjunction with the above)
- Message bundling and OOB / DONT_BUNDLE; bundling of OOB messages

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 6:53 AM, Bela Ban  wrote:

> Hi Pedro,
>
> this is almost exactly what I wanted to implement !
>
> Question:
> - In RequestCorrelator.handleRequest():
>
> protected void handleRequest(Message req, Header hdr) {
>     Object retval;
>     boolean threwException = false;
>     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
>     try {
>         retval = request_handler.handle(messageRequest);
>     } catch(Throwable t) {
>         retval = t;
>         threwException = true;
>     }
>     messageRequest.sendReply(retval, threwException); // <-- should be moved up, or called only if threwException == true
> }
>
>
> , you create a MessageRequestImpl and pass it to the RequestHandler. The
> request handler then dispatches the request (possibly) to a thread pool
> and calls MessageRequestImpl.sendReply() when done.
>
> However, you also call MessageRequest.sendReply() before returning from
> handleRequest(). I think this is an error, and
> MessageRequest.sendReply() should be moved up inside the catch clause,
> or be called only if threwException is true, so that we send a reply on
> behalf of the RequestHandler if and only if it threw an exception (e.g.
> before it dispatches the request to a thread pool). Otherwise, we'd send
> a reply *twice* !
>
> A few changes I have in mind (need to think about it more):
>
> - I want to leave the existing RequestHandler interface in place, so
> current implementations continue to work
> - There will be a new AsyncRequestHandler interface (possibly extending
> RequestHandler, so an implementation can decide to implement both). The
> RequestCorrelator needs to have either request_handler or
> async_request_handler set. If the former is set, the logic is unchanged.
> If the latter is set I'll invoke the async dispatching code
>
> - AsyncRequestHandler will look similar to the following:
> void handle(Message request, Handback hb, boolean requires_response)
> throws Throwable;
> - Handback is an interface, and its impl contains header information
> (e.g. request ID)
> - Handback has a sendReply(Object reply, boolean is_exception) method
> which sends a response (or exception) back to the caller
>

+1 for a new interface. TBH I hadn't read the RequestCorrelator code, so I
had assumed it was already asynchronous, and only RpcDispatcher was
synchronous.

I'm not so sure about the Handback name, how about calling it Response
instead?



> - When requires_response is false, the AsyncRequestHandler doesn't need
> to invoke sendReply()
>
>
I think this should be the other way around: when requires_response is
true, the AsyncRequestHandler *can* invoke sendReply(), but is not required
to (the call will just time out on the caller node); when requires_response
is false, invoking sendReply() should throw an exception.


- Message batching
> - The above interfaces need to take message batching into account, e.g.
> the ability to handle multiple requests concurrently (if they don't need
> to be executed sequentially)
>
>
You mean handle() is still going to be called once for each request, but
a second handle() call won't necessarily wait for the first message's
sendReply() call?

Is this going to apply only to OOB messages, or to regular messages as
well? I think I'd prefer it if it only applied to OOB messages, otherwise
we'd have to implement our own ordering for regular/async commands.



>
> Thoughts ?
>
>
>

> On 2/6/13 8:29 PM, Pedro Ruivo wrote:
> > Hi all,
> >
> > Recently I came up with a solution that can help with the thread pool
> > problem motivated by the following:
> >
> > In one of the first implementations of the Total Order based commit
> > protocol (TO), I had the requirement to move the PrepareCommand to
> > another thread pool. In summary, the TO protocol delivers the
> > PrepareCommand in a deterministic order on all the nodes, by a single
> > deliver thread. To ensure consistency, if it delivers two conflicting
> > transactions, the second transaction must wait until the first
> > transaction finishes. However, blocking the single deliver thread is not
> > a good solution, because no more transactions can be validated, even if
> > they don't conflict, while the thread is blocked.
> >
> > So, after creating a dependency graph (i.e. the second transaction
> > knows that it must wait for the first transaction to finish) I move
> > the PrepareCommand to another thread pool. Initially, I implemented a
> > new command, called PrepareResponseCommand, that sends back the reply
> > of the PrepareCommand. This solution has one disadvantage: I had to
> > implement an ack collector in ISPN, while JGroups already offers me
> > that with a synchronous communication.
> >
> > Recently (2 or 3 months ago) I implemented a simple modification in
> > JGroups. In a more generic approach, it allows other threads to reply
> > to a RPC request (such as the PrepareCommand). In the previous
> > scenario, I replaced the PrepareResponseCommand and the ack collector
> > implementation with a synchronous RPC invocation. I

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Manik Surtani
Very interesting.  However I presume this would be something for Infinispan 
6.0?  Any thoughts on backward compat?

On 7 Feb 2013, at 04:53, Bela Ban  wrote:

> Hi Pedro,
> 
> this is almost exactly what I wanted to implement !
> 
> Question:
> - In RequestCorrelator.handleRequest():
> 
> protected void handleRequest(Message req, Header hdr) {
>     Object retval;
>     boolean threwException = false;
>     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
>     try {
>         retval = request_handler.handle(messageRequest);
>     } catch(Throwable t) {
>         retval = t;
>         threwException = true;
>     }
>     messageRequest.sendReply(retval, threwException); // <-- should be moved up, or called only if threwException == true
> }
> 
> 
> , you create a MessageRequestImpl and pass it to the RequestHandler. The 
> request handler then dispatches the request (possibly) to a thread pool 
> and calls MessageRequestImpl.sendReply() when done.
> 
> However, you also call MessageRequest.sendReply() before returning from 
> handleRequest(). I think this is an error, and 
> MessageRequest.sendReply() should be moved up inside the catch clause, 
> or be called only if threwException is true, so that we send a reply on 
> behalf of the RequestHandler if and only if it threw an exception (e.g. 
> before it dispatches the request to a thread pool). Otherwise, we'd send 
> a reply *twice* !
> 
> A few changes I have in mind (need to think about it more):
> 
> - I want to leave the existing RequestHandler interface in place, so 
> current implementations continue to work
> - There will be a new AsyncRequestHandler interface (possibly extending 
> RequestHandler, so an implementation can decide to implement both). The 
> RequestCorrelator needs to have either request_handler or 
> async_request_handler set. If the former is set, the logic is unchanged. 
> If the latter is set I'll invoke the async dispatching code
> 
> - AsyncRequestHandler will look similar to the following:
> void handle(Message request, Handback hb, boolean requires_response) 
> throws Throwable;
> - Handback is an interface, and its impl contains header information 
> (e.g. request ID)
> - Handback has a sendReply(Object reply, boolean is_exception) method 
> which sends a response (or exception) back to the caller
> - When requires_response is false, the AsyncRequestHandler doesn't need 
> to invoke sendReply()
> 
> - Message batching
> - The above interfaces need to take message batching into account, e.g. 
> the ability to handle multiple requests concurrently (if they don't need 
> to be executed sequentially)
> 
> 
> Thoughts ?
> 
> 
> On 2/6/13 8:29 PM, Pedro Ruivo wrote:
>> Hi all,
>> 
>> Recently I came up with a solution that can help with the thread pool 
>> problem motivated by the following:
>> 
>> In one of the first implementations of the Total Order based commit 
>> protocol (TO), I had the requirement to move the PrepareCommand to 
>> another thread pool. In summary, the TO protocol delivers the 
>> PrepareCommand in a deterministic order on all the nodes, by a single 
>> deliver thread. To ensure consistency, if it delivers two conflicting 
>> transactions, the second transaction must wait until the first 
>> transaction finishes. However, blocking the single deliver thread is not 
>> a good solution, because no more transactions can be validated, even if 
>> they don't conflict, while the thread is blocked.
>> 
>> So, after creating a dependency graph (i.e. the second transaction 
>> knows that it must wait for the first transaction to finish) I move 
>> the PrepareCommand to another thread pool. Initially, I implemented a 
>> new command, called PrepareResponseCommand, that sends back the reply 
>> of the PrepareCommand. This solution has one disadvantage: I had to 
>> implement an ack collector in ISPN, while JGroups already offers me 
>> that with a synchronous communication.
>> 
>> Recently (2 or 3 months ago) I implemented a simple modification in 
>> JGroups. In a more generic approach, it allows other threads to reply 
>> to a RPC request (such as the PrepareCommand). In the previous 
>> scenario, I replaced the PrepareResponseCommand and the ack collector 
>> implementation with a synchronous RPC invocation. I've used this 
>> solution in other issues in the Cloud-TM's ISPN fork.
>> 
>> This solution is quite simple to implement and may help you to move 
>> the commands to ISPN internal thread pools. The modifications I've 
>> made are the following:
>> 
>> 1) I added a new interface (see [1]) that is sent to the application 
>> instead of the Message object (see [4]). This interface contains the 
>> Message and it has a method to allow the application send the reply to 
>> that particular request.
>> 2) I added a new object in [4] with the meaning: this return value is 
>> not the reply to the RPC request. This is the returned value that I 
>> return when I want to release the thread, because ISPN should return 
>> some object in the handle() method. Of course, 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-06 Thread Bela Ban
Hi Pedro,

this is almost exactly what I wanted to implement !

Question:
- In RequestCorrelator.handleRequest():

protected void handleRequest(Message req, Header hdr) {
    Object retval;
    boolean threwException = false;
    MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
    try {
        retval = request_handler.handle(messageRequest);
    } catch(Throwable t) {
        retval = t;
        threwException = true;
    }
    messageRequest.sendReply(retval, threwException); // <-- should be moved up, or called only if threwException == true
}


, you create a MessageRequestImpl and pass it to the RequestHandler. The 
request handler then dispatches the request (possibly) to a thread pool 
and calls MessageRequestImpl.sendReply() when done.

However, you also call MessageRequest.sendReply() before returning from 
handleRequest(). I think this is an error, and 
MessageRequest.sendReply() should be moved up inside the catch clause, 
or be called only if threwException is true, so that we send a reply on 
behalf of the RequestHandler if and only if it threw an exception (e.g. 
before it dispatches the request to a thread pool). Otherwise, we'd send 
a reply *twice* !
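
For illustration, the corrected flow might look like the following sketch, which uses simplified stand-ins for the JGroups types (the real Message, Header and RequestCorrelator classes differ): the correlator replies on behalf of the handler only when the handler threw, and otherwise leaves the reply to the handler or its pool.

```java
// Sketch of the proposed fix: handleRequest() replies only when the
// handler threw; a successful handler (or its thread pool) calls
// sendReply() itself, so no reply is ever sent twice. Types here are
// simplified stand-ins, not the real JGroups classes.
interface RequestHandler2 {
    Object handle(MessageRequest req) throws Throwable;
}

class MessageRequest {
    Object reply;
    boolean wasException;
    int replies;   // counts sendReply() invocations, to show it fires once

    void sendReply(Object r, boolean isException) {
        reply = r;
        wasException = isException;
        replies++;
    }
}

class Correlator {
    final RequestHandler2 handler;

    Correlator(RequestHandler2 h) { handler = h; }

    void handleRequest(MessageRequest req) {
        try {
            handler.handle(req);        // handler (or its pool) calls sendReply()
        } catch (Throwable t) {
            req.sendReply(t, true);     // reply only on behalf of a failed handler
        }
    }
}
```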

A few changes I have in mind (need to think about it more):

- I want to leave the existing RequestHandler interface in place, so 
current implementations continue to work
- There will be a new AsyncRequestHandler interface (possibly extending 
RequestHandler, so an implementation can decide to implement both). The 
RequestCorrelator needs to have either request_handler or 
async_request_handler set. If the former is set, the logic is unchanged. 
If the latter is set I'll invoke the async dispatching code

- AsyncRequestHandler will look similar to the following:
void handle(Message request, Handback hb, boolean requires_response) 
throws Throwable;
- Handback is an interface, and its impl contains header information 
(e.g. request ID)
- Handback has a sendReply(Object reply, boolean is_exception) method 
which sends a response (or exception) back to the caller
- When requires_response is false, the AsyncRequestHandler doesn't need 
to invoke sendReply()

- Message batching
- The above interfaces need to take message batching into account, e.g. 
the ability to handle multiple requests concurrently (if they don't need 
to be executed sequentially)


Thoughts ?


On 2/6/13 8:29 PM, Pedro Ruivo wrote:
> Hi all,
>
> Recently I came up with a solution that can help with the thread pool 
> problem motivated by the following:
>
> In one of the first implementations of the Total Order based commit 
> protocol (TO), I had the requirement to move the PrepareCommand to 
> another thread pool. In summary, the TO protocol delivers the 
> PrepareCommand in a deterministic order on all the nodes, by a single 
> deliver thread. To ensure consistency, if it delivers two conflicting 
> transactions, the second transaction must wait until the first 
> transaction finishes. However, blocking the single deliver thread is not a 
> good solution, because no more transactions can be validated, even if 
> they don't conflict, while the thread is blocked.
>
> So, after creating a dependency graph (i.e. the second transaction 
> knows that it must wait for the first transaction to finish) I move 
> the PrepareCommand to another thread pool. Initially, I implemented a 
> new command, called PrepareResponseCommand, that sends back the reply 
> of the PrepareCommand. This solution has one disadvantage: I had to 
> implement an ack collector in ISPN, while JGroups already offers me 
> that with a synchronous communication.
>
> Recently (2 or 3 months ago) I implemented a simple modification in 
> JGroups. In a more generic approach, it allows other threads to reply 
> to a RPC request (such as the PrepareCommand). In the previous 
> scenario, I replaced the PrepareResponseCommand and the ack collector 
> implementation with a synchronous RPC invocation. I've used this 
> solution in other issues in the Cloud-TM's ISPN fork.
>
> This solution is quite simple to implement and may help you to move 
> the commands to ISPN internal thread pools. The modifications I've 
> made are the following:
>
> 1) I added a new interface (see [1]) that is sent to the application 
> instead of the Message object (see [4]). This interface contains the 
> Message and it has a method to allow the application to send the reply to 
> that particular request.
> 2) I added a new marker object in [4] with the meaning: this return value 
> is not the reply to the RPC request. I return this value when I want to 
> release the thread, because ISPN has to return some object from the 
> handle() method. Of course, ISPN will invoke sendReply() somewhere else 
> later; otherwise, I will get a TimeoutException on the sender side.
> 3) Also I've changed the RequestCorrelator implementation to support 
> the previous modifications (see [2] and [3])
>
> In the Cloud-TM's ISPN fork I added a reference in the 
> BaseCacheRpcCommand to

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-06 Thread Pedro Ruivo

Hi all,

Recently I came up with a solution that can help with the thread pool 
problem motivated by the following:


In one of the first implementations of the Total Order based commit protocol 
(TO), I had the requirement to move the PrepareCommand to another thread 
pool. In summary, the TO protocol delivers the PrepareCommand in a 
deterministic order on all the nodes, by a single deliver thread. To 
ensure consistency, if it delivers two conflicting transactions, the 
second transaction must wait until the first transaction finishes. 
However, blocking the single deliver thread is not a good solution, because 
no more transactions can be validated, even if they don't conflict, while 
the thread is blocked.


So, after creating a dependency graph (i.e. the second transaction knows 
that it must wait for the first transaction to finish) I move the 
PrepareCommand to another thread pool. Initially, I implemented a new 
command, called PrepareResponseCommand, that sends back the reply of the 
PrepareCommand. This solution has one disadvantage: I had to implement 
an ack collector in ISPN, while JGroups already offers me that with a 
synchronous communication.


Recently (2 or 3 months ago) I implemented a simple modification in 
JGroups. In a more generic approach, it allows other threads to reply to 
a RPC request (such as the PrepareCommand). In the previous scenario, I 
replaced the PrepareResponseCommand and the ack collector implementation 
with a synchronous RPC invocation. I've used this solution in other 
issues in the Cloud-TM's ISPN fork.


This solution is quite simple to implement and may help you to move the 
commands to ISPN internal thread pools. The modifications I've made are 
the following:


1) I added a new interface (see [1]) that is sent to the application 
instead of the Message object (see [4]). This interface contains the 
Message and it has a method to allow the application to send the reply to 
that particular request.
2) I added a new marker object in [4] with the meaning: this return value 
is not the reply to the RPC request. I return this value when I want to 
release the thread, because ISPN has to return some object from the 
handle() method. Of course, ISPN will invoke sendReply() somewhere else 
later; otherwise, I will get a TimeoutException on the sender side.
3) Also I've changed the RequestCorrelator implementation to support the 
previous modifications (see [2] and [3])
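A minimal sketch of modifications 1) and 2): all names here are illustrative stand-ins for the code in [1]-[4], not the actual fork. The handler returns a DO_NOT_REPLY marker so the OOB thread is released, and another thread later sends the real reply through the MessageRequest:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DoNotReplySketch {
    static final Object DO_NOT_REPLY = new Object();   // marker: "no reply yet"

    interface MessageRequest {
        void sendReply(Object reply, boolean is_exception);
    }

    // Plays the role of the application's handle() method: it hands the real
    // work to another pool and returns the marker, releasing the OOB thread
    static Object handle(MessageRequest request, ExecutorService ispnPool) {
        ispnPool.submit(() -> request.sendReply("prepare-result", false));
        return DO_NOT_REPLY;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ispnPool = Executors.newSingleThreadExecutor();
        CountDownLatch replied = new CountDownLatch(1);

        MessageRequest request = (reply, isEx) -> {
            System.out.println("reply=" + reply);
            replied.countDown();
        };

        Object retval = handle(request, ispnPool);
        System.out.println("released OOB thread: " + (retval == DO_NOT_REPLY));

        replied.await(5, TimeUnit.SECONDS);
        ispnPool.shutdown();
    }
}
```

The sender still sees a single synchronous RPC; only the receiving side decouples delivery from reply.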


In the Cloud-TM's ISPN fork I added a reference in the 
BaseCacheRpcCommand to [1] and added the method sendReply() [5]. In 
addition, I have the following uses cases working perfectly with this:


1) Total Order

The scenario described in the beginning. The ParallelTotalOrderManager 
returns the DO_NOT_REPLY object when it receives a remote PrepareCommand 
(see [6] line 77). When the PrepareCommand is finally processed by the 
rest of the interceptor chain, it invokes PrepareCommand.sendReply() 
(see [6] line 230).


2) GMU remote get

GMU ensures the SERIALIZABLE isolation level, and remote gets must check 
that the node processing the request has a minimum version available, to 
guarantee data consistency. The problem with our initial implementation 
in a large cluster is that the number of remote gets is very high, and 
all the OOB threads end up blocked on this condition.


I've done the same thing with the ClusteredRemoteGet, as you can see in 
[7], lines 93 and 105.


3) GMU CommitCommand

In GMU, the CommitCommands cannot be processed in an arbitrary order. If 
T1 is serialized before T2, the commit command of T1 must be processed 
before the commit command of T2, even if the transactions do not 
conflict. This creates the same problem as above, and the same solution 
was adopted.
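As an illustration of that ordering constraint (hypothetical code, not GMU's implementation), T2's commit processing can simply be chained after T1's, so T2 never starts before T1 finishes even though the two transactions don't conflict:

```java
import java.util.concurrent.CompletableFuture;

public class CommitOrderSketch {
    public static void main(String[] args) {
        StringBuilder order = new StringBuilder();

        // T1's commit runs first...
        CompletableFuture<Void> t1Commit =
                CompletableFuture.runAsync(() -> order.append("T1 "));
        // ...and T2's commit is chained strictly after it, even though the
        // two transactions don't conflict
        CompletableFuture<Void> t2Commit = t1Commit.thenRun(() -> order.append("T2"));

        t2Commit.join();
        System.out.println(order);
    }
}
```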


I know that you have discussed some solutions, and I would like to know 
your opinion on what I've described.


If you have questions, please let me know.

Cheers,
Pedro

[1] 
https://github.com/pruivo/JGroups/blob/t_cloudtm/src/org/jgroups/blocks/MessageRequest.java 

[2] 
https://github.com/pruivo/JGroups/blob/t_cloudtm/src/org/jgroups/blocks/RequestCorrelator.java#L463 

[3] 
https://github.com/pruivo/JGroups/blob/t_cloudtm/src/org/jgroups/blocks/RequestCorrelator.java#L495 

[4] 
https://github.com/pruivo/JGroups/blob/t_cloudtm/src/org/jgroups/blocks/RequestHandler.java 

[5] 
https://github.com/pruivo/infinispan/blob/cloudtm_v1/core/src/main/java/org/infinispan/commands/remote/BaseRpcCommand.java#L75 


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-04 Thread Manik Surtani
On 4 Feb 2013, at 07:46, Dan Berindei  wrote:

> Switching to a state machine approach would require rethinking and rewriting 
> all our interceptors, and I'm pretty sure the code would get more complex and 
> harder to debug (to say nothing about interpreting the logs). Are you sure 
> it's going to have that many benefits to make it worthwhile?

I think it could be worthwhile, but I agree it is not at all trivial.  So 
that's actually a hard call as a result - not so easy to prototype and 
benchmark.


--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Platform Architect, JBoss Data Grid
http://red.ht/data-grid

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-04 Thread Manik Surtani
I agree that an application thread pool just pushes the issue of the OOB pool 
running out of threads elsewhere, but that is only one of the two problems 
Radim has.

The other is that nodes get suspected and kicked out because heartbeat messages 
get blocked as well.  Same thing with FC credit messages.  By having a separate 
application pool, at least we guarantee that the cluster service messages get 
handled… 


On 3 Feb 2013, at 11:23, Bela Ban  wrote:

> A new thread pool owned by Infinispan is certainly something desirable, 
> as discussed in Palma, but I think it wouldn't solve the issue Radim ran 
> into, namely threads being used despite the fact that they only wait for 
> another blocking RPC to finish.
> 
> If we made the JGroups thread return immediately by transferring control 
> to an Infinispan thread, then we'd simply move the issue from the former 
> to the latter pool. Eventually, the Infinispan pool would run out of 
> threads.
> 
> Coming back to the specific problem Radim ran into: the forwarding of a 
> PUT doesn't hold any locks, so your argument below wouldn't hold. 
> However, of course this is only one specific scenario, and you're 
> probably right that we'd have to consider the more general case of a 
> thread holding locks...
> 
> All said, I believe it would still be worthwhile looking into a more 
> non-blocking way of invoking RPCs, that doesn't occupy threads which 
> essentially only wait on IO (network traffic)... A simple state machine 
> approach could be the solution to this...
> 
> 
> On 2/1/13 10:54 AM, Dan Berindei wrote:
>> Yeah, I wouldn't call this a "simple" solution...
>> 
>> The distribution/replication interceptors are quite high in the 
>> interceptor stack, so we'd have to save the state of the interceptor 
>> stack (basically the thread's stack) somehow and resume processing it 
>> on the thread receiving the responses. In a language that supports 
>> continuations that would be a piece of cake, but since we're in Java 
>> we'd have to completely change the way the interceptor stack works.
>> 
>> Actually we do hold the lock on modified keys while the command is 
>> replicated to the other owners. But I think locking wouldn't be a 
>> problem: we already allow locks to be owned by transactions instead of 
>> threads, so it would just be a matter of creating a "lite transaction" 
>> for non-transactional caches. Obviously the 
>> TransactionSynchronizerInterceptor would have to go, but I see that as 
>> a positive thing ;)
>> 
>> So yeah, it could work, but it would take a huge amount of effort and 
>> it's going to obfuscate the code. Plus, I'm not at all convinced that 
>> it's going to improve performance that much compared to a new thread pool.
>> 
>> Cheers
>> Dan
>> 
>> 
>> On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa <rva...@redhat.com> wrote:
>> 
>>Yeah, that would work if it is possible to break execution path
>>into the FutureListener from the middle of interceptor stack - I
>>am really not sure about that but as in current design no locks
>>should be held when a RPC is called, it may be possible.
>> 
>>Let's see what someone more informed (Dan?) would think about that.
>> 
>>Thanks, Bela
>> 
>>Radim
>> 
>>- Original Message -
>>| From: "Bela Ban" <b...@redhat.com>
>>| To: infinispan-dev@lists.jboss.org
>>| Sent: Friday, February 1, 2013 9:39:43 AM
>>| Subject: Re: [infinispan-dev] Threadpools in a large cluster
>>|
>>| It looks like the core problem is an incoming RPC-1 which triggers
>>| another blocking RPC-2: the thread delivering RPC-1 is blocked
>>| waiting
>>| for the response from RPC-2, and can therefore not be used to serve
>>| other requests for the duration of RPC-2. If RPC-2 takes a while,
>>| e.g.
>>| waiting to acquire a lock in the remote node, then it is clear that
>>| the
>>| thread pool will quickly exceed its max size.
>>|
>>| A simple solution would be to prevent invoking blocking RPCs *from
>>| within* a received RPC. Let's take a look at an example:
>>| - A invokes a blocking PUT-1 on B
>>| - B forwards the request as blocking PUT-2 to C and D
>>| - When PUT-2 returns and B gets the responses from C and D (or the
>>| first
>>| one to respond, don'

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-03 Thread Dan Berindei
On Sun, Feb 3, 2013 at 1:23 PM, Bela Ban  wrote:

> A new thread pool owned by Infinispan is certainly something desirable,
> as discussed in Palma, but I think it wouldn't solve the issue Radim ran
> into, namely threads being used despite the fact that they only wait for
> another blocking RPC to finish.
>
>
IMO the fact that threads are being blocked waiting for an RPC to return is
not a big deal. The real problem is when all the OOB threads are used up,
causing a deadlock: existing OOB threads are blocked waiting for RPC
responses, and the RPC responses are blocked until an OOB thread is freed.



> If we made the JGroups thread return immediately by transferring control
> to an Infinispan thread, then we'd simply move the issue from the former
> to the latter pool. Eventually, the Infinispan pool would run out of
> threads.
>
>
Yeah, but JGroups would still be able to process RPC responses, and by
doing that it will free some of the OOB threads.

For transactional caches there's an additional benefit: if commit/rollback
commands were handled directly on the OOB pool then neither thread pool
would have dependencies between tasks, so we could enable queueing for the
OOB pool and the Infinispan pool without causing deadlocks.
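A toy sketch of that dispatch rule (command names, pool sizes, and the dispatch method are hypothetical, not Infinispan code): commands that never wait on other commands — commit/rollback — run inline on the delivering OOB thread, while everything else is handed off to a bounded, queueing Infinispan pool. Because nothing in either pool depends on a task stuck behind it, bounded queues become safe:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DispatchRuleSketch {
    enum Command { PREPARE, COMMIT, ROLLBACK }

    // Bounded queue is safe here because nothing queued ever waits on
    // something else that is stuck behind it in the same queue
    static final ExecutorService ispnPool = new ThreadPoolExecutor(
            1, 4, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(100));

    // Invoked on a JGroups OOB thread
    static void deliver(Command cmd) {
        switch(cmd) {
            case COMMIT:
            case ROLLBACK:
                process(cmd);                        // never blocks: run inline
                break;
            default:
                ispnPool.submit(() -> process(cmd)); // may block: hand off
        }
    }

    static void process(Command cmd) {
        System.out.println(cmd + " on " + Thread.currentThread().getName());
    }

    public static void main(String[] args) throws Exception {
        deliver(Command.COMMIT);   // handled directly on the "OOB" (main) thread
        deliver(Command.PREPARE);  // handed off to the Infinispan pool
        ispnPool.shutdown();
        ispnPool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```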



> Coming back to the specific problem Radim ran into: the forwarding of a
> PUT doesn't hold any locks, so your argument below wouldn't hold.
> However, of course this is only one specific scenario, and you're
> probably right that we'd have to consider the more general case of a
> thread holding locks...
>
>
Actually, NonTransactionalLockingInterceptor acquires a lock on the key
before the execution of the RPC (from
NonTxConcurrentDistributionInterceptor), and keeps that lock for the
entire duration of the RPC.

We make other RPCs while holding the key lock as well, particularly to
invalidate the L1 entries.



> All said, I believe it would still be worthwhile looking into a more
> non-blocking way of invoking RPCs, that doesn't occupy threads which
> essentially only wait on IO (network traffic)... A simple state machine
> approach could be the solution to this...
>
>
Switching to a state machine approach would require rethinking and
rewriting all our interceptors, and I'm pretty sure the code would get more
complex and harder to debug (to say nothing about interpreting the logs).
Are you sure it's going to have that many benefits to make it worthwhile?



> On 2/1/13 10:54 AM, Dan Berindei wrote:
> > Yeah, I wouldn't call this a "simple" solution...
> >
> > The distribution/replication interceptors are quite high in the
> > interceptor stack, so we'd have to save the state of the interceptor
> > stack (basically the thread's stack) somehow and resume processing it
> > on the thread receiving the responses. In a language that supports
> > continuations that would be a piece of cake, but since we're in Java
> > we'd have to completely change the way the interceptor stack works.
> >
> > Actually we do hold the lock on modified keys while the command is
> > replicated to the other owners. But I think locking wouldn't be a
> > problem: we already allow locks to be owned by transactions instead of
> > threads, so it would just be a matter of creating a "lite transaction"
> > for non-transactional caches. Obviously the
> > TransactionSynchronizerInterceptor would have to go, but I see that as
> > a positive thing ;)
> >
> > So yeah, it could work, but it would take a huge amount of effort and
> > it's going to obfuscate the code. Plus, I'm not at all convinced that
> > it's going to improve performance that much compared to a new thread
> pool.
> >
> > Cheers
> > Dan
> >
> >
> > On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa <rva...@redhat.com> wrote:
> >
> > Yeah, that would work if it is possible to break execution path
> > into the FutureListener from the middle of interceptor stack - I
> > am really not sure about that but as in current design no locks
> > should be held when a RPC is called, it may be possible.
> >
> >     Let's see what someone more informed (Dan?) would think about that.
> >
> > Thanks, Bela
> >
> > Radim
> >
> > - Original Message -
> > | From: "Bela Ban" <b...@redhat.com>
> > | To: infinispan-dev@lists.jboss.org
> > | Sent: Friday, February 1, 2013 9:39:43 AM
> > | Subject: Re: [infinispan-dev] Threadpools in a large cluster
> > |

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-03 Thread Bela Ban
If you send me the details, I'll take a look. I'm pretty busy with 
message batching, so I can't promise next week, but soon...

On 2/1/13 11:08 AM, Pedro Ruivo wrote:
> Hi,
>
> I had a similar problem when I tried GMU[1] in "large" cluster (40 vms),
> because the remote gets and the commit messages (I'm talking about ISPN
> commands) must wait for some conditions before being processed.
>
> I solved this problem by adding a feature in JGroups[2] that allows the
> request to be moved to another thread, releasing the OOB thread. The
> other thread will send the reply of the JGroups Request. Of course, I'm
> only moving commands that I know they can block.
>
> I can enter in some detail if you want =)
>
> Cheers,
> Pedro
>
> [1] http://www.gsd.inesc-id.pt/~romanop/files/papers/icdcs12.pdf
> [2] I would like to talk with Bela about this, because it makes my life
> easier to support total order in ISPN. I'll try to send an email this
> weekend =)
>
> On 01-02-2013 08:04, Radim Vansa wrote:
>> Hi guys,
>>
>> after dealing with the large cluster for a while I find the way how we use 
>> OOB threads in synchronous configuration non-robust.
>> Imagine a situation where a node which is not an owner of the key calls PUT. 
>> Then an RPC is called to the primary owner of that key, which reroutes 
>> the request to all other owners and after they reply, it replies back.
>> There are two problems:
>> 1) If we do X simultaneous requests from non-owners to the primary owner 
>> where X is OOB TP size, all the OOB threads are waiting for the responses 
>> and there is no thread to process the OOB response and release the thread.
>> 2) Node A is primary owner of keyA, non-primary owner of keyB and B is 
>> primary of keyB and non-primary of keyA. We got many requests for both keyA 
>> and keyB from other nodes, therefore, all OOB threads from both nodes call 
>> RPC to the non-primary owner but there's no one who could process the request.
>>
>> While we wait for the requests to timeout, the nodes with depleted OOB 
>> threadpools start suspecting all other nodes because they can't receive 
>> heartbeats etc...
>>
>> You can say "increase your OOB tp size", but that's not always an option, I 
>> have currently set it to 1000 threads and it's not enough. In the end, I 
>> will be always limited by RAM and something tells me that even nodes with 
>> few gigs of RAM should be able to form a huge cluster. We use 160 hotrod 
>> worker threads in JDG, that means that 160 * clusterSize = 10240 (64 nodes 
>> in my cluster) parallel requests can be executed, and if 10% targets the 
>> same node with 1000 OOB threads, it gets stuck. It's about scaling and 
>> robustness.
>>
>> Not that I'd have any good solution, but I'd really like to start a 
>> discussion.
>> Thinking about it a bit, the problem is that a blocking call (calling an RPC 
>> on the primary owner from the message handler) can block non-blocking calls 
>> (such as an RPC response or a command that never sends any more messages). 
>> Therefore, having a flag on the message saying "this won't send another 
>> message" could let the message be executed in a different threadpool, which 
>> would never be deadlocked. In fact, the pools could share the threads, but 
>> the non-blocking one would always have a few threads spare.
>> It's a bad solution, as tracking which messages could block on the other 
>> node is really, really hard (we can only be sure in the case of RPC 
>> responses), especially when locks come into play. I will welcome anything better.
>>
>> Radim
>>
>>
>> ---
>> Radim Vansa
>> Quality Assurance Engineer
>> JBoss Datagrid
>> tel. +420532294559 ext. 62559
>>
>> Red Hat Czech, s.r.o.
>> Brno, Purkyňova 99/71, PSČ 612 45
>> Czech Republic
>>
>>

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)


Re: [infinispan-dev] Threadpools in a large cluster

2013-02-03 Thread Bela Ban
A new thread pool owned by Infinispan is certainly something desirable, 
as discussed in Palma, but I think it wouldn't solve the issue Radim ran 
into, namely threads being used despite the fact that they only wait for 
another blocking RPC to finish.

If we made the JGroups thread return immediately by transferring control 
to an Infinispan thread, then we'd simply move the issue from the former 
to the latter pool. Eventually, the Infinispan pool would run out of 
threads.

Coming back to the specific problem Radim ran into: the forwarding of a 
PUT doesn't hold any locks, so your argument below wouldn't hold. 
However, of course this is only one specific scenario, and you're 
probably right that we'd have to consider the more general case of a 
thread holding locks...

All said, I believe it would still be worthwhile looking into a more 
non-blocking way of invoking RPCs, that doesn't occupy threads which 
essentially only wait on IO (network traffic)... A simple state machine 
approach could be the solution to this...


On 2/1/13 10:54 AM, Dan Berindei wrote:
> Yeah, I wouldn't call this a "simple" solution...
>
> The distribution/replication interceptors are quite high in the 
> interceptor stack, so we'd have to save the state of the interceptor 
> stack (basically the thread's stack) somehow and resume processing it 
> on the thread receiving the responses. In a language that supports 
> continuations that would be a piece of cake, but since we're in Java 
> we'd have to completely change the way the interceptor stack works.
>
> Actually we do hold the lock on modified keys while the command is 
> replicated to the other owners. But I think locking wouldn't be a 
> problem: we already allow locks to be owned by transactions instead of 
> threads, so it would just be a matter of creating a "lite transaction" 
> for non-transactional caches. Obviously the 
> TransactionSynchronizerInterceptor would have to go, but I see that as 
> a positive thing ;)
>
> So yeah, it could work, but it would take a huge amount of effort and 
> it's going to obfuscate the code. Plus, I'm not at all convinced that 
> it's going to improve performance that much compared to a new thread pool.
>
> Cheers
> Dan
>
>
> On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa <rva...@redhat.com> wrote:
>
> Yeah, that would work if it is possible to break execution path
> into the FutureListener from the middle of interceptor stack - I
> am really not sure about that but as in current design no locks
> should be held when a RPC is called, it may be possible.
>
> Let's see what someone more informed (Dan?) would think about that.
>
> Thanks, Bela
>
> Radim
>
> - Original Message -----
> | From: "Bela Ban" <b...@redhat.com>
> | To: infinispan-dev@lists.jboss.org
> | Sent: Friday, February 1, 2013 9:39:43 AM
> | Subject: Re: [infinispan-dev] Threadpools in a large cluster
> |
> | It looks like the core problem is an incoming RPC-1 which triggers
> | another blocking RPC-2: the thread delivering RPC-1 is blocked
> | waiting
> | for the response from RPC-2, and can therefore not be used to serve
> | other requests for the duration of RPC-2. If RPC-2 takes a while,
> | e.g.
> | waiting to acquire a lock in the remote node, then it is clear that
> | the
> | thread pool will quickly exceed its max size.
> |
> | A simple solution would be to prevent invoking blocking RPCs *from
> | within* a received RPC. Let's take a look at an example:
> | - A invokes a blocking PUT-1 on B
> | - B forwards the request as blocking PUT-2 to C and D
> | - When PUT-2 returns and B gets the responses from C and D (or the
> | first
> | one to respond, don't know exactly how this is implemented), it
> sends
> | the response back to A (PUT-1 terminates now at A)
> |
> | We could change this to the following:
> | - A invokes a blocking PUT-1 on B
> | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
> | it
> | does the following:
> |   - B invokes PUT-2 and gets a future
> |   - B adds itself as a FutureListener, and it also stores the
> | address of the original sender (A)
> |   - When the FutureListener is invoked, B sends back the result
> |   as a
> | response to A
> | - Whenever a member leaves the cluster, the corresponding
> futures are
> | cancelled and removed from the hashmaps
> |
>

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
On Fri, Feb 1, 2013 at 12:13 PM, Manik Surtani  wrote:

>
> On 1 Feb 2013, at 09:39, Dan Berindei  wrote:
>
> > Radim, do these problems happen with the HotRod server, or only with
> memcached?
> >
> > HotRod requests handled by non-owners should be very rare, instead the
> vast majority should be handled by the primary owner directly. So if this
> happens with HotRod, we should focus on fixing the HotRod routing instead
> of focusing on how to handle a large number of requests from non-owners.
>
> Well, even Hot Rod only optionally uses smart routing.  Some client
> libraries don't have this capability.
>
>
True, and I meant to say that with memcached it should be much worse, but
at least in Radim's tests I hope smart routing is enabled.



> >
> > That being said, even if a HotRod put request is handled by the primary
> owner, it "generates" (numOwners - 1) extra OOB requests. So if you have
> 160 HotRod worker threads per node, you can expect 4 * 160 OOB messages per
> node. Multiply that by 2, because responses are OOB as well, and you can
> get 1280 OOB messages before you even start reusing any HotRod worker
> thread. Have you tried decreasing the number of HotRod workers?
> >
> > The thing is, our OOB thread pool can't use queueing because we'd get a
> queue full of commit commands while all the OOB threads are waiting on keys
> that those commit commands would unlock. As the OOB thread pool is full, we
> discard messages, which I suspect slows things down quite a bit (especially
> if it's a credit request/response message). So it may well be that a lower
> number of HotRod working threads would perform better.
> >
> > On the other hand, why is increasing the number of OOB threads a
> solution? With -Xss 512k, you can get 2000 threads with only 1 GB of
> virtual memory (the actual used memory is probably even less, unless you're
> using huge pages). AFAIK the Linux kernel doesn't break a sweat with 10
> threads running, so having 2000 threads just hanging around, waiting for a
> response, shouldn't be such a problem.
> >
> > I did chat with Bela (or was it a break-out session?) about moving
> Infinispan's request processing to another thread pool during the team
> meeting in Palma. That would leave the OOB thread pool free to receive
> response messages, FD heartbeats, credit requests/responses etc. The
> downside, I guess, is that each request would have to be passed to another
> thread, and the context switch may slow things down a bit. But since the
> new thread pool would be in Infinispan, we could even do tricks like
> executing a commit/rollback directly on the OOB thread.
>
> Right.  I always got the impression we were abusing the OOB pool.  But in
> the end, I think it makes sense (in JGroups) to separate a service thread
> pool (for heartbeats, credits, etc) and an application thread pool (what
> we'd use instead of OOB).  This way you could even tune your service thread
> pool to just have, say, 2 threads, and the application thread pool to 1000
> or whatever.
>
>
A separate service pool would be good, but I think we could go further and
treat ClusteredGet/Commit/Rollback commands the same way, because they
can't block waiting for other commands to be processed.



> > In the end, I just didn't feel that working on this was justified,
> considering the number of critical bugs we had. But maybe now's the time to
> start experimenting…
> >
> >
> >
> > On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa  wrote:
> > Hi guys,
> >
> > after dealing with the large cluster for a while I find the way how we
> use OOB threads in synchronous configuration non-robust.
> > Imagine a situation where a node which is not an owner of the key calls
> PUT. Then an RPC is called to the primary owner of that key, which
> reroutes the request to all other owners and after they reply, it replies
> back.
> > There are two problems:
> > 1) If we do X simultaneous requests from non-owners to the primary
> owner where X is OOB TP size, all the OOB threads are waiting for the
> responses and there is no thread to process the OOB response and release
> the thread.
> > 2) Node A is primary owner of keyA, non-primary owner of keyB and B is
> primary of keyB and non-primary of keyA. We got many requests for both keyA
> and keyB from other nodes, therefore, all OOB threads from both nodes call
> RPC to the non-primary owner but there's no one who could process the
> request.
> >
> > While we wait for the requests to timeout, the nodes with depleted OOB
> threadpools start suspecting all other nodes because they can't receive
> heartbeats etc...
> >
> > You can say "increase your OOB tp size", but that's not always an
> option, I have currently set it to 1000 threads and it's not enough. In the
> end, I will be always limited by RAM and something tells me that even nodes
> with few gigs of RAM should be able to form a huge cluster. We use 160
> hotrod worker threads in JDG, that means that 160 * clusterSize = 10240 (64
> nodes

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
On Fri, Feb 1, 2013 at 12:40 PM, Radim Vansa  wrote:

> |
> | Radim, do these problems happen with the HotRod server, or only with
> | memcached?
>
> I didn't test memcached, only HotRod. The thing I was seing were many OOB
> threads stuck when sending messages from handleRemoteWrite.
>
> |
> | HotRod requests handled by non-owners should be very rare, instead
> | the vast majority should be handled by the primary owner directly.
> | So if this happens with HotRod, we should focus on fixing the HotRod
> | routing instead of focusing on how to handle a large number of
> | requests from non-owners.
> |
> |
> |
> | That being said, even if a HotRod put request is handled by the
> | primary owner, it "generates" (numOwners - 1) extra OOB requests. So
> | if you have 160 HotRod worker threads per node, you can expect 4 *
> | 160 OOB messages per node. Multiply that by 2, because responses are
> | OOB as well, and you can get 1280 OOB messages before you even start
> | reusing any HotRod worker thread. Have you tried decreasing the
> | number of HotRod workers?
>
> Decreasing the number of workers would be an obvious way to scale it
> down. To be honest, I haven't tried that because it would certainly lower
> the overall throughput, and it is not a systematic solution IMO.
>
>
You never know, a lower number of worker threads may mean lower contention,
fewer context switches, so I think it's worth experimenting.


> |
> | The thing is, our OOB thread pool can't use queueing because we'd get
> | a queue full of commit commands while all the OOB threads are
> | waiting on keys that those commit commands would unlock. As the OOB
> | thread pool is full, we discard messages, which I suspect slows
> | things down quite a bit (especially if it's a credit
> | request/response message). So it may well be that a lower number of
> | HotRod working threads would perform better.
>
> We have already had a similar talk where you convinced me that having a
> queue for the thread pool wouldn't help much.
>
>
Having a queue for the OOB thread pool definitely won't help for
transactional caches, as you'll get deadlocks between prepare commands
waiting for cache keys and commit commands waiting in the OOB queue. I'm
guessing with the primary owners in non-transactional caches now forwarding
commands to the backup owners you can get deadlocks there as well if you
enable queueing. So yeah, enabling queueing still isn't an option...


> |
> | On the other hand, why is increasing the number of OOB threads a
> | solution? With -Xss 512k, you can get 2000 threads with only 1 GB of
> | virtual memory (the actual used memory is probably even less, unless
> | you're using huge pages). AFAIK the Linux kernel doesn't break a
> | sweat with 10 threads running, so having 2000 threads just
> | hanging around, waiting for a response, shouldn't be such a problem.
> |
> I'm not saying that it won't work (you're right that it's just virtual
> memory), but I thought that Infinispan should scale and be robust even
> for the edge cases.
>
>
Define edge cases :)

The users usually require a certain number of simultaneous clients, a
certain throughput, etc. I don't think anyone will say "Yeah, we'll use
Infinispan, but only if it uses less than 1000 threads".


> |
> | I did chat with Bela (or was it a break-out session?) about moving
> | Infinispan's request processing to another thread pool during the
> | team meeting in Palma. That would leave the OOB thread pool free to
> | receive response messages, FD heartbeats, credit requests/responses
> | etc. The downside, I guess, is that each request would have to be
> | passed to another thread, and the context switch may slow things
> | down a bit. But since the new thread pool would be in Infinispan, we
> | could even do tricks like executing a commit/rollback directly on
> | the OOB thread.
>
> Hmm, for some messages (nonblocking ones) the context switch could be spared.
> It depends on how complicated it is to determine whether the message will
> block before entering the interceptor chain.
>
>
You don't have to be very specific, optimizing for
ClusteredGet/Commit/Rollback commands should be enough.


>  |
> |
> | In the end, I just didn't feel that working on this was justified,
> | considering the number of critical bugs we had. But maybe now's the
> | time to start experimenting...
> |
>
> I agree and I'm happy that ISPN is mostly working now.
>
> I have tried to rerun the scenario with an upper OOB limit of 2000 and it
> did not help (originally I was using 200 and increased it to 1000); the
> node stops responding at some point... So maybe OOB is not the only
> villain. I'll keep investigating.
>
>
I'm wondering if you could collect some statistics about the JGroups thread
pools, how many threads are busy at each point during the test. How many
HotRod workers are busy in the entire cluster when the OOB thread pool gets
full should be interesting as well...
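Dan's suggestion amounts to periodically sampling the executor's counters. A minimal, self-contained sketch using plain java.util.concurrent (not the JGroups API; with JGroups you would read the same counters off the transport's thread pools, e.g. via JMX):

```java
import java.util.concurrent.*;

public class PoolStatsSampler {
    // Returns a task that prints a one-line snapshot of a pool's counters.
    public static Runnable sampler(ThreadPoolExecutor pool) {
        return () -> System.out.printf("active=%d poolSize=%d largest=%d queued=%d%n",
                pool.getActiveCount(), pool.getPoolSize(),
                pool.getLargestPoolSize(), pool.getQueue().size());
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
        ScheduledExecutorService reporter = Executors.newSingleThreadScheduledExecutor();
        reporter.scheduleAtFixedRate(sampler(pool), 0, 100, TimeUnit.MILLISECONDS);
        // Submit more work than the pool has threads, then watch the counters.
        for (int i = 0; i < 8; i++) {
            pool.execute(() -> {
                try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            });
        }
        Thread.sleep(500);
        reporter.shutdownNow();
        pool.shutdown();
    }
}
```

Correlating these snapshots across nodes (e.g. OOB pool saturation vs. HotRod worker activity) would show whether the pools really deplete at the same moment.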



> Radim
>
>
> |
> |
> |
> |
> |
> | On Fri, Feb 1, 2013 at 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Radim Vansa
| 
| Radim, do these problems happen with the HotRod server, or only with
| memcached?

I didn't test memcached, only HotRod. What I was seeing were many OOB 
threads stuck when sending messages from handleRemoteWrite.

| 
| HotRod requests handled by non-owners should be very rare, instead
| the vast majority should be handled by the primary owner directly.
| So if this happens with HotRod, we should focus on fixing the HotRod
| routing instead of focusing on how to handle a large number of
| requests from non-owners.
| 
| 
| 
| That being said, even if a HotRod put request is handled by the
| primary owner, it "generates" (numOwners - 1) extra OOB requests. So
| if you have 160 HotRod worker threads per node, you can expect 4 *
| 160 OOB messages per node. Multiply that by 2, because responses are
| OOB as well, and you can get 1280 OOB messages before you even start
| reusing any HotRod worker thread. Have you tried decreasing the
| number of HotRod workers?

Decreasing the number of workers would be an obvious way to scale it 
down. To be honest, I haven't tried that because it would certainly lower the 
overall throughput, and it is not a systematic solution IMO.

| 
| The thing is, our OOB thread pool can't use queueing because we'd get
| a queue full of commit commands while all the OOB threads are
| waiting on keys that those commit commands would unlock. As the OOB
| thread pool is full, we discard messages, which I suspect slows
| things down quite a bit (especially if it's a credit
| request/response message). So it may well be that a lower number of
| HotRod working threads would perform better.
 
We have already had a similar talk where you convinced me that having a queue 
for the thread pool wouldn't help much.

| 
| On the other hand, why is increasing the number of OOB threads a
| solution? With -Xss 512k, you can get 2000 threads with only 1 GB of
| virtual memory (the actual used memory is probably even less, unless
| you're using huge pages). AFAIK the Linux kernel doesn't break a
| sweat with 10 threads running, so having 2000 threads just
| hanging around, waiting for a response, shouldn't be such a problem.
| 
I'm not saying that it won't work (you're right that it's just virtual memory), 
but I thought that Infinispan should scale and be robust even for the edge 
cases.

| 
| I did chat with Bela (or was it a break-out session?) about moving
| Infinispan's request processing to another thread pool during the
| team meeting in Palma. That would leave the OOB thread pool free to
| receive response messages, FD heartbeats, credit requests/responses
| etc. The downside, I guess, is that each request would have to be
| passed to another thread, and the context switch may slow things
| down a bit. But since the new thread pool would be in Infinispan, we
| could even do tricks like executing a commit/rollback directly on
| the OOB thread.

Hmm, for some messages (nonblocking ones) the context switch could be spared. It 
depends on how complicated it is to determine whether the message will block 
before entering the interceptor chain.

| 
| 
| In the end, I just didn't feel that working on this was justified,
| considering the number of critical bugs we had. But maybe now's the
| time to start experimenting...
| 

I agree and I'm happy that ISPN is mostly working now.

I have tried to rerun the scenario with an upper OOB limit of 2000 and it did 
not help (originally I was using 200 and increased it to 1000); the node stops 
responding at some point... So maybe OOB is not the only villain. I'll keep investigating.

Radim


| 
| 
| 
| 
| 
| On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa < rva...@redhat.com >
| wrote:
| 
| 
| Hi guys,
| 
| after dealing with the large cluster for a while, I find the way we
| use OOB threads in synchronous configuration non-robust.
| Imagine a situation where a node which is not an owner of the key calls
| PUT. Then an RPC is made to the primary owner of that key,
| which reroutes the request to all other owners and, after these
| reply, replies back.
| There are two problems:
| 1) If we simultaneously do X requests from non-owners to the primary
| owner where X is OOB TP size, all the OOB threads are waiting for
| the responses and there is no thread to process the OOB response and
| release the thread.
| 2) Node A is primary owner of keyA, non-primary owner of keyB and B
| is primary of keyB and non-primary of keyA. We got many requests for
| both keyA and keyB from other nodes, therefore, all OOB threads from
| both nodes call RPC to the non-primary owner but there's no one who
| could process the request.
| 
| While we wait for the requests to timeout, the nodes with depleted
| OOB threadpools start suspecting all other nodes because they can't
| receive heartbeats etc...
| 
| You can say "increase your OOB tp size", but that's not always an
| option, I have currently set it to 1000 threads and it's not enough.
| In the end, I will always be limited by RAM and something tells me that
| even nodes with few gigs of RAM should be able to form a huge cluster.

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Pedro Ruivo
Hi,

I had a similar problem when I tried GMU[1] in a "large" cluster (40 VMs), 
because the remote gets and the commit messages (I'm talking about ISPN 
commands) must wait for some conditions before being processed.

I solved this problem by adding a feature in JGroups[2] that allows the 
request to be moved to another thread, releasing the OOB thread. The 
other thread then sends the reply to the JGroups request. Of course, I'm 
only moving commands that I know can block.

I can enter in some detail if you want =)
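The idea can be sketched roughly like this (illustrative names, not the actual JGroups API): the OOB thread hands a possibly-blocking command to an internal pool and returns immediately, and the reply is sent from that pool once the command finishes.

```java
import java.util.concurrent.*;
import java.util.function.Consumer;

public class DeferredReplyDemo {
    // The internal pool that takes over possibly-blocking commands,
    // so the OOB thread is released immediately.
    static final ExecutorService internalPool = Executors.newFixedThreadPool(4);

    // Stand-in for a JGroups request handler. "mayBlock" plays the role of
    // the per-command knowledge above ("commands that I know can block").
    static void handleOnOobThread(String command, boolean mayBlock, Consumer<String> sendReply) {
        if (!mayBlock) {
            sendReply.accept(process(command));   // fast path: reply from the OOB thread
            return;
        }
        // Slow path: the reply is produced and sent from the internal pool;
        // the OOB thread returns right away and can serve the next message.
        internalPool.execute(() -> sendReply.accept(process(command)));
    }

    static String process(String cmd) { return "reply:" + cmd; }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> reply = new CompletableFuture<>();
        handleOnOobThread("commit", true, reply::complete);
        System.out.println(reply.get(5, TimeUnit.SECONDS)); // prints reply:commit
        internalPool.shutdown();
    }
}
```

The key property is that the OOB pool only ever runs non-blocking work, so it cannot be exhausted by commands waiting on conditions.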

Cheers,
Pedro

[1] http://www.gsd.inesc-id.pt/~romanop/files/papers/icdcs12.pdf
[2] I would like to talk with Bela about this, because it makes my life 
easier to support total order in ISPN. I'll try to send an email this 
weekend =)

On 01-02-2013 08:04, Radim Vansa wrote:
> Hi guys,
>
> after dealing with the large cluster for a while, I find the way we use
> OOB threads in synchronous configuration non-robust.
> Imagine a situation where a node which is not an owner of the key calls PUT.
> Then an RPC is made to the primary owner of that key, which reroutes the
> request to all other owners and, after these reply, replies back.
> There are two problems:
> 1) If we simultaneously do X requests from non-owners to the primary owner 
> where X is OOB TP size, all the OOB threads are waiting for the responses and 
> there is no thread to process the OOB response and release the thread.
> 2) Node A is primary owner of keyA, non-primary owner of keyB and B is 
> primary of keyB and non-primary of keyA. We got many requests for both keyA 
> and keyB from other nodes, therefore, all OOB threads from both nodes call 
> RPC to the non-primary owner but there's no one who could process the request.
>
> While we wait for the requests to timeout, the nodes with depleted OOB 
> threadpools start suspecting all other nodes because they can't receive 
> heartbeats etc...
>
> You can say "increase your OOB tp size", but that's not always an option, I 
> have currently set it to 1000 threads and it's not enough. In the end, I will 
> be always limited by RAM and something tells me that even nodes with few gigs 
> of RAM should be able to form a huge cluster. We use 160 hotrod worker 
> threads in JDG, that means that 160 * clusterSize = 10240 (64 nodes in my 
> cluster) parallel requests can be executed, and if 10% targets the same node 
> with 1000 OOB threads, it gets stuck. It's about scaling and robustness.
>
> Not that I'd have any good solution, but I'd really like to start a 
> discussion.
> Thinking about it a bit, the problem is that a blocking call (calling an RPC
> on the primary owner from a message handler) can block non-blocking calls
> (such as an RPC response or a command that never sends any more messages).
> Therefore, a flag on the message saying "this won't send another message"
> could let the message be executed in a different threadpool, which can never
> deadlock. In fact, the pools could share the threads, but the non-blocking
> one would always have a few threads spare.
> It's a bad solution, as maintaining which messages could block on the other
> node is really, really hard (we can be sure only in the case of RPC
> responses), especially when locks come into play. I will welcome anything better.
>
> Radim
>
>
> ---
> Radim Vansa
> Quality Assurance Engineer
> JBoss Datagrid
> tel. +420532294559 ext. 62559
>
> Red Hat Czech, s.r.o.
> Brno, Purkyňova 99/71, PSČ 612 45
> Czech Republic
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Manik Surtani

On 1 Feb 2013, at 09:39, Dan Berindei  wrote:

> Radim, do these problems happen with the HotRod server, or only with 
> memcached?
> 
> HotRod requests handled by non-owners should be very rare, instead the vast 
> majority should be handled by the primary owner directly. So if this happens 
> with HotRod, we should focus on fixing the HotRod routing instead of focusing 
> on how to handle a large number of requests from non-owners.

Well, even Hot Rod only optionally uses smart routing.  Some client libraries 
don't have this capability.

> 
> That being said, even if a HotRod put request is handled by the primary 
> owner, it "generates" (numOwners - 1) extra OOB requests. So if you have 160 
> HotRod worker threads per node, you can expect 4 * 160 OOB messages per node. 
> Multiply that by 2, because responses are OOB as well, and you can get 1280 
> OOB messages before you even start reusing any HotRod worker thread. Have you 
> tried decreasing the number of HotRod workers?
> 
> The thing is, our OOB thread pool can't use queueing because we'd get a queue 
> full of commit commands while all the OOB threads are waiting on keys that 
> those commit commands would unlock. As the OOB thread pool is full, we 
> discard messages, which I suspect slows things down quite a bit (especially 
> if it's a credit request/response message). So it may well be that a lower 
> number of HotRod working threads would perform better.
> 
> On the other hand, why is increasing the number of OOB threads a solution? 
> With -Xss 512k, you can get 2000 threads with only 1 GB of virtual memory 
> (the actual used memory is probably even less, unless you're using huge 
> pages). AFAIK the Linux kernel doesn't break a sweat with 10 threads 
> running, so having 2000 threads just hanging around, waiting for a response, 
> should be such a problem.
> 
> I did chat with Bela (or was it a break-out session?) about moving 
> Infinispan's request processing to another thread pool during the team 
> meeting in Palma. That would leave the OOB thread pool free to receive 
> response messages, FD heartbeats, credit requests/responses etc. The 
> downside, I guess, is that each request would have to be passed to another 
> thread, and the context switch may slow things down a bit. But since the new 
> thread pool would be in Infinispan, we could even do tricks like executing a 
> commit/rollback directly on the OOB thread.

Right.  I always got the impression we were abusing the OOB pool.  But in the 
end, I think it makes sense (in JGroups) to separate a service thread pool (for 
heartbeats, credits, etc) and an application thread pool (what we'd use instead 
of OOB).  This way you could even tune your service thread pool to just have, 
say, 2 threads, and the application thread pool to 1000 or whatever.
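That split could look roughly like this (illustrative, not JGroups code): a tiny service pool for protocol traffic and a large application pool, routed by message kind, so application load can never starve heartbeats or credit messages.

```java
import java.util.concurrent.*;

public class SplitPoolsDemo {
    // Tiny pool for protocol-internal traffic (heartbeats, credits):
    // even 2 threads suffice because these handlers never block.
    static final ExecutorService servicePool = Executors.newFixedThreadPool(2);
    // Large pool for application requests, which may block on locks or RPCs.
    static final ExecutorService appPool =
            new ThreadPoolExecutor(8, 1000, 60, TimeUnit.SECONDS, new SynchronousQueue<>());

    enum Kind { HEARTBEAT, CREDIT, APP_REQUEST }

    // Route by message kind: protocol messages use a dedicated pool, so a
    // depleted application pool cannot delay failure detection.
    static Future<String> dispatch(Kind kind, Callable<String> task) {
        return (kind == Kind.APP_REQUEST ? appPool : servicePool).submit(task);
    }

    public static void main(String[] args) throws Exception {
        Future<String> hb = dispatch(Kind.HEARTBEAT, () -> "alive");
        Future<String> put = dispatch(Kind.APP_REQUEST, () -> "put-ok");
        System.out.println(hb.get() + " " + put.get()); // prints alive put-ok
        servicePool.shutdown();
        appPool.shutdown();
    }
}
```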

> In the end, I just didn't feel that working on this was justified, 
> considering the number of critical bugs we had. But maybe now's the time to 
> start experimenting…
> 
> 
> 
> On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa  wrote:
> Hi guys,
> 
> after dealing with the large cluster for a while, I find the way we use
> OOB threads in synchronous configuration non-robust.
> Imagine a situation where a node which is not an owner of the key calls PUT.
> Then an RPC is made to the primary owner of that key, which reroutes the
> request to all other owners and, after these reply, replies back.
> There are two problems:
> 1) If we simultaneously do X requests from non-owners to the primary owner 
> where X is OOB TP size, all the OOB threads are waiting for the responses and 
> there is no thread to process the OOB response and release the thread.
> 2) Node A is primary owner of keyA, non-primary owner of keyB and B is 
> primary of keyB and non-primary of keyA. We got many requests for both keyA 
> and keyB from other nodes, therefore, all OOB threads from both nodes call 
> RPC to the non-primary owner but there's no one who could process the request.
> 
> While we wait for the requests to timeout, the nodes with depleted OOB 
> threadpools start suspecting all other nodes because they can't receive 
> heartbeats etc...
> 
> You can say "increase your OOB tp size", but that's not always an option, I 
> have currently set it to 1000 threads and it's not enough. In the end, I will 
> be always limited by RAM and something tells me that even nodes with few gigs 
> of RAM should be able to form a huge cluster. We use 160 hotrod worker 
> threads in JDG, that means that 160 * clusterSize = 10240 (64 nodes in my 
> cluster) parallel requests can be executed, and if 10% targets the same node 
> with 1000 OOB threads, it gets stuck. It's about scaling and robustness.
> 
> Not that I'd have any good solution, but I'd really like to start a 
> discussion.
> Thinking about it a bit, the problem is that a blocking call (calling an RPC
> on the primary owner from a message handler) can block non-blocking calls
> (such as an RPC response or a command that never sends any more messages).

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
Yeah, I wouldn't call this a "simple" solution...

The distribution/replication interceptors are quite high in the interceptor
stack, so we'd have to save the state of the interceptor stack (basically
the thread's stack) somehow and resume processing it on the thread
receiving the responses. In a language that supports continuations that
would be a piece of cake, but since we're in Java we'd have to completely
change the way the interceptor stack works.

Actually we do hold the lock on modified keys while the command is
replicated to the other owners. But I think locking wouldn't be a problem: we
already allow locks to be owned by transactions instead of threads, so it
would just be a matter of creating a "lite transaction" for
non-transactional caches. Obviously the TransactionSynchronizerInterceptor
would have to go, but I see that as a positive thing ;)

So yeah, it could work, but it would take a huge amount of effort and it's
going to obfuscate the code. Plus, I'm not at all convinced that it's going
to improve performance that much compared to a new thread pool.

Cheers
Dan


On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa  wrote:

> Yeah, that would work if it is possible to break the execution path into the
> FutureListener from the middle of the interceptor stack - I am really not sure
> about that, but as in the current design no locks should be held when an RPC
> is called, it may be possible.
>
> Let's see what someone more informed (Dan?) would think about that.
>
> Thanks, Bela
>
> Radim
>
> - Original Message -
> | From: "Bela Ban" 
> | To: infinispan-dev@lists.jboss.org
> | Sent: Friday, February 1, 2013 9:39:43 AM
> | Subject: Re: [infinispan-dev] Threadpools in a large cluster
> |
> | It looks like the core problem is an incoming RPC-1 which triggers
> | another blocking RPC-2: the thread delivering RPC-1 is blocked
> | waiting
> | for the response from RPC-2, and can therefore not be used to serve
> | other requests for the duration of RPC-2. If RPC-2 takes a while,
> | e.g.
> | waiting to acquire a lock in the remote node, then it is clear that
> | the
> | thread pool will quickly exceed its max size.
> |
> | A simple solution would be to prevent invoking blocking RPCs *from
> | within* a received RPC. Let's take a look at an example:
> | - A invokes a blocking PUT-1 on B
> | - B forwards the request as blocking PUT-2 to C and D
> | - When PUT-2 returns and B gets the responses from C and D (or the
> | first
> | one to respond, don't know exactly how this is implemented), it sends
> | the response back to A (PUT-1 terminates now at A)
> |
> | We could change this to the following:
> | - A invokes a blocking PUT-1 on B
> | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
> | it
> | does the following:
> |   - B invokes PUT-2 and gets a future
> |   - B adds itself as a FutureListener, and it also stores the
> | address of the original sender (A)
> |   - When the FutureListener is invoked, B sends back the result
> |   as a
> | response to A
> | - Whenever a member leaves the cluster, the corresponding futures are
> | cancelled and removed from the hashmaps
> |
> | This could probably be done differently (e.g. by sending asynchronous
> | messages and implementing a finite state machine), but the core of
> | the
> | solution is the same; namely to avoid having an incoming thread block
> | on
> | a sync RPC.
> |
> | Thoughts ?
> |
> |
> |
> |
> | On 2/1/13 9:04 AM, Radim Vansa wrote:
> | > Hi guys,
> | >
> | > after dealing with the large cluster for a while, I find the way
> | > we use OOB threads in synchronous configuration non-robust.
> | > Imagine a situation where a node which is not an owner of the key
> | > calls PUT. Then an RPC is made to the primary owner of that
> | > key, which reroutes the request to all other owners and, after
> | > these reply, replies back.
> | > There are two problems:
> | > 1) If we simultaneously do X requests from non-owners to the primary
> | > owner where X is OOB TP size, all the OOB threads are waiting for
> | > the responses and there is no thread to process the OOB response
> | > and release the thread.
> | > 2) Node A is primary owner of keyA, non-primary owner of keyB and B
> | > is primary of keyB and non-primary of keyA. We got many requests
> | > for both keyA and keyB from other nodes, therefore, all OOB
> | > threads from both nodes call RPC to the non-primary owner but
> | > there's no one who could process the request.
> | >
> | > While we wait for the requests to timeout, the nodes with depleted
> | > OOB threadpools start suspecting all other nodes because they
> | > can't receive heartbeats etc...

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
Radim, do these problems happen with the HotRod server, or only with
memcached?

HotRod requests handled by non-owners should be very rare; instead, the vast
majority should be handled by the primary owner directly. So if this
happens with HotRod, we should focus on fixing the HotRod routing instead
of focusing on how to handle a large number of requests from non-owners.

That being said, even if a HotRod put request is handled by the primary
owner, it "generates" (numOwners - 1) extra OOB requests. So if you have
160 HotRod worker threads per node, you can expect 4 * 160 OOB messages per
node. Multiply that by 2, because responses are OOB as well, and you can
get 1280 OOB messages before you even start reusing any HotRod worker
thread. Have you tried decreasing the number of HotRod workers?

The thing is, our OOB thread pool can't use queueing because we'd get a
queue full of commit commands while all the OOB threads are waiting on keys
that those commit commands would unlock. As the OOB thread pool is full, we
discard messages, which I suspect slows things down quite a bit (especially
if it's a credit request/response message). So it may well be that a lower
number of HotRod working threads would perform better.

On the other hand, why is increasing the number of OOB threads a solution?
With -Xss 512k, you can get 2000 threads with only 1 GB of virtual memory
(the actual used memory is probably even less, unless you're using huge
pages). AFAIK the Linux kernel doesn't break a sweat with 10 threads
running, so having 2000 threads just hanging around, waiting for a
response, shouldn't be such a problem.

I did chat with Bela (or was it a break-out session?) about moving
Infinispan's request processing to another thread pool during the team
meeting in Palma. That would leave the OOB thread pool free to receive
response messages, FD heartbeats, credit requests/responses etc. The
downside, I guess, is that each request would have to be passed to another
thread, and the context switch may slow things down a bit. But since the
new thread pool would be in Infinispan, we could even do tricks like
executing a commit/rollback directly on the OOB thread.

In the end, I just didn't feel that working on this was justified,
considering the number of critical bugs we had. But maybe now's the time to
start experimenting...



On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa  wrote:

> Hi guys,
>
> after dealing with the large cluster for a while, I find the way we use
> OOB threads in synchronous configuration non-robust.
> Imagine a situation where a node which is not an owner of the key calls PUT.
> Then an RPC is made to the primary owner of that key, which reroutes
> the request to all other owners and, after these reply, replies back.
> There are two problems:
> 1) If we simultaneously do X requests from non-owners to the primary owner
> where X is OOB TP size, all the OOB threads are waiting for the responses
> and there is no thread to process the OOB response and release the thread.
> 2) Node A is primary owner of keyA, non-primary owner of keyB and B is
> primary of keyB and non-primary of keyA. We got many requests for both keyA
> and keyB from other nodes, therefore, all OOB threads from both nodes call
> RPC to the non-primary owner but there's no one who could process the
> request.
>
> While we wait for the requests to timeout, the nodes with depleted OOB
> threadpools start suspecting all other nodes because they can't receive
> heartbeats etc...
>
> You can say "increase your OOB tp size", but that's not always an option,
> I have currently set it to 1000 threads and it's not enough. In the end, I
> will be always limited by RAM and something tells me that even nodes with
> few gigs of RAM should be able to form a huge cluster. We use 160 hotrod
> worker threads in JDG, that means that 160 * clusterSize = 10240 (64 nodes
> in my cluster) parallel requests can be executed, and if 10% targets the
> same node with 1000 OOB threads, it gets stuck. It's about scaling and
> robustness.
>
> Not that I'd have any good solution, but I'd really like to start a
> discussion.
> Thinking about it a bit, the problem is that a blocking call (calling an RPC
> on the primary owner from a message handler) can block non-blocking calls
> (such as an RPC response or a command that never sends any more messages).
> Therefore, a flag on the message saying "this won't send another message"
> could let the message be executed in a different threadpool, which can never
> deadlock. In fact, the pools could share the threads, but the non-blocking
> one would always have a few threads spare.
> It's a bad solution, as maintaining which messages could block on the other
> node is really, really hard (we can be sure only in the case of RPC
> responses), especially when locks come into play. I will welcome anything
> better.
>
> Radim
>
>
> ---
> Radim Vansa
> Quality Assurance Engineer
> JBoss Datagrid
> tel. +420532294559 ext. 62559

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Radim Vansa
Yeah, that would work if it is possible to break the execution path into the 
FutureListener from the middle of the interceptor stack - I am really not sure 
about that, but as in the current design no locks should be held when an RPC 
is called, it may be possible.

Let's see what someone more informed (Dan?) would think about that.

Thanks, Bela

Radim

- Original Message -
| From: "Bela Ban" 
| To: infinispan-dev@lists.jboss.org
| Sent: Friday, February 1, 2013 9:39:43 AM
| Subject: Re: [infinispan-dev] Threadpools in a large cluster
| 
| It looks like the core problem is an incoming RPC-1 which triggers
| another blocking RPC-2: the thread delivering RPC-1 is blocked
| waiting
| for the response from RPC-2, and can therefore not be used to serve
| other requests for the duration of RPC-2. If RPC-2 takes a while,
| e.g.
| waiting to acquire a lock in the remote node, then it is clear that
| the
| thread pool will quickly exceed its max size.
| 
| A simple solution would be to prevent invoking blocking RPCs *from
| within* a received RPC. Let's take a look at an example:
| - A invokes a blocking PUT-1 on B
| - B forwards the request as blocking PUT-2 to C and D
| - When PUT-2 returns and B gets the responses from C and D (or the
| first
| one to respond, don't know exactly how this is implemented), it sends
| the response back to A (PUT-1 terminates now at A)
| 
| We could change this to the following:
| - A invokes a blocking PUT-1 on B
| - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
| it
| does the following:
|   - B invokes PUT-2 and gets a future
|   - B adds itself as a FutureListener, and it also stores the
| address of the original sender (A)
|   - When the FutureListener is invoked, B sends back the result
|   as a
| response to A
| - Whenever a member leaves the cluster, the corresponding futures are
| cancelled and removed from the hashmaps
| 
| This could probably be done differently (e.g. by sending asynchronous
| messages and implementing a finite state machine), but the core of
| the
| solution is the same; namely to avoid having an incoming thread block
| on
| a sync RPC.
| 
| Thoughts ?
| 
| 
| 
| 
| On 2/1/13 9:04 AM, Radim Vansa wrote:
| > Hi guys,
| >
| > after dealing with the large cluster for a while, I find the way
| > we use OOB threads in synchronous configuration non-robust.
| > Imagine a situation where a node which is not an owner of the key
| > calls PUT. Then an RPC is made to the primary owner of that
| > key, which reroutes the request to all other owners and, after
| > these reply, replies back.
| > There are two problems:
| > 1) If we simultaneously do X requests from non-owners to the primary
| > owner where X is OOB TP size, all the OOB threads are waiting for
| > the responses and there is no thread to process the OOB response
| > and release the thread.
| > 2) Node A is primary owner of keyA, non-primary owner of keyB and B
| > is primary of keyB and non-primary of keyA. We got many requests
| > for both keyA and keyB from other nodes, therefore, all OOB
| > threads from both nodes call RPC to the non-primary owner but
| > there's no one who could process the request.
| >
| > While we wait for the requests to timeout, the nodes with depleted
| > OOB threadpools start suspecting all other nodes because they
| > can't receive heartbeats etc...
| >
| > You can say "increase your OOB tp size", but that's not always an
| > option, I have currently set it to 1000 threads and it's not
| > enough. In the end, I will be always limited by RAM and something
| > tells me that even nodes with few gigs of RAM should be able to
| > form a huge cluster. We use 160 hotrod worker threads in JDG, that
| > means that 160 * clusterSize = 10240 (64 nodes in my cluster)
| > parallel requests can be executed, and if 10% targets the same
| > node with 1000 OOB threads, it gets stuck. It's about scaling and
| > robustness.
| >
| > Not that I'd have any good solution, but I'd really like to start a
| > discussion.
| > Thinking about it a bit, the problem is that a blocking call (calling
| > an RPC on the primary owner from a message handler) can block
| > non-blocking calls (such as an RPC response or a command that never
| > sends any more messages). Therefore, a flag on the message saying
| > "this won't send another message" could let the message be executed
| > in a different threadpool, which can never deadlock. In fact, the
| > pools could share the threads, but the non-blocking one would always
| > have a few threads spare.
| > It's a bad solution, as maintaining which messages could block on the
| > other node is really, really hard (we can be sure only in the case of
| > RPC responses), especially when locks come into play. I will welcome
| > anything better.

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Bela Ban
It looks like the core problem is an incoming RPC-1 which triggers 
another blocking RPC-2: the thread delivering RPC-1 is blocked waiting 
for the response from RPC-2, and can therefore not be used to serve 
other requests for the duration of RPC-2. If RPC-2 takes a while, e.g. 
waiting to acquire a lock in the remote node, then it is clear that the 
thread pool will quickly exceed its max size.
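This exhaustion can be reproduced in miniature with a plain fixed-size pool: every "incoming RPC" task blocks on a nested call submitted to the same pool, so with as many outer calls as pool threads, no thread is left for the inner work. This is only an illustration of the failure mode (the class and method names here are made up, not Infinispan or JGroups APIs):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Minimal demonstration of the deadlock: each outer task ("RPC-1 handler")
// blocks waiting on an inner task ("RPC-2") submitted to the same pool.
public class PoolDeadlock {
    public static boolean demo(int poolSize) throws Exception {
        ExecutorService oob = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(poolSize);
        for (int i = 0; i < poolSize; i++) {
            oob.execute(() -> {
                Future<?> inner = oob.submit(() -> {}); // the nested "RPC-2"
                try {
                    // Blocks: the pool is saturated, so the inner task
                    // cannot start until some outer task gives up
                    inner.get(200, TimeUnit.MILLISECONDS);
                    done.countDown();
                } catch (Exception timedOut) {
                    // expected: simulates the remote call timing out
                }
            });
        }
        boolean completed = done.await(1, TimeUnit.SECONDS);
        oob.shutdownNow();
        return completed; // false: the outer tasks starved their inner tasks
    }
}
```

With `poolSize` outer calls in flight, at least one must time out before any inner task can run, which is exactly the "OOB pool depleted, requests time out" behaviour described above.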

A simple solution would be to prevent invoking blocking RPCs *from 
within* a received RPC. Let's take a look at an example:
- A invokes a blocking PUT-1 on B
- B forwards the request as blocking PUT-2 to C and D
- When PUT-2 returns and B gets the responses from C and D (or the first 
one to respond, I don't know exactly how this is implemented), it sends 
the response back to A (PUT-1 now terminates at A)

We could change this to the following:
- A invokes a blocking PUT-1 on B
- B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D, it 
does the following:
  - B invokes PUT-2 and gets a future
  - B adds itself as a FutureListener, and it also stores the 
address of the original sender (A)
  - When the FutureListener is invoked, B sends back the result as a 
response to A
- Whenever a member leaves the cluster, the corresponding futures are 
cancelled and removed from the hashmaps

This could probably be done differently (e.g. by sending asynchronous 
messages and implementing a finite state machine), but the core of the 
solution is the same; namely to avoid having an incoming thread block on 
a sync RPC.
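The future-based variant above can be sketched roughly as follows. This is a hypothetical illustration, not the real JGroups API: `forwardPut`, `sendResponse`, and the `String` addresses are stand-ins, and the real implementation would key the pending map differently (per request, not per sender).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: B handles PUT-1 without blocking the thread that delivered it.
public class NonBlockingForward {

    // Pending futures keyed by the original sender, so they can be
    // cancelled and removed when that member leaves the cluster
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Stand-in for the asynchronous PUT-2 sent to the backup owners (C and D)
    CompletableFuture<String> forwardPut(String key, String value) {
        return CompletableFuture.supplyAsync(() -> "ok:" + key);
    }

    void sendResponse(String dest, String result) {
        System.out.println("response to " + dest + ": " + result);
    }

    // Invoked on the delivery thread for PUT-1; returns immediately
    public void onPut(String origin, String key, String value) {
        CompletableFuture<String> f = forwardPut(key, value);
        pending.put(origin, f);
        f.thenAccept(result -> {        // runs on a pool thread, not the
            pending.remove(origin);     // thread that delivered PUT-1
            sendResponse(origin, result);
        });
    }

    // On a view change, drop futures belonging to the departed member
    public void memberLeft(String member) {
        CompletableFuture<String> f = pending.remove(member);
        if (f != null) f.cancel(true);
    }
}
```

The key property is that `onPut` never blocks: the delivery thread registers the callback and returns to the pool, so the OOB pool size no longer bounds the number of in-flight forwarded PUTs.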

Thoughts?




On 2/1/13 9:04 AM, Radim Vansa wrote:
> Hi guys,
>
> after dealing with large clusters for a while, I find the way we use OOB 
> threads in synchronous configurations non-robust.
> Imagine a situation where a node which is not an owner of the key calls PUT. 
> An RPC is then made to the primary owner of that key, which reroutes the 
> request to all other owners and, after these reply, replies back.
> There are two problems:
> 1) If we do simultaneously X requests from non-owners to the primary owner, 
> where X is the OOB thread pool size, all the OOB threads are waiting for the 
> responses and there is no thread left to process the OOB response and release 
> a thread.
> 2) Node A is primary owner of keyA and non-primary owner of keyB, and B is 
> primary owner of keyB and non-primary owner of keyA. We get many requests for 
> both keyA and keyB from other nodes; therefore, all OOB threads on both nodes 
> make RPCs to the non-primary owner, but there's no one left to process the 
> requests.
>
> While we wait for the requests to timeout, the nodes with depleted OOB 
> threadpools start suspecting all other nodes because they can't receive 
> heartbeats etc...
>
> You can say "increase your OOB tp size", but that's not always an option; I 
> currently have it set to 1000 threads and it's not enough. In the end I will 
> always be limited by RAM, and something tells me that even nodes with a few 
> gigs of RAM should be able to form a huge cluster. We use 160 HotRod worker 
> threads in JDG, which means that 160 * clusterSize = 10240 (64 nodes in my 
> cluster) parallel requests can be executed, and if 10% of them target the 
> same node with 1000 OOB threads, it gets stuck. It's about scaling and 
> robustness.
>
> Not that I'd have any good solution, but I'd really like to start a 
> discussion.
> Thinking about it a bit, the problem is that a blocking call (an RPC invoked 
> on the primary owner from the message handler) can block non-blocking calls 
> (such as an RPC response, or a command that never sends any more messages). 
> Therefore, a flag on the message saying "this won't send another message" 
> could let the message be executed in a different thread pool, which would 
> never deadlock. In fact, the pools could share threads, but the non-blocking 
> pool would always have a few spare threads.
> It's a bad solution, as maintaining knowledge of which messages could block 
> on the other node is really, really hard (we can be sure only in the case of 
> RPC responses), especially once locks are involved. I will welcome anything 
> better.

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


[infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Radim Vansa
Hi guys,

after dealing with large clusters for a while, I find the way we use OOB 
threads in synchronous configurations non-robust.
Imagine a situation where a node which is not an owner of the key calls PUT. An 
RPC is then made to the primary owner of that key, which reroutes the request 
to all other owners and, after these reply, replies back.
There are two problems:
1) If we do simultaneously X requests from non-owners to the primary owner, 
where X is the OOB thread pool size, all the OOB threads are waiting for the 
responses and there is no thread left to process the OOB response and release 
a thread.
2) Node A is primary owner of keyA and non-primary owner of keyB, and B is 
primary owner of keyB and non-primary owner of keyA. We get many requests for 
both keyA and keyB from other nodes; therefore, all OOB threads on both nodes 
make RPCs to the non-primary owner, but there's no one left to process the 
requests.

While we wait for the requests to timeout, the nodes with depleted OOB 
threadpools start suspecting all other nodes because they can't receive 
heartbeats etc...

You can say "increase your OOB tp size", but that's not always an option; I 
currently have it set to 1000 threads and it's not enough. In the end I will 
always be limited by RAM, and something tells me that even nodes with a few 
gigs of RAM should be able to form a huge cluster. We use 160 HotRod worker 
threads in JDG, which means that 160 * clusterSize = 10240 (64 nodes in my 
cluster) parallel requests can be executed, and if 10% of them target the same 
node with 1000 OOB threads, it gets stuck. It's about scaling and robustness.

Not that I'd have any good solution, but I'd really like to start a discussion.
Thinking about it a bit, the problem is that a blocking call (an RPC invoked on 
the primary owner from the message handler) can block non-blocking calls (such 
as an RPC response, or a command that never sends any more messages). 
Therefore, a flag on the message saying "this won't send another message" could 
let the message be executed in a different thread pool, which would never 
deadlock. In fact, the pools could share threads, but the non-blocking pool 
would always have a few spare threads.
It's a bad solution, as maintaining knowledge of which messages could block on 
the other node is really, really hard (we can be sure only in the case of RPC 
responses), especially once locks are involved. I will welcome anything better.
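The two-pool idea above can be sketched as a dispatcher that routes messages flagged "won't send another message" (e.g. RPC responses) to a small pool that can never deadlock on downstream replies. This is a hypothetical illustration only; `SplitDispatcher` and the `terminal` flag are made-up names, not an Infinispan or JGroups mechanism:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: messages marked "terminal" (guaranteed not to trigger another
// blocking RPC) run on a reserved pool, so responses can always be
// processed even when the main pool is saturated with blocked handlers.
public class SplitDispatcher {
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(8);
    private final ExecutorService terminalPool = Executors.newFixedThreadPool(2);

    public void dispatch(Runnable handler, boolean terminal) {
        // Route by the message's flag; the terminal pool never runs
        // handlers that could block on a further RPC
        (terminal ? terminalPool : blockingPool).execute(handler);
    }

    public void shutdown() {
        blockingPool.shutdown();
        terminalPool.shutdown();
    }
}
```

The hard part, as noted, is not the routing but deciding the flag correctly on the sender side: only RPC responses can be classified with certainty.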

Radim


---
Radim Vansa
Quality Assurance Engineer
JBoss Datagrid
tel. +420532294559 ext. 62559

Red Hat Czech, s.r.o.
Brno, Purkyňova 99/71, PSČ 612 45
Czech Republic

