Hi David,

Dan and I had a talk about integrating my changes to the distribution 
code into 5.x. As I mentioned below, the current code is quite brittle with 
respect to concurrent startup, and my changes will fix this. I hope we 
can backport this to the 4.2.x branch as well; as a matter of fact, I 
made my changes on a branch off of 4.2.x.

On 4/5/11 2:51 PM, david marion wrote:
>
> Bela,
>
>    Yes, it is a replicated cache and I used your udp-largecluster.xml file 
> and just modified it slightly. It does appear that the distributed cache is 
> in a deadlock (or there is a race condition): the coordinator comes up, but 
> the other caches do not; they just sit there and wait. I was able to get a 
> distributed cache up and running on 100+ nodes, but now I cannot get 5 of them 
> running.
>
>> Date: Tue, 5 Apr 2011 11:09:54 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>
>>
>>
>> On 4/4/11 5:45 PM, david marion wrote:
>>>
>>>
>>> Good news! I was able to use the system property from ISPN-83 and remove 
>>> the FLUSH from the jgroups config with 4.2.1.FINAL, and start-up times are 
>>> much, much better. We have a replicated cache on 420+ nodes up in 
>>> under 2 minutes.
>>
>>
>> Great! Just to confirm: this is 420+ Infinispan instances, with
>> replication enabled, correct?
>>
>> Did you use a specific JGroups config (e.g. udp-largecluster.xml)?
>>
>>
>>> I am seeing an issue with the distributed cache, though, with as few as 5 
>>> nodes.
>>>
>>> In the coordinator log I see
>>>
>>> org.infinispan.distribution.DistributionManagerImpl: Detected a view 
>>> change. Member list changed.......
>>> org.infinispan.distribution.DistributionManagerImpl: This is a JOIN event! 
>>> Wait for notification from new joiner<name>
>>>
>>> In the log from the joining node I see:
>>>
>>> org.infinispan.distribution.JoinTask: Commencing rehash on node:<name>. 
>>> Before start, distributionManager.joinComplete=false
>>> org.infinispan.distribution.JoinTask: Requesting old consistent hash from 
>>> coordinator
>>>
>>> I jstack'd the joiner: the DefaultCacheManager.getCache() method is waiting 
>>> on org.infinispan.distribution.DistributionManagerImpl.waitForJoinToComplete(), 
>>> and the Rehasher thread is waiting on:
>>>
>>> at org.infinispan.util.concurrent.ReclosableLatch.await(ReclosableLatch.java:75)
>>> at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilNoJoinsInProgress(JGroupsDistSync.java:113)
>>>
>>> Any thoughts?
>>
>>
>> I recently took a look at the distribution code, and this part is very
>> brittle with respect to parallel startup and merging. Plus, I believe
>> the (blocking) RPC to fetch the old CH from the coordinator might
>> deadlock in certain cases...
>>
>> I've got a pull request pending for push-based (rather than pull-based)
>> rebalancing. It'll likely make it into 5.x; as a matter of fact,
>> I've got a chat about this this afternoon.
>>
>>
>>
>>
>>>> Date: Wed, 23 Mar 2011 15:58:19 +0100
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>>>
>>>>
>>>>
>>>> On 3/23/11 2:39 PM, david marion wrote:
>>>>>
>>>>> Bela,
>>>>>
>>>>> Is there a way to start up the JGroups stack on every node without using 
>>>>> Infinispan?
>>>>
>>>>
>>>> You could use ViewDemo [1] or Draw. Or write your own small test
>>>> program; if you take a look at ViewDemo's src, you'll see that it's only
>>>> a page of code.
>>>>
>>>>
>>>>> Is there some functional test that I can run or something? I know I can't 
>>>>> remove the FLUSH from Infinispan until 5.0.0 and I don't know if I can 
>>>>> upgrade the underlying
>>>>> JGroups jar.
>>>>
>>>>
>>>> I suggest testing with the latest JGroups (2.12.0), with both +FLUSH and
>>>> -FLUSH. The +FLUSH config should be less painful now, with the introduction
>>>> of view bundling: we need to run flush fewer times than before.
>>>>
>>>>
>>>> [1] http://community.jboss.org/wiki/TestingJBoss
>>>>
>>>> --
>>>> Bela Ban
>>>> Lead JGroups / Clustering Team
>>>> JBoss
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> [email protected]
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> --
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss

-- 
Bela Ban
Lead JGroups / Clustering Team
JBoss
