Hi Bela,

I'm interested in your changes as well, since concurrent startup vexes my 
cache usage too.  Is there a wiki page or JIRA issue I could look at to 
understand the fundamental differences?

Thanks,

Erik

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Bela Ban
Sent: Wednesday, April 06, 2011 2:46 AM
To: [email protected]
Subject: Re: [infinispan-dev] Infinispan Large Scale support

Hi David,

Dan and I talked about integrating my changes to the distribution code into 
5.x. As I mentioned below, the current code is quite brittle with respect to 
concurrent startup, so this will get fixed with my changes. I hope we can 
backport this to the 4.2.x branch as well; as a matter of fact, I made my 
changes on a branch off of 4.2.x.

On 4/5/11 2:51 PM, david marion wrote:
>
> Bela,
>
>    Yes, it is a replicated cache; I used your udp-largecluster.xml file 
> and just modified it slightly. It does appear that the distributed cache is 
> in a deadlock (or there is a race condition): the coordinator comes up, but 
> the other caches do not; they just sit there and wait. I was able to get a 
> distributed cache up and running on 100+ nodes, yet now I cannot get even 5 
> of them running.
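>
> For reference, each node just runs the standard startup sequence, something 
> like this (a simplified sketch; the config file and cache name are ours):
>
>     import org.infinispan.Cache;
>     import org.infinispan.manager.DefaultCacheManager;
>
>     public class StartNode {
>         public static void main(String[] args) throws Exception {
>             // every node executes this concurrently at startup
>             DefaultCacheManager manager =
>                 new DefaultCacheManager("infinispan-dist.xml");
>             // on the joiners this call never returns; it blocks in
>             // DistributionManagerImpl.waitForJoinToComplete()
>             Cache<String, String> cache = manager.getCache("dist-cache");
>             System.out.println("started on " + manager.getAddress());
>         }
>     }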
>
>> Date: Tue, 5 Apr 2011 11:09:54 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>
>>
>>
>> On 4/4/11 5:45 PM, david marion wrote:
>>>
>>>
>>> Good news! I was able to use the system property from ISPN-83 and remove 
>>> FLUSH from the JGroups config with 4.2.1.FINAL, and start-up times are 
>>> much, much better. We have a replicated cache on 420+ nodes up in under 
>>> 2 minutes.
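>>>
>>> In case it's useful to others, this is roughly how we point Infinispan 
>>> at the custom JGroups file (a sketch against the 4.2 legacy API, with 
>>> our file name):
>>>
>>>     import java.util.Properties;
>>>     import org.infinispan.config.GlobalConfiguration;
>>>     import org.infinispan.manager.DefaultCacheManager;
>>>
>>>     public class LargeClusterStartup {
>>>         public static void main(String[] args) throws Exception {
>>>             GlobalConfiguration gc = GlobalConfiguration.getClusteredDefault();
>>>             Properties props = new Properties();
>>>             // "configurationFile" is the transport property the
>>>             // JGroups transport reads; this stack has no FLUSH
>>>             props.setProperty("configurationFile", "udp-largecluster.xml");
>>>             gc.setTransportProperties(props);
>>>             DefaultCacheManager manager = new DefaultCacheManager(gc);
>>>             manager.getCache(); // the replicated default cache
>>>         }
>>>     }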
>>
>>
>> Great! Just to confirm: this is 420+ Infinispan instances, with
>> replication enabled, correct?
>>
>> Did you use a specific JGroups config (e.g. udp-largecluster.xml) ?
>>
>>
>>> I am seeing an issue with the distributed cache, though, with as few as 
>>> 5 nodes.
>>>
>>> In the coordinator log I see
>>>
>>> org.infinispan.distribution.DistributionManagerImpl: Detected a view 
>>> change. Member list changed.......
>>> org.infinispan.distribution.DistributionManagerImpl: This is a JOIN
>>> event! Wait for notification from new joiner <name>
>>>
>>> In the log from the joining node I see:
>>>
>>> org.infinispan.distribution.JoinTask: Commencing rehash on
>>> node:<name>. Before start, distributionManager.joinComplete=false
>>> org.infinispan.distribution.JoinTask: Requesting old consistent hash
>>> from coordinator
>>>
>>> I jstack'd the joiner: the DefaultCacheManager.getCache() method is
>>> waiting in 
>>> org.infinispan.distribution.DistributionManagerImpl.waitForJoinToComplete() 
>>> and the Rehasher thread is waiting at:
>>>
>>> at org.infinispan.util.concurrent.ReclosableLatch.await(ReclosableLatch.java:75)
>>> at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilNoJoinsInProgress(JGroupsDistSync.java:113)
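>>>
>>> If it helps, ReclosableLatch behaves like a gate that stays shut until 
>>> someone opens it; roughly this (my simplified sketch, not the actual 
>>> Infinispan source):
>>>
>>>     // simplified model of a reclosable latch: await() blocks while closed
>>>     public class SimpleReclosableLatch {
>>>         private boolean open = false;
>>>
>>>         public synchronized void await() throws InterruptedException {
>>>             while (!open)
>>>                 wait(); // the Rehasher thread is parked here
>>>         }
>>>
>>>         public synchronized void open() {
>>>             open = true;
>>>             notifyAll();
>>>         }
>>>
>>>         public synchronized void close() {
>>>             open = false;
>>>         }
>>>     }
>>>
>>> So getCache() never returns, apparently because nothing ever opens the 
>>> latch.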
>>>
>>> Any thoughts?
>>
>>
>> I recently took a look at the distribution code, and this part is very
>> brittle with respect to parallel startup and merging. Plus, I believe the
>> (blocking) RPC to fetch the old CH from the coordinator might deadlock in
>> certain cases...
>>
>> I've got a pull request pending that replaces the pull-based rebalancing
>> with push-based rebalancing. It'll likely make it into 5.x; as a matter
>> of fact, I have a chat about this this afternoon.
>>
>>
>>
>>
>>>> Date: Wed, 23 Mar 2011 15:58:19 +0100
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>>>
>>>>
>>>>
>>>> On 3/23/11 2:39 PM, david marion wrote:
>>>>>
>>>>> Bela,
>>>>>
>>>>> Is there a way to start up the JGroups stack on every node without using 
>>>>> Infinispan?
>>>>
>>>>
>>>> You could use ViewDemo [1] or Draw, or write your own small test
>>>> program; if you take a look at ViewDemo's source, you'll see that it's
>>>> only a page of code.
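>>>>
>>>> Something along these lines is all it takes (a sketch against the
>>>> JGroups 2.x API; the cluster name is arbitrary):
>>>>
>>>>     import org.jgroups.JChannel;
>>>>     import org.jgroups.ReceiverAdapter;
>>>>     import org.jgroups.View;
>>>>
>>>>     public class ViewTest {
>>>>         public static void main(String[] args) throws Exception {
>>>>             // use the same JGroups XML you hand to Infinispan
>>>>             JChannel ch = new JChannel("udp-largecluster.xml");
>>>>             ch.setReceiver(new ReceiverAdapter() {
>>>>                 @Override
>>>>                 public void viewAccepted(View view) {
>>>>                     // printed on every membership change
>>>>                     System.out.println("view: " + view);
>>>>                 }
>>>>             });
>>>>             ch.connect("view-test");
>>>>             Thread.currentThread().join(); // keep the node up
>>>>         }
>>>>     }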
>>>>
>>>>
>>>>> Is there some functional test that I can run? I know I can't remove 
>>>>> FLUSH from Infinispan until 5.0.0, and I don't know whether I can 
>>>>> upgrade the underlying JGroups jar.
>>>>
>>>>
>>>> I suggest testing with the latest JGroups (2.12.0), both with FLUSH
>>>> (+FLUSH) and without it (-FLUSH). The +FLUSH config should be less
>>>> painful now, with the introduction of view bundling: we need to run
>>>> flush fewer times than before.
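>>>>
>>>> You don't even need two XML files for that; you can strip FLUSH from
>>>> the stack programmatically before connecting (again a sketch, 2.x API):
>>>>
>>>>     JChannel ch = new JChannel("udp-largecluster.xml");
>>>>     ch.getProtocolStack().removeProtocol("FLUSH"); // -FLUSH run; skip this line for +FLUSH
>>>>     ch.connect("view-test");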
>>>>
>>>>
>>>> [1] http://community.jboss.org/wiki/TestingJBoss
>>>>
>>>> --
>>>> Bela Ban
>>>> Lead JGroups / Clustering Team
>>>> JBoss
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> [email protected]
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> [email protected]
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> --
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss
>> _______________________________________________
>> infinispan-dev mailing list
>> [email protected]
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> [email protected]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Bela Ban
Lead JGroups / Clustering Team
JBoss
_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev


_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
