Hi David,

Dan and I had a talk about integrating my changes to the distribution code into 5.x. As I mentioned below, the current code is quite brittle with respect to concurrent startup, so this will get fixed with my changes. I hope we can backport this to the 4.2.x branch as well; as a matter of fact, I made my changes on a branch off of 4.2.x.
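For anyone skimming the quoted thread below: the jstack output David posted shows getCache() blocked in waitForJoinToComplete() while the Rehasher thread blocks in blockUntilNoJoinsInProgress() -- two waits that each depend on the other side making progress. A minimal standalone sketch of that circular-wait shape (the latch names, timeout, and thread roles here are illustrative, not Infinispan's actual code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class JoinDeadlockSketch {
    // Hypothetical latches mirroring the two waits seen in the jstack output.
    static final CountDownLatch joinComplete = new CountDownLatch(1);      // opened only when the rehash finishes
    static final CountDownLatch noJoinsInProgress = new CountDownLatch(1); // opened only when no join is pending

    public static void main(String[] args) throws Exception {
        Thread rehasher = new Thread(() -> {
            try {
                // The rehasher cannot proceed while a join is in progress...
                noJoinsInProgress.await();
                // ...but only the rehasher would ever mark the join complete.
                joinComplete.countDown();
            } catch (InterruptedException ignored) {
                // interrupted on shutdown; nothing to clean up in this sketch
            }
        });
        rehasher.start();

        // The getCache() caller waits for join completion, which never happens:
        boolean completed = joinComplete.await(500, TimeUnit.MILLISECONDS);
        System.out.println("join completed: " + completed); // false -> circular wait
        rehasher.interrupt();
    }
}
```

With a timed await the caller at least observes the stall instead of hanging forever, which is roughly what the jstack snapshot captured.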
On 4/5/11 2:51 PM, david marion wrote:
>
> Bela,
>
> Yes, it is a replicated cache, and I used your udp-largecluster.xml file
> and just modified it slightly. It does appear that the distributed cache
> is in a deadlock (or there is a race condition): the coordinator comes
> up, but the other caches do not; they sit there and wait. I was able to
> get a distributed cache up and running on 100+ nodes; now I cannot get 5
> of them running.
>
>> Date: Tue, 5 Apr 2011 11:09:54 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>
>> On 4/4/11 5:45 PM, david marion wrote:
>>>
>>> Good news! I was able to use the system property from ISPN-83 and
>>> remove the FLUSH from the jgroups config with 4.2.1.FINAL, and
>>> start-up times are much, much better. We have a replicated cache on
>>> 420+ nodes up in under 2 minutes.
>>
>> Great! Just to confirm: this is 420+ Infinispan instances, with
>> replication enabled, correct?
>>
>> Did you use a specific JGroups config (e.g. udp-largecluster.xml)?
>>
>>> I am seeing an issue with the distributed cache, though, with as few
>>> as 5 nodes.
>>>
>>> In the coordinator log I see:
>>>
>>> org.infinispan.distribution.DistributionManagerImpl: Detected a view
>>> change. Member list changed.......
>>> org.infinispan.distribution.DistributionManagerImpl: This is a JOIN
>>> event! Wait for notification from new joiner <name>
>>>
>>> In the log from the joining node I see:
>>>
>>> org.infinispan.distribution.JoinTask: Commencing rehash on node: <name>.
>>> Before start, distributionManager.joinComplete=false
>>> org.infinispan.distribution.JoinTask: Requesting old consistent hash
>>> from coordinator
>>>
>>> I jstack'd the joiner; the DefaultCacheManager.getCache() method is
>>> waiting on
>>> org.infinispan.distribution.DistributionManagerImpl.waitForJoinToComplete(),
>>> and the Rehasher thread is waiting on:
>>>
>>> at org.infinispan.util.concurrent.ReclosableLatch.await(ReclosableLatch.java:75)
>>> at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilNoJoinsInProgress(JGroupsDistSync.java:113)
>>>
>>> Any thoughts?
>>
>> I recently took a look at the distribution code, and this part is very
>> brittle with respect to parallel startup and merging. Plus, I believe
>> the (blocking) RPC to fetch the old CH from the coordinator might
>> deadlock in certain cases...
>>
>> I've got a pull request pending for push-based (versus pull-based)
>> rebalancing. It'll likely make it into 5.x; as a matter of fact, I've
>> got a chat about this this afternoon.
>>
>>>> Date: Wed, 23 Mar 2011 15:58:19 +0100
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>>>
>>>> On 3/23/11 2:39 PM, david marion wrote:
>>>>>
>>>>> Bela,
>>>>>
>>>>> Is there a way to start up the JGroups stack on every node without
>>>>> using Infinispan?
>>>>
>>>> You could use ViewDemo [1] or Draw, or write your own small test
>>>> program; if you take a look at ViewDemo's src, you'll see that it's
>>>> only a page of code.
>>>>
>>>>> Is there some functional test that I can run or something? I know I
>>>>> can't remove the FLUSH from Infinispan until 5.0.0, and I don't know
>>>>> if I can upgrade the underlying JGroups jar.
>>>>
>>>> I suggest testing with the latest JGroups (2.12.0), both +FLUSH and -FLUSH.
>>>> The +FLUSH config should be less painful now, with the introduction
>>>> of view bundling: we need to run flush fewer times than before.
>>>>
>>>> [1] http://community.jboss.org/wiki/TestingJBoss
>>>>
>>>> --
>>>> Bela Ban
>>>> Lead JGroups / Clustering Team
>>>> JBoss
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> [email protected]
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Bela Ban
Lead JGroups / Clustering Team
JBoss
_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
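[Editorial note for readers of the archive: "+FLUSH" vs "-FLUSH" above refers to keeping or deleting the FLUSH protocol at the top of the JGroups stack. A hedged sketch of the shape of such a stack -- element names follow JGroups 2.x XML conventions, but the attribute values here are illustrative, not a tested large-cluster config:]

```xml
<!-- Illustrative JGroups stack fragment; most protocol attributes elided. -->
<config xmlns="urn:org:jgroups">
    <UDP mcast_port="45588"/>
    <PING/>
    <MERGE2/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT/>
    <pbcast.NAKACK use_mcast_xmit="true"/>
    <UNICAST/>
    <pbcast.STABLE/>
    <!-- view bundling in GMS is what makes +FLUSH less painful -->
    <pbcast.GMS view_bundling="true"/>
    <FRAG2/>
    <!-- "+FLUSH": keep this element; "-FLUSH": delete it. -->
    <pbcast.FLUSH timeout="0"/>
</config>
```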
