I will work on that, thanks for the complete report. Cheers,
Sacha

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott M Stark
> Sent: Monday, 4 August 2003 00:59
> To: [EMAIL PROTECTED]
> Subject: [JBoss-dev] DistributedReplicantManager.isMasterReplica(String) false positives?
>
> There is a race condition in
> DistributedReplicantManager.isMasterReplica(String) that shows up when
> this method is called from within notifyKeyListeners, as shown by this
> stack trace:
>
> Thread "main"@65 status: RUNNING
> - isMasterReplica():437, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
> - isDRMMasterReplica():234, org.jboss.ha.jmx.HAServiceMBeanSupport
> - partitionTopologyChanged():103, org.jboss.ha.singleton.HASingletonSupport
> - replicantsChanged():197, org.jboss.ha.jmx.HAServiceMBeanSupport$1
> - notifyKeyListeners():675, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
> - add():326, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
> - registerDRMListener():204, org.jboss.ha.jmx.HAServiceMBeanSupport
> - startService():144, org.jboss.ha.jmx.HAServiceMBeanSupport
>
> This is due to the choice to return true when the key in question is in
> the localReplicants table but not in the replicants table:
>
>    public boolean isMasterReplica (String key)
>    {
>       if (!localReplicants.containsKey (key))
>          return false;
>
>       Vector allNodes = this.partition.getCurrentView ();
>       HashMap repForKey = (HashMap)replicants.get(key);
>       if (repForKey==null)
>          return true; // ????
>
> This seems to be an ambiguous condition, since it holds for a node that
> has called add while the state has not yet synched or has failed to
> synch. Another problem I'm seeing, at least in the context of the
> singleton service, is that the notion of the master node is unstable.
> Here is the output from one of 3 nodes running the singleton service,
> starting with the addition of the final node, shown as view 2.
>
> 15:35:44,637 INFO [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 5s:948ms
> 15:36:27,719 INFO [DefaultPartition] New cluster view: 2 ([lamia:32947, 172.17.66.54:2821, ironmaiden:51770] delta: 1)
> 15:36:27,749 INFO [DefaultPartition:ReplicantManager] Dead members: 0
> 15:37:13,555 INFO [DefaultPartition] New cluster view (id: 3, delta: -1) : [172.17.66.54:2821, ironmaiden:51770]
> 15:37:13,575 INFO [DefaultPartition:ReplicantManager] Dead members: 1
> 15:38:13,321 INFO [HASingletonMBeanExample] Notified to start as singleton
> 15:38:13,321 INFO [DefaultPartition] New cluster view (id: 4, delta: 1) : [172.17.66.54:2821, ironmaiden:51770, lamia:32949]
> 15:38:13,331 INFO [DefaultPartition:ReplicantManager] Dead members: 0
> 15:38:13,361 INFO [HASingletonMBeanExample] Notified to stop as singleton
> 15:39:13,447 INFO [HASingletonMBeanExample] Notified to start as singleton
> 15:39:13,457 INFO [HASingletonMBeanExample] Notified to stop as singleton
>
> With view 3 the original node, which was the singleton, is killed, and
> the node to which this console output corresponds (172.17.66.54) is
> selected as the singleton. When the third node is started again there is
> some thrashing, because the existing 2 nodes both select themselves as
> the singleton and tell the other to stop, and it appears that no
> singleton is chosen. The problem seems to be inconsistent matching of
> member names. One node knows only its IP while the other node knows the
> hostnames.
> Here is the console view of the second node, showing the hostnames and
> its thrashing:
>
> 15:25:21,023 INFO [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 13s:597ms
> 15:26:05,562 INFO [DefaultPartition] New cluster view: 3 ([succubus:2821, ironmaiden:51770] delta: -1)
> 15:26:05,573 INFO [DefaultPartition:ReplicantManager] Dead members: 1
> 15:27:05,506 INFO [HASingletonMBeanExample] Notified to start as singleton
> 15:27:05,509 INFO [DefaultPartition] New cluster view: 4 ([succubus:2821, ironmaiden:51770, lamia:32949] delta: 1)
> 15:27:05,513 INFO [DefaultPartition:ReplicantManager] Dead members: 0
> 15:27:05,531 INFO [HASingletonMBeanExample] Notified to stop as singleton
> 15:28:05,520 INFO [HASingletonMBeanExample] Notified to start as singleton
> 15:28:05,526 INFO [HASingletonMBeanExample] Notified to stop as singleton
>
> It's not clear that DistributedReplicantManager.isMasterReplica was
> designed to be used for the selection of a singleton node, but if it
> was, the logic needs to be firmed up. If not, the singleton service
> needs to be built on something else.
>
> --
> xxxxxxxxxxxxxxxxxxxxxxxx
> Scott Stark
> Chief Technology Officer
> JBoss Group, LLC
> xxxxxxxxxxxxxxxxxxxxxxxx
>
> _______________________________________________
> JBoss-Development mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/jboss-development
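
For readers following the thread, the ambiguity reported in isMasterReplica can be made concrete with a small sketch. This is not the JBoss implementation: the class ReplicaMasterSketch, the MasterStatus enum, and the rule that the master is the first member of the current view holding a replicant for the key are all assumptions made for illustration. The only point is that "registered locally, but no replicated state for this key yet" could be reported as unknown instead of as true.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in for the replicant manager's state; every name here is hypothetical. */
class ReplicaMasterSketch {

    /** Hypothetical three-valued answer; not part of the JBoss API. */
    enum MasterStatus { MASTER, NOT_MASTER, UNKNOWN }

    private final String localNode;
    private final Set<String> localKeys;                 // plays the role of "localReplicants"
    private final Map<String, Set<String>> remoteNodes;  // key -> remote nodes, plays the role of "replicants"

    ReplicaMasterSketch(String localNode, Set<String> localKeys, Map<String, Set<String>> remoteNodes) {
        this.localNode = localNode;
        this.localKeys = localKeys;
        this.remoteNodes = remoteNodes;
    }

    /** Assumed rule: the master is the first member of the view that holds a replicant for the key. */
    MasterStatus masterStatus(String key, List<String> view) {
        if (!localKeys.contains(key)) {
            return MasterStatus.NOT_MASTER;      // nothing registered locally
        }
        Set<String> remotes = remoteNodes.get(key);
        if (remotes == null) {
            // No replicated state for this key. That is conclusive only if we are alone in the view;
            // otherwise the state may simply not have synched yet.
            boolean alone = view.size() == 1 && view.contains(localNode);
            return alone ? MasterStatus.MASTER : MasterStatus.UNKNOWN;
        }
        for (String member : view) {
            if (member.equals(localNode)) {
                return MasterStatus.MASTER;      // we come before any remote replicant holder
            }
            if (remotes.contains(member)) {
                return MasterStatus.NOT_MASTER;  // a remote holder comes first
            }
        }
        return MasterStatus.UNKNOWN;             // local node is not in the view at all
    }

    public static void main(String[] args) {
        Set<String> local = new HashSet<>(Arrays.asList("singleton"));
        List<String> view = Arrays.asList("nodeA", "nodeB");

        // State not yet synched on nodeB: the answer is UNKNOWN rather than a false positive.
        ReplicaMasterSketch unsynched = new ReplicaMasterSketch("nodeB", local, new HashMap<>());
        System.out.println(unsynched.masterStatus("singleton", view));

        // After synch, nodeA is known to hold the key and precedes nodeB in the view.
        Map<String, Set<String>> synchedState = new HashMap<>();
        synchedState.put("singleton", new HashSet<>(Arrays.asList("nodeA")));
        ReplicaMasterSketch synched = new ReplicaMasterSketch("nodeB", local, synchedState);
        System.out.println(synched.masterStatus("singleton", view));
    }
}
```

A caller such as a singleton service could then defer its start/stop decision while the answer is UNKNOWN. Note that any comparison of view members against replicant holders presumes both use the same naming (IP versus hostname), which is the second problem raised in the report.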