I will work on that, thanks for the complete report.

Cheers,

Sacha


> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Scott M Stark
> Sent: Monday, 4 August 2003 00:59
> To: [EMAIL PROTECTED]
> Subject: [JBoss-dev] 
> DistributedReplicantManager.isMasterReplica(String) false positives?
> 
> 
> There is a race condition in
> DistributedReplicantManager.isMasterReplica(String) that shows up when this
> method is called from within notifyKeyListeners, as shown by this stack trace:
> 
> Thread "main"@65 status: RUNNING
> - isMasterReplica():437, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
> - isDRMMasterReplica():234, org.jboss.ha.jmx.HAServiceMBeanSupport
> - partitionTopologyChanged():103, org.jboss.ha.singleton.HASingletonSupport
> - replicantsChanged():197, org.jboss.ha.jmx.HAServiceMBeanSupport$1
> - notifyKeyListeners():675, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
> - add():326, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
> - registerDRMListener():204, org.jboss.ha.jmx.HAServiceMBeanSupport
> - startService():144, org.jboss.ha.jmx.HAServiceMBeanSupport
> 
> This is due to the choice to return true when the key in question is in the
> localReplicants table but not in the replicants table:
> 
>     public boolean isMasterReplica (String key)
>     {
>        if (!localReplicants.containsKey (key))
>           return false;
> 
>        Vector allNodes = this.partition.getCurrentView ();
>        HashMap repForKey = (HashMap)replicants.get(key);
>        if (repForKey==null)
>           return true; ????
> 
> This seems to be an ambiguous condition, as it exists for a node that calls
> add and also when the state has not synched or has failed to synch. Another
> problem I'm seeing, at least in the context of the singleton service, is that
> the notion of the master node is unstable. Here is the output from one of 3
> nodes running the singleton service, starting with the addition of the final
> node, shown as view 2.
> 
> 15:35:44,637 INFO  [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 5s:948ms
> 15:36:27,719 INFO  [DefaultPartition] New cluster view: 2 ([lamia:32947, 172.17.66.54:2821, ironmaiden:51770] delta: 1)
> 15:36:27,749 INFO  [DefaultPartition:ReplicantManager] Dead members: 0
> 15:37:13,555 INFO  [DefaultPartition] New cluster view (id: 3, delta: -1) : [172.17.66.54:2821, ironmaiden:51770]
> 15:37:13,575 INFO  [DefaultPartition:ReplicantManager] Dead members: 1
> 15:38:13,321 INFO  [HASingletonMBeanExample] Notified to start as singleton
> 15:38:13,321 INFO  [DefaultPartition] New cluster view (id: 4, delta: 1) : [172.17.66.54:2821, ironmaiden:51770, lamia:32949]
> 15:38:13,331 INFO  [DefaultPartition:ReplicantManager] Dead members: 0
> 15:38:13,361 INFO  [HASingletonMBeanExample] Notified to stop as singleton
> 15:39:13,447 INFO  [HASingletonMBeanExample] Notified to start as singleton
> 15:39:13,457 INFO  [HASingletonMBeanExample] Notified to stop as singleton
> 
> With view 3 the original node and singleton is killed, and the node to which
> this console output corresponds (172.17.66.54) is selected as the singleton.
> When the third node is started again there is some thrashing due to the
> existing 2 nodes both selecting themselves as the singleton and telling the
> other to stop, and it appears that no singleton is chosen. The problem seems
> to be inconsistent matching of member names. One node only knows its IP while
> the other node knows the hostnames. Here is the console view of the second
> node showing the hostnames and its thrashing:
> 
> 15:25:21,023 INFO  [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 13s:597ms
> 15:26:05,562 INFO  [DefaultPartition] New cluster view: 3 ([succubus:2821, ironmaiden:51770] delta: -1)
> 15:26:05,573 INFO  [DefaultPartition:ReplicantManager] Dead members: 1
> 15:27:05,506 INFO  [HASingletonMBeanExample] Notified to start as singleton
> 15:27:05,509 INFO  [DefaultPartition] New cluster view: 4 ([succubus:2821, ironmaiden:51770, lamia:32949] delta: 1)
> 15:27:05,513 INFO  [DefaultPartition:ReplicantManager] Dead members: 0
> 15:27:05,531 INFO  [HASingletonMBeanExample] Notified to stop as singleton
> 15:28:05,520 INFO  [HASingletonMBeanExample] Notified to start as singleton
> 15:28:05,526 INFO  [HASingletonMBeanExample] Notified to stop as singleton
> 
> It's not clear that DistributedReplicantManager.isMasterReplica was designed
> to be used for the selection of a singleton node, but if it is, the logic
> needs to be firmed up. If not, the singleton service needs to be built on
> something else.
> 
> -- 
> xxxxxxxxxxxxxxxxxxxxxxxx
> Scott Stark
> Chief Technology Officer
> JBoss Group, LLC
> xxxxxxxxxxxxxxxxxxxxxxxx
> 
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email sponsored by: Free pre-built ASP.NET sites including
> Data Reports, E-commerce, Portals, and Forums are available now.
> Download today and enter to win an XBOX or Visual Studio .NET.
> http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet
> _072303_01/01
> _______________________________________________
> JBoss-Development mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/jboss-development
> 
> 
> 
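For illustration only, here is a rough sketch of one way the ambiguous case in the quoted isMasterReplica snippet could be made explicit. It is not the actual DistributedReplicantManagerImpl code. The class name, the Status enum, the stateSynched flag and the helper methods are all hypothetical, and the sketch assumes that the replicants table holds only remote replicants and that the master is the first member of the current view holding a replicant for the key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the DRM; none of these names come from JBoss.
public class MasterReplicaSketch {

    // Three-valued answer so a caller can tell "not master" from "cannot tell yet".
    public enum Status { MASTER, NOT_MASTER, UNKNOWN }

    private final String localNode;
    private final Set<String> localReplicants = new HashSet<>();
    // Assumption: key -> names of REMOTE nodes holding a replicant for that key.
    private final Map<String, Set<String>> replicants = new HashMap<>();
    // Assumption: set once the initial state transfer has completed.
    private boolean stateSynched = false;

    public MasterReplicaSketch(String localNode) {
        this.localNode = localNode;
    }

    public void addLocal(String key) {
        localReplicants.add(key);
    }

    public void addRemote(String key, String node) {
        Set<String> holders = replicants.get(key);
        if (holders == null) {
            holders = new HashSet<>();
            replicants.put(key, holders);
        }
        holders.add(node);
    }

    public void markStateSynched() {
        stateSynched = true;
    }

    public Status masterStatus(String key, List<String> currentView) {
        if (!localReplicants.contains(key)) {
            return Status.NOT_MASTER;
        }
        Set<String> remoteHolders = replicants.get(key);
        if (remoteHolders == null) {
            // The case marked "????" in the quoted code: only claim mastership
            // when we know the remote state has actually been received.
            return stateSynched ? Status.MASTER : Status.UNKNOWN;
        }
        // Assumed election rule: first view member holding a replicant wins.
        for (String node : currentView) {
            if (node.equals(localNode)) {
                return Status.MASTER;
            }
            if (remoteHolders.contains(node)) {
                return Status.NOT_MASTER;
            }
        }
        return Status.UNKNOWN; // local node is not in the view it was given
    }

    public static void main(String[] args) {
        MasterReplicaSketch drm = new MasterReplicaSketch("nodeA");
        List<String> view = Arrays.asList("nodeB", "nodeA");
        drm.addLocal("svc");
        System.out.println(drm.masterStatus("svc", view)); // before state synch
        drm.markStateSynched();
        System.out.println(drm.masterStatus("svc", view)); // after state synch
    }
}
```

With something along these lines, a caller such as the singleton support could hold off on starting or stopping while the answer is UNKNOWN instead of treating a missing replicants entry as "I am the master". Note that the equals and contains checks on node names would still go wrong if one node knows a member by IP and another by hostname, so the name mismatch described above would need its own fix.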



