[ http://jira.jboss.com/jira/browse/JBAS-896?page=history ]

Scott M Stark reassigned JBAS-896:
----------------------------------

    Assign To: Bela Ban  (was: Scott M Stark)

> JMS started on both nodes in cluster after network glitch
> ---------------------------------------------------------
>
>          Key: JBAS-896
>          URL: http://jira.jboss.com/jira/browse/JBAS-896
>      Project: JBoss Application Server
>         Type: Bug
>   Components: Clustering
>     Versions: JBossAS-3.2.6 Final
>     Reporter: SourceForge User
>     Assignee: Bela Ban
>
>
> SourceForge Submitter: iankenn .
> Original posting on JBoss.org Clustering forum:
> Hi
> I'm currently developing a system which uses JMS
> queuing for async processing of messages. I'm looking
> at deploying to a cluster of two JBoss 3.2.3 servers to
> provide some level of fail-over/resilience.
> During testing of the JMS fail-over I've tried killing
> one of the JBoss instances (the one running the JMS
> server) and see that the JMS queues are migrated to the
> other node. But when I tried to simulate a temporary
> loss of network connectivity between the two machines
> (by removing one of the network cables and then
> replacing it) the cluster seems to break and both
> machines start to run the JMS queues.
> When the network cable is reconnected, neither node
> appears to know that there is another node in the same
> partition. Effectively the cluster is not
> re-established. The only way to make the two nodes see
> each other again is to restart one of the nodes. Is
> there something that I have misconfigured or not
> configured? I am new to clustering and would appreciate
> some advice. I am currently testing on two Windows
> machines but intend to deploy to Linux boxes.
> Thanks,
> Ian
> See posting
> http://www.jboss.org/index.html?module=bb&op=viewtopic&t=45901
> Configuration (both machines)
> OS: Windows 2000
> JDK: 1.4.2_03
> JBoss: 3.2.3
> The attached zip contains the cluster.log files for
> both servers:
> Node 'A' - Node_A_cluster.log
> Node 'B' - Node_B_cluster.log
> Steps
> -----
> 1. Turn on logging for clustering in /conf/log4j.xml
> 2. Start JBoss on Node 'A'
> 3. Start JBoss on Node 'B'
> 4. Deploy EAR to farm dir on Node 'A'
> This is farmed to Node 'B'
> 5. Submit Msg to Node 'A' (Http request to application)
> 6. Submit Msg to Node 'B' (Http request to application)
> 7. Look at the HAILSharedState ServerAddress for the
> JBoss MQ on the jmx-console - this shows the IP address
> of Node 'A' on both nodes.
> 8. Remove network cable from Node 'A'
> 9. The following messages are displayed in the console:
> Node 'A'
> 10:40:53,921 INFO [DefaultPartition] New cluster view
> (id: 2, delta: -1) : [192.168.0.34:1099]
> 10:40:53,921 INFO [DefaultPartition:ReplicantManager]
> Dead members: 1
> 10:40:58,015 INFO [DefaultPartition] Suspected member:
> wizcom-desk01:4950 (additional data: 17 bytes)
> Node 'B'
> 10:40:53,376 INFO [DefaultPartition] New cluster view
> (id: 2, delta: -1) : [192.168.0.46:1099]
> 10:40:53,376 INFO [DefaultPartition:ReplicantManager]
> Dead members: 1
> 10:40:53,516 INFO [HAILServerILService] Notified to
> become singleton
> 10. The jmx-console on Node 'B' now shows its own IP
> address as the HAILSharedState ServerAddress.
> 11. The jmx-console on Node 'A' still shows its own IP
> address as the HAILSharedState ServerAddress.
> 12. Reconnect the network cable to Node 'A'
> 13. The following message appears in the console:
> Node 'A'
> 10:45:05,171 INFO [DefaultPartition] New cluster view
> (id: 3, delta: 1) : [192.168.0.34:1099, 192.168.0.46:1099]
> 10:45:05,171 INFO [DefaultPartition:ReplicantManager]
> Merging partitions...
> 10:45:05,171 INFO [DefaultPartition:ReplicantManager]
> Dead members: 0
> 10:45:05,187 INFO [DefaultPartition:ReplicantManager]
> Originating groups: [[wizcom-comp2:1277 (additional
> data: 17 bytes)|2] [wizcom-comp2:1277 (additional data:
> 17 bytes)], [wizcom-desk01:4950 (additional data: 17
> bytes)|2] [wizcom-desk01:4950 (additional data: 17 bytes)]]
> 10:45:05,233 INFO [DefaultPartition:ReplicantManager]
> Start merging members in DRM service...
> 10:45:05,655 INFO [DefaultPartition:ReplicantManager]
> ..Finished merging members in DRM service
> Node 'B'
> 10:45:05,740 INFO [DefaultPartition] New cluster view:
> 3 ([192.168.0.34:1099, 192.168.0.46:1099] delta: 1)
> 10:45:05,756 INFO [DefaultPartition:ReplicantManager]
> Merging partitions...
> 10:45:05,756 INFO [DefaultPartition:ReplicantManager]
> Dead members: 0
> 10:45:05,756 INFO [DefaultPartition:ReplicantManager]
> Originating groups: [[wizcom-comp2:1277 (additional
> data: 17 bytes)|2] [wizcom-comp2:1277 (additional data:
> 17 bytes)], [WIZCOM-DESK01:4950 (additional data: 17
> bytes)|2] [WIZCOM-DESK01:4950 (additional data: 17 bytes)]]
> 10:45:05,818 INFO [DefaultPartition:ReplicantManager]
> Start merging members in DRM service...
> 10:45:05,943 INFO [HAILServerILService] Notified to
> stop acting as singleton.
> 10:45:05,943 INFO [DefaultPartition:ReplicantManager]
> ..Finished merging members in DRM service
> 14. Refresh the HAILSharedState in the jmx-console,
> both nodes have their own IP address as the ServerAddress.
> Thanks
> Ian

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://jira.jboss.com/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

_______________________________________________
JBoss-Development mailing list
JBoss-Development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jboss-development