[JBoss-user] [Clustering/JBoss] - Re: HA-JMS fails, Master node undeploying channels, no failo
Sorry for not replying for a while; I was analyzing the logfiles and trying to reproduce the behaviour we see on our production system. Thanks to the answers here I think I now understand better what is going on, and I did find a way to reproduce the behaviour.

First, I was wrong in my assumption that the channels are never rebound to JNDI when the master node fails. Here's what happens:

Initially node 210 is the master node, and node 211 is a slave (hope the terminology is correct). At 08:14:24 node 211 begins to receive new views.

Taken from 211's logfile:

2006-06-21 08:14:24,757 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.StagePartition] New cluster view for partition StagePartition (id: 201, delta: -2) : [62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099]
2006-06-21 08:14:24,757 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] I am (62.50.43.211:1099) received membershipChanged event:
2006-06-21 08:14:24,757 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] Dead members: 2 ([62.50.43.210:1099, 62.50.43.214:1099])
2006-06-21 08:14:24,757 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] New Members : 0 ([])
2006-06-21 08:14:24,757 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] All Members : 4 ([62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099])

As node 211 is now the master node and node 210 is in the list of dead members, node 211 deploys all channels, as it should.

Taken from 211's logfile:

2006-06-21 08:14:25,496 INFO [org.jboss.web.tomcat.tc5.TomcatDeployer] deploy, ctxPath=/jbossmq-httpil, warUrl=.../deploy-hasingleton/jms/jbossmq-httpil.sar/jbossmq-httpil.war/
2006-06-21 08:14:26,916 INFO [org.jboss.mq.server.jmx.Topic.sgw/MOCacheInvalidationTopic] Bound to JNDI name: topic/sgw/MOCacheInvalidationTopic
2006-06-21 08:14:26,917 INFO [org.jboss.mq.server.jmx.Topic.sgw/CdaHtmlCacheInvalidationTopic] Bound to JNDI name: topic/sgw/CdaHtmlCacheInvalidationTopic
[...]

But node 210 did not receive view 201 at all, so this node still has all the channels deployed as well.

The next thing I see in the logfile of 211 is that node 214 is still sending messages, although from the viewpoint of 211 it is no longer a cluster member. I do not know whether this is relevant, but I wanted to mention it to give you a complete picture.

Taken from 211's logfile:

2006-06-21 08:14:29,985 ERROR [org.jgroups.protocols.pbcast.CoordGmsImpl] mbr 62.50.43.214:54923 (additional data: 17 bytes) is not a member !
2006-06-21 08:14:29,987 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.StagePartition] Suspected member: 62.50.43.214:54923 (additional data: 17 bytes)

Next, 211 receives two more view changes (id 202 and 203).

Taken from 211's logfile:

2006-06-21 08:14:34,867 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.StagePartition] New cluster view for partition StagePartition (id: 202, delta: 1) : [62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099, 62.50.43.214:1099]
2006-06-21 08:14:34,867 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] I am (62.50.43.211:1099) received membershipChanged event:
2006-06-21 08:14:34,867 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] Dead members: 0 ([])
2006-06-21 08:14:34,867 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] New Members : 1 ([62.50.43.214:1099])
2006-06-21 08:14:34,867 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] All Members : 5 ([62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099, 62.50.43.214:1099])
2006-06-21 08:14:35,021 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.StagePartition] New cluster view for partition StagePartition (id: 203, delta: 1) : [62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099, 62.50.43.214:1099, 62.50.43.210:1099]
2006-06-21 08:14:35,021 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] I am (62.50.43.211:1099) received membershipChanged event:
2006-06-21 08:14:35,021 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] Dead members: 0 ([])
2006-06-21 08:14:35,021 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] New Members : 1 ([62.50.43.210:1099])
2006-06-21 08:14:35,021 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] All Members : 6 ([62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099, 62.50.43.214:1099, 62.50.43.210:1099])

Node 210 did not receive view 202, but it did receive view 203.
After receiving view 203 node 210 is aware
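
For anyone who wants to check their own logfiles for the same symptom, here is a rough Python sketch that pulls the "New cluster view" lines out of a log and reports skipped view ids. The regular expression is inferred only from the excerpts in this post, so it is an assumption that it fits other JBoss versions. The function names and SAMPLE_LOG are made up for illustration; the sample reuses the view 201 and view 203 lines from 211's logfile and leaves out view 202 to imitate a node that missed it.

```python
import re

# Pattern inferred from the log excerpts in this thread (an assumption, not
# an official format description).
VIEW_RE = re.compile(
    r"New cluster view for partition (\S+) "
    r"\(id: (\d+), delta: (-?\d+)\) : \[(.*?)\]"
)


def parse_views(log_lines):
    """Return (view_id, delta, members) tuples in the order they were logged."""
    views = []
    for line in log_lines:
        match = VIEW_RE.search(line)
        if match:
            _partition, view_id, delta, members = match.groups()
            member_list = [m.strip() for m in members.split(",") if m.strip()]
            views.append((int(view_id), int(delta), member_list))
    return views


def missing_view_ids(views):
    """Return the view ids skipped between consecutive views of one logfile."""
    ids = [view_id for view_id, _delta, _members in views]
    missing = []
    for previous, current in zip(ids, ids[1:]):
        missing.extend(range(previous + 1, current))
    return missing


# Views 201 and 203 as logged by node 211; view 202 is deliberately omitted
# here to imitate a node that never received it.
SAMPLE_LOG = [
    "2006-06-21 08:14:24,757 INFO [org.jboss.ha.framework.interfaces."
    "HAPartition.lifecycle.StagePartition] New cluster view for partition "
    "StagePartition (id: 201, delta: -2) : [62.50.43.211:1099, "
    "62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099]",
    "2006-06-21 08:14:35,021 INFO [org.jboss.ha.framework.interfaces."
    "HAPartition.lifecycle.StagePartition] New cluster view for partition "
    "StagePartition (id: 203, delta: 1) : [62.50.43.211:1099, "
    "62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099, "
    "62.50.43.214:1099, 62.50.43.210:1099]",
]

if __name__ == "__main__":
    print(missing_view_ids(parse_views(SAMPLE_LOG)))
```

Running the same check against each node's logfile and comparing the results should show which nodes skipped which views.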
[JBoss-user] [Clustering/JBoss] - Re: HA-JMS fails, Master node undeploying channels, no failo
Thanks very much for your reply. I examined the logfiles again to answer your questions:

[EMAIL PROTECTED] wrote :
| 1) You refer to the master node. Please confirm that this is 62.50.43.211.

No, at that time the master node was 62.50.43.210. The first and the second log excerpt are both from this machine, which means that the master node (62.50.43.210) logged "Dead members: 0, New members: 0" and immediately afterwards undeployed all the HA-Queues and HA-Topics. Sorry, I should have made that clear in my first post.

[EMAIL PROTECTED] wrote :
| 2) On the node that produced the first bit of logging in your post, do you see log entries with this content New cluster view for partition StagePartition: 202 and New cluster view for partition StagePartition: 201?

No, these messages are not present in the logfile.

[EMAIL PROTECTED] wrote :
| 3) If you have a log entry somewhere that contains New cluster view for partition StagePartition: 200, please compare the list of nodes to the first line in the first log entry in your post. Does it have the same 6 nodes but in different order?

You are right, I can see the same nodes, but in a different order.

[EMAIL PROTECTED] wrote :
| What I'm driving at here is I wonder if the machine doing the first bit of logging lost a couple view changes, going from 200 to 203. The result would be Dead members:0, New members: 0 but a different order of members.

Thanks, now I am starting to understand what is happening. You are right that the machine did lose some of the view changes; that's a problem I probably have to investigate at the network level.

But the most interesting question for me is: even if the (master) node lost some view changes, why does it suddenly undeploy the (HA-)queues and (HA-)topics? And why does failover not happen? No other node starts to deploy the queues and topics instead. I cannot explain how this is possible, and I found no information on this issue in the docs or in the forums.
The critical thing is that if I run into this scenario, my HA-Queues and HA-Topics are not present on any instance, leading to lost messages and therefore lost data. This situation should not be possible at all in a cluster. I am not quite sure whether this is a cluster issue (I guess so); if it is something related to JMS, please let me know so I can ask in the JMS forum.

BTW: This is the only real problem we have with the JBoss platform. Everything else is working fine and stable. Developing with JBoss really was a breeze, so thanks for this great piece of software.

Thanks again for your help.
Jochen

View the original post : http://www.jboss.com/index.html?module=bbop=viewtopicp=3954296#3954296
Reply to the post : http://www.jboss.com/index.html?module=bbop=postingmode=replyp=3954296

___
JBoss-user mailing list
JBoss-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jboss-user
[JBoss-user] [Clustering/JBoss] - Re: HA-JMS fails, Master node undeploying channels, no failo
OK, things are a bit clearer. I don't know the full answer yet, but we're getting there.

jkressin wrote :
| But the most interesting question for me is: Even if the (Master-)node lost some viewchanges, why does it suddenly undeploy the (HA-)queues and (HA-)topics?

They are undeployed because when view 203 came in, 62.50.43.210 was no longer the first node in the view; 62.50.43.211 was. All HASingleton services run on the first member in the view on which they are deployed (that is the current behaviour; we're looking to change this). If a node that is currently the singleton master for the service discovers it's no longer that first node, it will stop providing the service.

jkressin wrote :
| And why is the failover not happening, no other node is starting to deploy the queues and topics instead. I cannot explain how this is possible and also found no information in the docs or in the forums on this issue.

This is the key question. 62.50.43.211 should have taken over as the HA-JMS server and deployed the queues and topics. Is there anything interesting in the 62.50.43.211 logs that could shed light on why it didn't?

View the original post : http://www.jboss.com/index.html?module=bbop=viewtopicp=3954498#3954498
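
The election rule described in this reply can be sketched in a few lines of Python. This is only an illustration of the rule as stated here (the first member of the view runs the singleton), not the actual JBoss code. It assumes the service is deployed on every node in the view, and the names singleton_master and react_to_view are invented for the example. VIEW_203 uses the member order from the view 203 log line quoted elsewhere in this thread.

```python
def singleton_master(view):
    """Return the member expected to run the singleton: the first one in the view."""
    return view[0] if view else None


def react_to_view(me, was_master, view):
    """Return what a node would do with the singleton after receiving a view.

    "stop"  : the node was master but is no longer first in the view
    "start" : the node was not master but is now first in the view
    "keep"  : nothing changes for this node
    """
    is_master = singleton_master(view) == me
    if was_master and not is_master:
        return "stop"
    if is_master and not was_master:
        return "start"
    return "keep"


# Member order of view 203 as logged in this thread.
VIEW_203 = [
    "62.50.43.211:1099",
    "62.50.43.213:1099",
    "62.50.43.216:1099",
    "62.50.43.215:1099",
    "62.50.43.214:1099",
    "62.50.43.210:1099",
]

if __name__ == "__main__":
    # The old master sees it is no longer first and stops the service.
    print(react_to_view("62.50.43.210:1099", True, VIEW_203))
    # The new first member is expected to start it.
    print(react_to_view("62.50.43.211:1099", False, VIEW_203))
```

Under this rule the old master stopping the service is expected; the open question is why the new first member did not start it.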
[JBoss-user] [Clustering/JBoss] - Re: HA-JMS fails, Master node undeploying channels, no failo
1) You refer to the master node. Please confirm that this is 62.50.43.211.

2) On the node that produced the first bit of logging in your post, do you see log entries with the content "New cluster view for partition StagePartition: 202" and "New cluster view for partition StagePartition: 201"?

3) If you have a log entry somewhere that contains "New cluster view for partition StagePartition: 200", please compare the list of nodes to the first line in the first log entry in your post. Does it have the same 6 nodes but in a different order?

What I'm driving at here is that I wonder whether the machine doing the first bit of logging lost a couple of view changes, going from 200 to 203. The result would be "Dead members: 0, New members: 0" but a different order of members. I'm not sure what that would mean if it were the case, but it's an avenue to explore.

View the original post : http://www.jboss.com/index.html?module=bbop=viewtopicp=3954206#3954206
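
To illustrate the hypothesis above, here is a small Python sketch of how a node that jumps straight from view 200 to view 203 could compute no dead and no new members while the member order still differs. The function membership_change is a made-up helper, not JBoss code. The order used for VIEW_200 is hypothetical, since the thread only establishes that it held the same six nodes in a different order; VIEW_203 follows the order logged for view 203.

```python
def membership_change(old_view, new_view):
    """Return (dead, new) member lists when going from old_view to new_view."""
    dead = [member for member in old_view if member not in new_view]
    new = [member for member in new_view if member not in old_view]
    return dead, new


# Hypothetical order for view 200, with the old master 62.50.43.210 first.
VIEW_200 = [
    "62.50.43.210:1099",
    "62.50.43.211:1099",
    "62.50.43.213:1099",
    "62.50.43.214:1099",
    "62.50.43.215:1099",
    "62.50.43.216:1099",
]

# Member order of view 203 as logged in this thread.
VIEW_203 = [
    "62.50.43.211:1099",
    "62.50.43.213:1099",
    "62.50.43.216:1099",
    "62.50.43.215:1099",
    "62.50.43.214:1099",
    "62.50.43.210:1099",
]

if __name__ == "__main__":
    dead, new = membership_change(VIEW_200, VIEW_203)
    print("Dead members:", len(dead), dead)
    print("New members:", len(new), new)
    print("First member changed:", VIEW_200[0] != VIEW_203[0])
```

The membership diff comes out empty in both directions, yet the first member of the view is a different node, which is the only thing that matters for anything keyed on view order.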
[JBoss-user] [Clustering/JBoss] - Re: HA-JMS doesn't work for me
Make sure your client is looking up the connection factory through HA-JNDI and not through the local JNDI. If your client uses the port number of the local JNDI instead of the HA-JNDI port, you can get this exception when the name you are looking up is really bound in HA-JNDI and not in the local JNDI.

View the original post : http://www.jboss.org/index.html?module=bbop=viewtopicp=3853322#3853322
[JBoss-user] [Clustering/JBoss] - Re: HA JMS
http://www.jboss.org/wiki/Wiki.jsp?page=JBossMQHA

View the original post : http://www.jboss.org/index.html?module=bbop=viewtopicp=3851559#3851559