[JBoss-user] [Clustering/JBoss] - Re: HA-JMS fails, Master node undeploying channels, no failo

2006-07-06 Thread jkressin
Sorry for not replying for a while, but I was analyzing the logfiles and trying 
to reproduce the behaviour we have on our production system. Thanks to the 
answers here I think I now understand better what is going on, and I did 
find a way to reproduce the behaviour.

First, I was wrong in my assumption that the channels are never rebound to JNDI 
when the master node fails.  Here's what happens:

Initially node 210 is the master node, and node 211 is a slave (hope the 
terminology is correct).  At 08:14:24, node 211 begins to receive new views. 
Taken from 211's logfile:

2006-06-21 08:14:24,757 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.StagePartition] New cluster view for partition StagePartition (id: 201, delta: -2) : [62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099]
2006-06-21 08:14:24,757 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] I am (62.50.43.211:1099) received membershipChanged event:
2006-06-21 08:14:24,757 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] Dead members: 2 ([62.50.43.210:1099, 62.50.43.214:1099])
2006-06-21 08:14:24,757 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] New Members : 0 ([])
2006-06-21 08:14:24,757 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] All Members : 4 ([62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099])


As node 211 is now the master node and node 210 is in the list of dead members, 
node 211 deploys all channels, like it should.
Taken from 211's logfile:

2006-06-21 08:14:25,496 INFO  [org.jboss.web.tomcat.tc5.TomcatDeployer] deploy, ctxPath=/jbossmq-httpil, warUrl=.../deploy-hasingleton/jms/jbossmq-httpil.sar/jbossmq-httpil.war/
2006-06-21 08:14:26,916 INFO  [org.jboss.mq.server.jmx.Topic.sgw/MOCacheInvalidationTopic] Bound to JNDI name: topic/sgw/MOCacheInvalidationTopic
2006-06-21 08:14:26,917 INFO  [org.jboss.mq.server.jmx.Topic.sgw/CdaHtmlCacheInvalidationTopic] Bound to JNDI name: topic/sgw/CdaHtmlCacheInvalidationTopic
[...]

But: Node 210 did not receive view 201 at all, so this node still has all the 
channels deployed as well. The next thing I see in the logfile of 211 is that 
node 214 is still sending messages, although from 211's viewpoint it is no 
longer a cluster member. I do not know if this is of any relevance, but I 
wanted to mention it to give you a complete picture.
Taken from 211's logfile:
2006-06-21 08:14:29,985 ERROR [org.jgroups.protocols.pbcast.CoordGmsImpl] mbr 62.50.43.214:54923 (additional data: 17 bytes) is not a member !
2006-06-21 08:14:29,987 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.StagePartition] Suspected member: 62.50.43.214:54923 (additional data: 17 bytes)

Next, 211 receives two more view changes (ids 202 and 203).
Taken from 211's logfile:

2006-06-21 08:14:34,867 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.StagePartition] New cluster view for partition StagePartition (id: 202, delta: 1) : [62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099, 62.50.43.214:1099]
2006-06-21 08:14:34,867 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] I am (62.50.43.211:1099) received membershipChanged event:
2006-06-21 08:14:34,867 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] Dead members: 0 ([])
2006-06-21 08:14:34,867 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] New Members : 1 ([62.50.43.214:1099])
2006-06-21 08:14:34,867 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] All Members : 5 ([62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099, 62.50.43.214:1099])
2006-06-21 08:14:35,021 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.StagePartition] New cluster view for partition StagePartition (id: 203, delta: 1) : [62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099, 62.50.43.214:1099, 62.50.43.210:1099]
2006-06-21 08:14:35,021 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] I am (62.50.43.211:1099) received membershipChanged event:
2006-06-21 08:14:35,021 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] Dead members: 0 ([])
2006-06-21 08:14:35,021 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] New Members : 1 ([62.50.43.210:1099])
2006-06-21 08:14:35,021 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.StagePartition] All Members : 6 ([62.50.43.211:1099, 62.50.43.213:1099, 62.50.43.216:1099, 62.50.43.215:1099, 62.50.43.214:1099, 62.50.43.210:1099])

Node 210 did not receive view 202, but it did receive view 203. After receiving view 203 
node 210 is aware 

[JBoss-user] [Clustering/JBoss] - Re: HA-JMS fails, Master node undeploying channels, no failo

2006-06-29 Thread jkressin
Thanks very much for your reply. I examined the logfiles again to answer your 
questions:

[EMAIL PROTECTED] wrote : 1) You refer to the master node.  Please confirm 
that this is 62.50.43.211.
  | 

No, at that time the master node was 62.50.43.210. The first and second log 
excerpts are from this machine, which means that the master node (62.50.43.210) 
logged Dead members: 0, New members: 0 and immediately afterwards 
undeployed all the HA-Queues and HA-Topics. Sorry, I should have made that 
clear in my first post.

[EMAIL PROTECTED] wrote : 
  | 2) On the node that produced the first bit of logging in your post, do you 
see log entries with this content New cluster view for partition 
StagePartition: 202 and New cluster view for partition StagePartition: 201?
  | 

No, these messages are not present in the logfile.

[EMAIL PROTECTED] wrote : 
  | 3) If you have a log entry somewhere that contains New cluster view for 
partition StagePartition: 200, please compare the list of nodes to the first 
line in the first log entry in your post.  Does it have the same 6 nodes but in 
different order?
  | 

You are right, I can see the same nodes, but in a different order.

[EMAIL PROTECTED] wrote : 
  | What I'm driving at here is I wonder if the machine doing the first bit of 
logging lost a couple view changes, going from 200 to 203.  The result would be 
Dead members:0, New members: 0 but a different order of members.
  | 

Thanks, now I am starting to understand what is happening. You are right that the 
machine did lose some of the view changes. That is a problem I will probably have 
to investigate at the network level.
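
To check a node's logfile for lost view changes, something like the following rough sketch could help. It assumes each log entry sits on a single line in the "New cluster view for partition ... (id: N, ..." format shown in the excerpts from 211, and that view ids normally increase by one. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JBoss: finds view ids a node never logged.
class ViewGapSketch {
    // Matches the "New cluster view" INFO line and captures the view id.
    private static final Pattern VIEW_ID =
        Pattern.compile("New cluster view for partition \\S+ \\(id: (\\d+),");

    // Extracts the view ids in the order they appear in the log.
    static List<Integer> viewIds(List<String> logLines) {
        List<Integer> ids = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = VIEW_ID.matcher(line);
            if (m.find()) {
                ids.add(Integer.parseInt(m.group(1)));
            }
        }
        return ids;
    }

    // Returns every id skipped between two consecutive views that were seen.
    static List<Integer> missing(List<Integer> seen) {
        List<Integer> gaps = new ArrayList<>();
        for (int i = 1; i < seen.size(); i++) {
            for (int id = seen.get(i - 1) + 1; id < seen.get(i); id++) {
                gaps.add(id);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Shortened lines in the format logged on node 211 (member lists omitted).
        List<String> log211 = Arrays.asList(
            "INFO [...] New cluster view for partition StagePartition (id: 201, delta: -2) : [...]",
            "INFO [...] New cluster view for partition StagePartition (id: 202, delta: 1) : [...]",
            "INFO [...] New cluster view for partition StagePartition (id: 203, delta: 1) : [...]");
        System.out.println("211 saw " + viewIds(log211) + ", missing " + missing(viewIds(log211)));

        // Node 210 went from view 200 straight to view 203.
        System.out.println("210 missing " + missing(Arrays.asList(200, 203)));
    }
}
```

For node 210 this reports views 201 and 202 as missing, which matches what the logfiles show.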

But the most interesting question for me is: Even if the (Master-)node lost some 
view changes, why does it suddenly undeploy the (HA-)queues and (HA-)topics? 
And why is the failover not happening? No other node starts to deploy the 
queues and topics instead. I cannot explain how this is possible, and I also found 
no information on this issue in the docs or in the forums.

The critical thing is that if I run into this scenario, my HA-Queues and 
HA-Topics are not present on any instance, leading to lost messages and 
therefore also lost data. This situation should not be possible at all in a 
cluster. I am not quite sure if this is a cluster issue (I guess so), so if it 
is something related to JMS please let me know so I can ask in the JMS forum.

BTW: This is the only real problem we have with the JBoss platform. Everything 
else is working fine and stable. Developing with JBoss really was a breeze, so 
thanks for this great piece of software. 

Thanks again for your help.

Jochen


View the original post : 
http://www.jboss.com/index.html?module=bbop=viewtopicp=3954296#3954296

Reply to the post : 
http://www.jboss.com/index.html?module=bbop=postingmode=replyp=3954296

___
JBoss-user mailing list
JBoss-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jboss-user


[JBoss-user] [Clustering/JBoss] - Re: HA-JMS fails, Master node undeploying channels, no failo

2006-06-29 Thread [EMAIL PROTECTED]
OK, things are a bit clearer.  Don't know the full answer yet but we're getting 
there.

jkressin wrote : 
  | But the most interesting question for me is: Even if the (Master-)node lost 
some view changes, why does it suddenly undeploy the (HA-)queues and 
(HA-)topics?

They are undeployed because when view 203 came in, 62.50.43.210 was no longer 
the first node in the view; 62.50.43.211 was.  All HASingleton services 
(currently, we're looking to change this) run on the first member in the view 
on which they are deployed. If a node that is currently the singleton master 
for the service discovers it's no longer that first node, it will stop providing 
the service.
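
As a minimal sketch of that rule (an illustration only, not the actual HASingleton code; the class and method names are invented), the master is simply the first view member that has the service deployed. The view below is view 203 as logged on 62.50.43.211, and the sketch assumes the service is deployed on all six nodes:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration of "first member in the view on which the service is deployed".
class SingletonMasterSketch {
    // Walks the view in order and picks the first member that has the service deployed.
    static Optional<String> electMaster(List<String> view, Set<String> deployedOn) {
        return view.stream().filter(deployedOn::contains).findFirst();
    }

    public static void main(String[] args) {
        // View 203, in the order reported in the log.
        List<String> view203 = Arrays.asList(
            "62.50.43.211:1099", "62.50.43.213:1099", "62.50.43.216:1099",
            "62.50.43.215:1099", "62.50.43.214:1099", "62.50.43.210:1099");
        // Assumption: every node in the partition has the HA-JMS services deployed.
        Set<String> deployedOn = new HashSet<>(view203);

        String master = electMaster(view203, deployedOn).orElse("none");
        System.out.println("master for view 203: " + master);
        // The old master compares itself with the result and stops the service if it differs.
        System.out.println("210 keeps the service: " + master.equals("62.50.43.210:1099"));
    }
}
```

With this view the sketch picks 62.50.43.211, so 62.50.43.210 would stop the service, which is the undeploy seen in the logs.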

jkressin wrote : And why is the failover not happening, no other node is 
starting to deploy the queues and topics instead. I cannot explain how this is 
possible and also found no information in the docs or in the forums on this 
issue.

This is the key question.  62.50.43.211 should have taken over as the HA-JMS 
server and deployed the queues and topics.  Is there anything interesting in 
the 62.50.43.211 logs that could shed light on why it didn't?

View the original post : 
http://www.jboss.com/index.html?module=bbop=viewtopicp=3954498#3954498

Reply to the post : 
http://www.jboss.com/index.html?module=bbop=postingmode=replyp=3954498



[JBoss-user] [Clustering/JBoss] - Re: HA-JMS fails, Master node undeploying channels, no failo

2006-06-28 Thread [EMAIL PROTECTED]
1) You refer to the master node.  Please confirm that this is 62.50.43.211.

2) On the node that produced the first bit of logging in your post, do you see 
log entries with this content New cluster view for partition StagePartition: 
202 and New cluster view for partition StagePartition: 201?

3) If you have a log entry somewhere that contains New cluster view for 
partition StagePartition: 200, please compare the list of nodes to the first 
line in the first log entry in your post.  Does it have the same 6 nodes but in 
different order?

What I'm driving at here is I wonder if the machine doing the first bit of 
logging lost a couple view changes, going from 200 to 203.  The result would be 
Dead members:0, New members: 0 but a different order of members.

I'm not sure what that would mean if it were the case, but it's an avenue to 
explore.
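
To illustrate the idea, here is a small sketch of how a jump from view 200 to view 203 could produce empty dead and new member lists. The member order used for view 200 is an assumption (only the node set is known); view 203 is in the order reported by 62.50.43.211. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: membership delta between two views holding the same nodes.
class ViewDeltaSketch {
    // Members of 'a' that are not in 'b', keeping the order of 'a'.
    static List<String> minus(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        // Assumed order for view 200: same six nodes, with 210 first.
        List<String> view200 = Arrays.asList(
            "62.50.43.210:1099", "62.50.43.211:1099", "62.50.43.213:1099",
            "62.50.43.216:1099", "62.50.43.215:1099", "62.50.43.214:1099");
        // View 203 in the logged order.
        List<String> view203 = Arrays.asList(
            "62.50.43.211:1099", "62.50.43.213:1099", "62.50.43.216:1099",
            "62.50.43.215:1099", "62.50.43.214:1099", "62.50.43.210:1099");

        System.out.println("Dead members: " + minus(view200, view203));  // []
        System.out.println("New members : " + minus(view203, view200));  // []
        // The delta is empty, yet the first member is a different node.
        System.out.println("First member changed: "
            + !view200.get(0).equals(view203.get(0)));                   // true
    }
}
```

A node comparing only the two member sets sees no change, while the first position in the view now belongs to a different node.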

View the original post : 
http://www.jboss.com/index.html?module=bbop=viewtopicp=3954206#3954206

Reply to the post : 
http://www.jboss.com/index.html?module=bbop=postingmode=replyp=3954206



[JBoss-user] [Clustering/JBoss] - Re: HA-JMS doesn't work for me

2004-10-29 Thread sudkampf
Make sure your client is looking up the connection factory through HA-JNDI and not the 
local JNDI.  If your client uses the port number of the local JNDI instead of the 
HA-JNDI port, you can get this exception when the name you are looking for is really 
bound in HA-JNDI and not in the local JNDI.

View the original post : 
http://www.jboss.org/index.html?module=bbop=viewtopicp=3853322#3853322

Reply to the post : 
http://www.jboss.org/index.html?module=bbop=postingmode=replyp=3853322




[JBoss-user] [Clustering/JBoss] - Re: HA JMS

2004-10-15 Thread janilsal
http://www.jboss.org/wiki/Wiki.jsp?page=JBossMQHA

View the original post : 
http://www.jboss.org/index.html?module=bbop=viewtopicp=3851559#3851559

Reply to the post : 
http://www.jboss.org/index.html?module=bbop=postingmode=replyp=3851559

