Hi,

To narrow down the problem, I put fresh copies of jboss-4.2.2 on 3 test 
servers on the same network, all with the same cluster configuration. I see 
the same issue: when the third server joins the cluster it is very slow, and I 
get some WARN messages in the logs of the other two servers. Here is what I tried:

- Started Server A (bind_addr = 10.100.54.14)
- Started Server B (bind_addr = 10.100.54.135). It joins the cluster and 
everything looks fine
- Started Server C (bind_addr = 10.100.54.12). It does join the cluster but is 
very slow

Here are the logs on C. Startup gets stuck for a long time here (posting the 
relevant portion only):


  | -------------------------------------------------------
  | GMS: address is 10.100.54.12:34566
  | -------------------------------------------------------
  | 16:22:35,096 WARN  [GMS] join(10.100.54.12:34566) sent to 
10.100.54.14:40469 timed out, retrying
  | 16:22:39,129 INFO  [TreeCache] viewAccepted(): [10.100.54.14:40469|2] 
[10.100.54.14:40469, 10.100.54.135:45846, 10.100.54.12:34566]
  | 16:22:42,160 ERROR [FD_SOCK] received null cache; retrying
  | 16:22:45,668 ERROR [FD_SOCK] received null cache; retrying
  | 16:22:49,176 ERROR [FD_SOCK] received null cache; retrying
  | 16:22:49,686 INFO  [TreeCache] TreeCache local address is 10.100.54.12:34566
  | 16:22:49,699 INFO  [TreeCache] received the state (size=1024 bytes)
  | 16:22:49,740 INFO  [TreeCache] state was retrieved successfully (in 54 
milliseconds)
  | 16:22:49,740 INFO  [TreeCache] parseConfig(): PojoCacheConfig is empty
  | 16:22:49,949 INFO  [STDOUT] no object for null
  | 16:22:49,958 INFO  [STDOUT] no object for null
  | 16:22:50,011 INFO  [STDOUT] no object for null
  | 16:22:50,053 INFO  [STDOUT] no object for 
{urn:jboss:bean-deployer}supplyType
  | 16:22:50,075 INFO  [STDOUT] no object for 
{urn:jboss:bean-deployer}dependsType
  | 16:22:53,624 INFO  [NativeServerConfig] JBoss Web Services - Native
  | 16:22:53,624 INFO  [NativeServerConfig] jbossws-native-2.0.1.SP2 
(build=200710210837)
  | 16:22:54,978 INFO  [SnmpAgentService] SNMP agent going active
  | 16:22:55,627 INFO  [DefaultPartition] Initializing
  | 16:22:55,714 INFO  [STDOUT]
  | -------------------------------------------------------
  | GMS: address is 10.100.54.12:34571
  | -------------------------------------------------------
  | 16:23:02,800 ERROR [FD_SOCK] received null cache; retrying
  | 16:23:06,308 ERROR [FD_SOCK] received null cache; retrying
  | 16:23:09,816 ERROR [FD_SOCK] received null cache; retrying
  | 16:23:10,323 INFO  [DefaultPartition] Number of cluster members: 3
  | 16:23:10,323 INFO  [DefaultPartition] Other members: 2
  | 16:23:10,323 INFO  [DefaultPartition] Fetching state (will wait for 30000 
milliseconds):
  | 16:23:10,374 INFO  [DefaultPartition] state was retrieved successfully (in 
50 milliseconds)
  | 16:24:10,483 INFO  [HANamingService] Started ha-jndi bootstrap 
jnpPort=1100, backlog=50, bindAddress=/0.0.0.0
  | 16:24:10,497 INFO  [DetachedHANamingService$AutomaticDiscovery] Listening 
on /0.0.0.0:1102, group=230.0.0.4, HA-JNDI address=10.100.54.12:1100
  | 
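
To see where Server C's startup actually pauses, the gaps between consecutive log timestamps can be measured. The following is only a rough Python sketch, not anything shipped with JBoss or JGroups: the function name `find_stalls` and the 10-second threshold are assumptions made for illustration, and it assumes all timestamps fall on the same day.

```python
import re
from datetime import datetime

# Matches the timestamp at the start of a log line, e.g. "  | 16:23:10,374 INFO ..."
# Wrapped continuation lines carry no timestamp and are skipped.
TIMESTAMP = re.compile(r"^\W*(\d{2}:\d{2}:\d{2},\d{3})\s")


def find_stalls(log_text, threshold_s=10.0):
    """Return (gap_seconds, before, after) for every pause longer than threshold_s.

    Hypothetical helper: assumes timestamps are in order and do not cross midnight.
    """
    stamps = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(match.group(1))

    stalls = []
    for before, after in zip(stamps, stamps[1:]):
        t0 = datetime.strptime(before, "%H:%M:%S,%f")
        t1 = datetime.strptime(after, "%H:%M:%S,%f")
        gap = (t1 - t0).total_seconds()
        if gap > threshold_s:
            stalls.append((gap, before, after))
    return stalls


# Two entries taken from the Server C log above (text shortened).
sample = """\
  | 16:23:10,374 INFO  [DefaultPartition] state was retrieved successfully (in
50 milliseconds)
  | 16:24:10,483 INFO  [HANamingService] Started ha-jndi bootstrap
"""

for gap, before, after in find_stalls(sample):
    print(f"{gap:.3f}s pause between {before} and {after}")
```

On the excerpt above this points at the roughly one-minute pause between the DefaultPartition state transfer and the HANamingService start.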

I can see these warnings in Server A's logs:

  | 16:22:32,150 INFO  [TreeCache] viewAccepted(): [10.100.54.14:40469|2] 
[10.100.54.14:40469, 10.100.54.135:45846, 10.100.54.12:34566]
  | 16:22:37,157 WARN  [GMS] failed to collect all ACKs (2) for view 
[10.100.54.14:40469|2] [10.100.54.14:40469, 10.100.54.135:45846, 
10.100.54.12:34566] after 5000ms, missing ACKs from [10.100.54.135:45846] 
(received=[10.100.54.14:40469]), local_addr=10.100.54.14:40469
  | 16:22:39,106 WARN  [GMS] 10.100.54.12:34566 already present; returning 
existing view [10.100.54.14:40469|2] [10.100.54.14:40469, 10.100.54.135:45846, 
10.100.54.12:34566]
  | 16:22:49,694 INFO  [TreeCache] locking the subtree at / to transfer state
  | 16:22:49,694 INFO  [StateTransferGenerator_140] returning the state for 
tree rooted in /(1024 bytes)
  | 16:22:57,784 INFO  [DefaultPartition] New cluster view for partition 
DefaultPartition (id: 2, delta: 1) : [10.100.54.14:1099, 10.100.54.135:1099, 
10.100.54.12:1099]
  | 16:22:57,784 INFO  [DefaultPartition] I am (10.100.54.14:1099) received 
membershipChanged event:
  | 16:22:57,784 INFO  [DefaultPartition] Dead members: 0 ([])
  | 16:22:57,784 INFO  [DefaultPartition] New Members : 1 ([10.100.54.12:1099])
  | 16:22:57,784 INFO  [DefaultPartition] All Members : 3 ([10.100.54.14:1099, 
10.100.54.135:1099, 10.100.54.12:1099])
  | 16:22:59,790 WARN  [GMS] failed to collect all ACKs (2) for view 
[10.100.54.14:40472|2] [10.100.54.14:40472, 10.100.54.135:45849, 
10.100.54.12:34571] after 2000ms, missing ACKs from [10.100.54.135:45849] 
(received=[10.100.54.14:40472]), local_addr=10.100.54.14:40472
  | 16:26:13,214 INFO  [TreeCache] viewAccepted(): [10.100.54.14:40474|1] 
[10.100.54.14:40474, 10.100.54.12:34573]
  | 16:26:26,091 INFO  [TreeCache] viewAccepted(): [10.100.54.14:40476|1] 
[10.100.54.14:40476, 10.100.54.12:34575]
  | 

And these warnings on Server B:

  | 16:22:40,007 WARN  [NAKACK] 10.100.54.135:45846] discarded message from 
non-member 10.100.54.12:34566, my view is [10.100.54.14:40469|1] 
[10.100.54.14:40469, 10.100.54.135:45846]
  | 16:23:05,714 WARN  [NAKACK] 10.100.54.135:45849] discarded message from 
non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] 
[10.100.54.14:40472, 10.100.54.135:45849]
  | 16:23:10,452 WARN  [NAKACK] 10.100.54.135:45849] discarded message from 
non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] 
[10.100.54.14:40472, 10.100.54.135:45849]
  | 16:24:10,502 WARN  [NAKACK] 10.100.54.135:45849] discarded message from 
non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] 
[10.100.54.14:40472, 10.100.54.135:45849]
  | 16:24:45,147 WARN  [NAKACK] 10.100.54.135:45846] discarded message from 
non-member 10.100.54.12:34566, my view is [10.100.54.14:40469|1] 
[10.100.54.14:40469, 10.100.54.135:45846]
  | 16:25:09,831 WARN  [NAKACK] 10.100.54.135:45849] discarded message from 
non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] 
[10.100.54.14:40472, 10.100.54.135:45849]
  | 16:25:10,504 WARN  [NAKACK] 10.100.54.135:45849] discarded message from 
non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] 
[10.100.54.14:40472, 10.100.54.135:45849]
  | 
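
Comparing the logs, Server B's NAKACK warnings still report view `|1` for channels where Server A has already logged view `|2`. A small Python sketch can make that comparison across whole log files. The names `latest_view_ids` and `stale_views` are hypothetical, and the pattern assumes view IDs are printed as `[host:port|id]` and are not split across wrapped lines.

```python
import re

# A view ID as printed in the logs above: [coordinator_address|view_number]
VIEW_ID = re.compile(r"\[(\d+\.\d+\.\d+\.\d+:\d+)\|(\d+)\]")


def latest_view_ids(log_text):
    """Map each coordinator address to the highest view number seen in the log."""
    latest = {}
    for coordinator, view_id in VIEW_ID.findall(log_text):
        latest[coordinator] = max(latest.get(coordinator, -1), int(view_id))
    return latest


def stale_views(reference_log, other_log):
    """Return {coordinator: (reference_id, other_id)} where other_log lags behind."""
    reference = latest_view_ids(reference_log)
    other = latest_view_ids(other_log)
    return {
        coordinator: (reference[coordinator], other[coordinator])
        for coordinator in reference.keys() & other.keys()
        if other[coordinator] < reference[coordinator]
    }


# One entry from Server A's log and one from Server B's log, as posted above.
server_a = """\
  | 16:22:37,157 WARN  [GMS] failed to collect all ACKs (2) for view
[10.100.54.14:40469|2] [10.100.54.14:40469, 10.100.54.135:45846,
10.100.54.12:34566] after 5000ms, missing ACKs from [10.100.54.135:45846]
"""
server_b = """\
  | 16:22:40,007 WARN  [NAKACK] 10.100.54.135:45846] discarded message from
non-member 10.100.54.12:34566, my view is [10.100.54.14:40469|1]
[10.100.54.14:40469, 10.100.54.135:45846]
"""

print(stale_views(server_a, server_b))
```

A non-empty result only shows that the second log never mentions the newer view. It does not by itself explain why the view was not installed there.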

Please note that another cluster with 5 servers is already running on the same 
network and works fine. I want to run both of these clusters in parallel.

Any clue?

View the original post : 
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4173128#4173128

_______________________________________________
jboss-user mailing list
jboss-user@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jboss-user
