In our production environment, all hosts have duplicated (redundant) network links, intended to protect against a single link failure. Does anyone have an example or best practices for configuring JGroups to work properly in such an environment, so that JGroups keeps working despite a single link failure?
We did some prototyping, but it failed; details are below.

Thank you in advance.

Kind regards,
Mariusz

Version: JBossCache 1.4.1 SP3, JGroups 2.4.1

Environment: a LAN of two hosts, each with two NICs (eth0, eth1). The hosts are connected directly (eth0-to-eth0, eth1-to-eth1) and configured as a single IPv4 subnet. JGroups was meant to communicate over both interfaces and to use multicast (see the configuration below).

Test description:
- both links connected
- one JBossCache instance started on each node
  - replication works correctly
- link eth1-to-eth1 disconnected
  - replication works correctly
- link eth1-to-eth1 reconnected, link eth0-to-eth0 disconnected
  - replication works correctly at first
  ! after a while (around 5 s), both instances get an exception (see below) when communicating with each other, and they break because the exception is not caught

I don't know whether it is enough to simply catch the exception. From the top level, it looks as though JGroups/JBossCache has a problem with this configuration.
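On whether catching the exception is enough: the log below shows `received=false` for 10.10.0.1 and a `TimeoutException` after the 20000 ms request timeout, i.e. no response came back for the `_replicate` call. Catching and ignoring the exception would therefore leave the two caches possibly out of sync. A minimal sketch, assuming you want to experiment with handling it in the application, of a hypothetical bounded-retry wrapper (`BoundedRetry` is not part of JBossCache or JGroups) that retries an operation a few times and then rethrows the last exception instead of swallowing it:

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper, not part of JBossCache or JGroups: runs an operation
 * (e.g. a replicated cache call) and retries it a bounded number of times.
 */
public final class BoundedRetry {

    private BoundedRetry() {}

    /**
     * Calls op up to maxAttempts times, sleeping backoffMillis between attempts.
     * If every attempt fails, the last exception is rethrown so the caller
     * still learns that the operation (and thus replication) did not succeed.
     */
    public static <T> T call(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // In the scenario above this would be a ReplicationException
                // caused by a TimeoutException.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last; // never swallow the final failure
    }

    public static void main(String[] args) throws Exception {
        // Simulated operation that "times out" twice and then succeeds.
        final int[] calls = {0};
        String result = call(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new RuntimeException("Response timed out");
            }
            return "replicated";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A retry only helps if the other node becomes reachable again within the retry window; it does not by itself fix the underlying loss of the path between the two hosts.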
Configuration details:

    <UDP mcast_addr="228.8.8.8" mcast_port="45566" ip_ttl="64" ip_mcast="true"
         mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
         ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
         loopback="false"
         receive_on_all_interfaces="true" send_on_all_interfaces="true"
         receive_interfaces="eth0,eth1" send_interfaces="eth0,eth1"/>
    <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false"/>
    <MERGE2 min_interval="10000" max_interval="20000"/>
    <!-- <FD shun="true" up_thread="true" own_thread="true" />-->
    <FD_SOCK/>
    <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false"/>
    <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192"
                   up_thread="false" down_thread="false"/>
    <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false"/>
    <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false"/>
    <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true"/>
    <FC max_credits="2000000" down_thread="false" up_thread="false" min_threshold="0.20"/>
    <FRAG frag_size="8192" down_thread="false" up_thread="true"/>
    <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>

Logs with exception:

    [2007-08-30 15:20:29,796|DEBUG|main; |org.jgroups.blocks.GroupRequest(execute:195)]: call did not execute correctly, request is [GroupRequest:
    req_id=1188480009786
    caller=10.10.0.2:32781
    10.10.0.1:32781: sender=10.10.0.1:32781, retval=null, received=false, suspected=false
    request_msg: [dst: , src: 10.10.0.2:32781 (3 headers), size = 34 bytes]
    rsp_mode: GET_ALL
    done: false
    timeout: 20000
    expected_mbrs: 0
    ]
    [2007-08-30 15:20:29,796|DEBUG|main; |org.jgroups.blocks.RpcDispatcher(callRemoteMethods:193)]: responses: [sender=10.10.0.1:32781, retval=null, received=false, suspected=false]
    [2007-08-30 15:20:29,797|DEBUG|main; |org.jboss.cache.TreeCache(callRemoteMethods:4405)]: (10.10.0.2:32781): responses for method _replicate:
    [sender=10.10.0.1:32781, retval=null, received=false, suspected=false]
    [2007-08-30 15:20:29,798|DEBUG|main; |org.jboss.cache.interceptors.BaseRpcInterceptor(replicateCall:118)]: responses=[org.jboss.cache.ReplicationException: rsp=sender=10.10.0.1:32781, retval=null, received=false, suspected=false]
    [2007-08-30 15:20:29,800|DEBUG|main; |org.jboss.cache.interceptors.BaseRpcInterceptor(checkResponses:79)]: Received Throwable from remote node
    org.jboss.cache.ReplicationException: rsp=sender=10.10.0.1:32781, retval=null, received=false, suspected=false
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4422)
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4344)
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4455)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:110)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:88)
        at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:124)
        at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:88)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:365)
        at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:160)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:183)
        at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5863)
        at org.jboss.cache.TreeCache.remove(TreeCache.java:3929)
        at org.jboss.cache.TreeCache.remove(TreeCache.java:3915)
        at test.jbcache.DistributedTree.remove(DistributedTree.java:41)
        at test.jbcache.DistributedTest.handleSession(DistributedTest.java:46)
        at test.jbcache.DistributedTest.main(DistributedTest.java:78)
    Caused by: org.jboss.cache.lock.TimeoutException: Response timed out:
    sender=10.10.0.1:32781, retval=null, received=false, suspected=false
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4420)
        ... 17 more

View the original post: http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4079888#4079888
_______________________________________________
jboss-user mailing list
jboss-user@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jboss-user
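The UDP settings in the configuration above (`receive_on_all_interfaces="true"`, `receive_interfaces="eth0,eth1"`) ask JGroups to listen for multicast on both NICs. As a rough illustration of that idea only, and not JGroups' actual implementation, here is a minimal plain-Java sketch that joins the group 228.8.8.8:45566 on each named interface with `MulticastSocket.joinGroup(SocketAddress, NetworkInterface)`. The class and method names are hypothetical:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: receive one multicast group on several NICs. */
public final class MultiNicMulticast {

    private MultiNicMulticast() {}

    /** Parses a comma-separated list such as "eth0,eth1" into trimmed, non-empty names. */
    public static List<String> parseInterfaceList(String spec) {
        List<String> names = new ArrayList<>();
        for (String part : spec.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /**
     * Joins the multicast group on every listed interface that exists and is
     * reported as up; interfaces that are missing, down, or fail to join are
     * skipped. Returns the names of the interfaces actually joined.
     */
    public static List<String> joinOnInterfaces(MulticastSocket sock, InetSocketAddress group,
                                                List<String> names) throws IOException {
        List<String> joined = new ArrayList<>();
        for (String name : names) {
            NetworkInterface nic = NetworkInterface.getByName(name); // null if no such NIC
            if (nic == null || !nic.isUp()) {
                continue;
            }
            try {
                sock.joinGroup(group, nic);
                joined.add(name);
            } catch (IOException e) {
                System.err.println("could not join on " + name + ": " + e);
            }
        }
        return joined;
    }

    public static void main(String[] args) throws IOException {
        InetSocketAddress group = new InetSocketAddress(InetAddress.getByName("228.8.8.8"), 45566);
        try (MulticastSocket sock = new MulticastSocket(45566)) {
            List<String> joined = joinOnInterfaces(sock, group, parseInterfaceList("eth0,eth1"));
            System.out.println("joined on: " + joined);
        }
    }
}
```

This only covers receiving multicast on each NIC. For a socket that is not tied to a specific interface, the outgoing interface for unicast traffic is chosen by the operating system's routing table, not by this code.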