In our production environment all hosts have duplicated network links, which 
are intended to protect against a single link failure. Does anyone have an 
example or best practices for configuring JGroups to work properly in such an 
environment, so that JGroups keeps working despite a single link failure?

We built a prototype, but it failed; details are below. 

Thank you in advance.
Kind regards 
Mariusz

Version: JBossCache 1.4.1 SP3, JGroups 2.4.1

Environment: a LAN consisting of two hosts, each with two NICs (eth0, eth1). 
The hosts are connected directly (eth0-to-eth0, eth1-to-eth1) and configured as 
a single IPv4 subnet. JGroups is meant to communicate on both interfaces and 
to use multicast (see Configuration details below).
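
As a sanity check on this environment, a minimal Java sketch that lists which interfaces the JVM reports as up and multicast-capable. It uses only the standard java.net.NetworkInterface API; the class name InterfaceCheck and its method are hypothetical and not part of JGroups or JBossCache. Whether an unplugged cable is reflected in isUp() depends on the OS and driver, so treat the output as a hint only.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper for illustration; not part of JGroups or JBossCache.
class InterfaceCheck {

    // Returns the names of interfaces the JVM reports as up and multicast-capable.
    // Interfaces that cannot be queried are skipped with a message on stderr.
    static List<String> multicastCapableInterfaces() {
        List<String> names = new ArrayList<String>();
        Enumeration<NetworkInterface> all;
        try {
            all = NetworkInterface.getNetworkInterfaces();
        } catch (SocketException e) {
            System.err.println("Cannot enumerate interfaces: " + e);
            return names;
        }
        if (all == null) {
            return names; // the platform reported no interfaces
        }
        for (NetworkInterface nic : Collections.list(all)) {
            try {
                if (nic.isUp() && nic.supportsMulticast()) {
                    names.add(nic.getName());
                }
            } catch (SocketException e) {
                System.err.println("Cannot query " + nic.getName() + ": " + e);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> usable = multicastCapableInterfaces();
        System.out.println("Up and multicast-capable: " + usable);
        // eth0 and eth1 are the interface names from the environment described above.
        for (String wanted : new String[] {"eth0", "eth1"}) {
            System.out.println(wanted + (usable.contains(wanted) ? ": usable" : ": NOT usable"));
        }
    }
}
```

Running this on each host before and after pulling a cable would show whether the JVM itself notices the link change.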

Test description: 
- both links are connected
- one instance of JBossCache is started on each node
- replication works correctly
- link eth1-to-eth1 is disconnected
- replication works correctly
- link eth1-to-eth1 is reconnected, link eth0-to-eth0 is disconnected
- replication works correctly
! after a short time (around 5 seconds) both instances report an exception 
(see below) to one another and fail because the exception is not caught

I don't know whether simply catching the exception is enough. From a high 
level, it looks as though JGroups/JBossCache has a problem with this 
configuration.
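
To make "simply catching the exception" concrete, here is a minimal sketch of a retry wrapper around a replicated call. The class ReplicationRetry and the method callWithRetry are hypothetical names made up for this illustration and are not part of JBossCache or JGroups; the flaky operation in main only simulates a timeout. In the test program the wrapped operation would presumably be the cache call that currently throws. This would only mask the timeout and says nothing about why replication stops after the link switch.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of JBossCache or JGroups.
class ReplicationRetry {

    // Runs op. If it throws, waits pauseMillis and tries again, up to maxAttempts
    // attempts in total. Rethrows the last failure if every attempt fails.
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // give the remaining link time to settle
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated operation: fails twice, then succeeds.
        String result = callWithRetry(new Callable<String>() {
            public String call() throws Exception {
                if (++calls[0] < 3) {
                    throw new RuntimeException("simulated replication timeout");
                }
                return "replicated";
            }
        }, 5, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```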

Configuration details:
                <UDP mcast_addr="228.8.8.8" mcast_port="45566"
                    ip_ttl="64" ip_mcast="true" 
                    mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                    ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                    loopback="false"
                    receive_on_all_interfaces="true"
                    send_on_all_interfaces="true"
                    receive_interfaces="eth0,eth1"
                    send_interfaces="eth0,eth1"/>
                <PING timeout="2000" num_initial_members="3"
                    up_thread="false" down_thread="false"/>
                <MERGE2 min_interval="10000" max_interval="20000"/>
                <!--        <FD shun="true" up_thread="true" own_thread="true" />-->
                <FD_SOCK/>
                <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false"/>
                <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
                    max_xmit_size="8192" up_thread="false" down_thread="false"/>
                <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
                    down_thread="false"/>
                <pbcast.STABLE desired_avg_gossip="20000"
                    up_thread="false" down_thread="false"/>
                <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                    shun="true" print_local_addr="true"/>
                <FC max_credits="2000000" down_thread="false" up_thread="false"
                    min_threshold="0.20"/>
                <FRAG frag_size="8192" down_thread="false" up_thread="true"/>
                <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>

Logs with exception:
[2007-08-30 15:20:29,796|DEBUG|main; 
|org.jgroups.blocks.GroupRequest(execute:195)]: call did not execute correctly, 
request is [GroupRequest:
req_id=1188480009786
caller=10.10.0.2:32781
10.10.0.1:32781: sender=10.10.0.1:32781, retval=null, received=false, 
suspected=false

request_msg: [dst: , src: 10.10.0.2:32781 (3 headers), size = 34 bytes]
rsp_mode: GET_ALL
done: false
timeout: 20000
expected_mbrs: 0
]
[2007-08-30 15:20:29,796|DEBUG|main; 
|org.jgroups.blocks.RpcDispatcher(callRemoteMethods:193)]: responses: 
[sender=10.10.0.1:32781, retval=null, received=false, suspected=false]

[2007-08-30 15:20:29,797|DEBUG|main; 
|org.jboss.cache.TreeCache(callRemoteMethods:4405)]: (10.10.0.2:32781): 
responses for method _replicate:
[sender=10.10.0.1:32781, retval=null, received=false, suspected=false]

[2007-08-30 15:20:29,798|DEBUG|main; 
|org.jboss.cache.interceptors.BaseRpcInterceptor(replicateCall:118)]: 
responses=[org.jboss.cache.ReplicationException: rsp=sender=10.10.0.1:32781, 
retval=null, received=false, suspected=false]
[2007-08-30 15:20:29,800|DEBUG|main; 
|org.jboss.cache.interceptors.BaseRpcInterceptor(checkResponses:79)]: Received 
Throwable from remote node
org.jboss.cache.ReplicationException: rsp=sender=10.10.0.1:32781, retval=null, 
received=false, suspected=false
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4422)
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4344)
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4455)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:110)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:88)
        at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:124)
        at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:88)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:365)
        at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:160)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:183)
        at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5863)
        at org.jboss.cache.TreeCache.remove(TreeCache.java:3929)
        at org.jboss.cache.TreeCache.remove(TreeCache.java:3915)
        at test.jbcache.DistributedTree.remove(DistributedTree.java:41)
        at test.jbcache.DistributedTest.handleSession(DistributedTest.java:46)
        at test.jbcache.DistributedTest.main(DistributedTest.java:78)
Caused by: org.jboss.cache.lock.TimeoutException: Response timed out: 
sender=10.10.0.1:32781, retval=null, received=false, suspected=false
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4420)
        ... 17 more

View the original post : 
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4079888#4079888
