[ http://jira.jboss.com/jira/browse/JGRP-39?page=comments#action_12315514 ]
     
Bela Ban commented on JGRP-39:
------------------------------

Does that occur with FD_SOCK as well?

> A TCP stack does not correctly detect failure (pulled cable) for certain 
> TCPPING configurations
> -----------------------------------------------------------------------------------------------
>
>          Key: JGRP-39
>          URL: http://jira.jboss.com/jira/browse/JGRP-39
>      Project: JGroups
>         Type: Bug
>     Versions: 2.2.9
>     Reporter: Ovidiu Feodorov
>     Assignee: Ovidiu Feodorov
>      Fix For: 2.2.8

>
>
> Physical hosts "A" (192.168.1.1, coordinator) and "B" (192.168.1.2) run 
> JGroups processes configured with TCP/TCPPING stacks.
> "A" stack configuration:
> TCP(bind_addr=192.168.1.1;start_port=11800;loopback=true):
> TCPPING(initial_hosts=192.168.1.2[11800];port_range=3;timeout=3500;num_initial_members=3;up_thread=true;down_thread=true):
> MERGE2(min_interval=5000;max_interval=10000):
> FD(shun=true;timeout=1500;max_tries=3;up_thread=true;down_thread=true):
> VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):
> pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):
> pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):
> pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false;down_thread=true;up_thread=true)
> "B" stack configuration:
> TCP(bind_addr=192.168.1.2;start_port=11800;loopback=true):
> TCPPING(initial_hosts=192.168.1.1[11800];port_range=3;timeout=3500;num_initial_members=3;up_thread=true;down_thread=true):
> MERGE2(min_interval=5000;max_interval=10000):
> FD(shun=true;timeout=1500;max_tries=3;up_thread=true;down_thread=true):
> VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false):
> pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):
> pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):
> pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false;down_thread=true;up_thread=true)
> If I pull the cable under B, the B stack immediately and correctly 
> identifies A as suspect and installs a new view containing only itself.
> However, A does not recognize B as suspect and nondeterministically emits 
> various info and warning messages. The view (A, B) incorrectly stays "valid" 
> for a long time; sometimes it gets replaced by (A), sometimes not.
> I tracked the cause of the problem down to A's TCPPING configuration 
> and the TCP queue. If A's TCPPING is configured with port_range=1, the 
> problem goes away and the new view is installed in the A stack immediately. It 
> seems that if the TCP queue holds messages other than the SUSPECT message 
> generated by FD, they interfere and the SUSPECT message gets stuck in 
> the queue, with nondeterministic results.
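
For anyone trying to reproduce this, the two stacks quoted above differ only in bind_addr and initial_hosts, and the reported workaround changes only port_range. The following is a minimal sketch of a helper that assembles the same property string with those three values as parameters. The class and method names (TcpStackConfig, build) are hypothetical and not part of JGroups; all other values are copied from the report.

```java
import java.util.StringJoiner;

/** Hypothetical helper (not part of JGroups): rebuilds the stack string from the report. */
final class TcpStackConfig {

    /**
     * @param bindAddr  local address used in TCP(bind_addr=...)
     * @param peerAddr  the other host, listed in TCPPING(initial_hosts=...)
     * @param portRange TCPPING port_range; the report used 3 and found that 1 avoids the problem
     */
    static String build(String bindAddr, String peerAddr, int portRange) {
        // Protocols are separated by ':' from the bottom of the stack to the top.
        StringJoiner stack = new StringJoiner(":");
        stack.add("TCP(bind_addr=" + bindAddr + ";start_port=11800;loopback=true)");
        stack.add("TCPPING(initial_hosts=" + peerAddr + "[11800];port_range=" + portRange
                + ";timeout=3500;num_initial_members=3;up_thread=true;down_thread=true)");
        stack.add("MERGE2(min_interval=5000;max_interval=10000)");
        stack.add("FD(shun=true;timeout=1500;max_tries=3;up_thread=true;down_thread=true)");
        stack.add("VERIFY_SUSPECT(timeout=1500;down_thread=false;up_thread=false)");
        stack.add("pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000)");
        stack.add("pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false)");
        stack.add("pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;"
                + "print_local_addr=false;down_thread=true;up_thread=true)");
        return stack.toString();
    }

    public static void main(String[] args) {
        // Host A as in the report, but with the port_range=1 workaround applied.
        System.out.println(build("192.168.1.1", "192.168.1.2", 1));
    }
}
```

The resulting string is what would be handed to the JChannel constructor that takes a properties string; that step is left out here so the sketch runs without JGroups on the classpath.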




_______________________________________________
JBoss-Development mailing list
JBoss-Development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jboss-development
