[ https://issues.apache.org/jira/browse/GEODE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821335#comment-16821335 ]

ASF subversion and git services commented on GEODE-6423:
--------------------------------------------------------

Commit 0fea07ad0eb4cc23220e482967c2734f8835e982 in geode's branch 
refs/heads/release/1.9.0 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=0fea07a ]

GEODE-6423 availability checks sometimes immediately initiate removal

Do not loop trying to form a TCP/IP connection to a suspect unless
the next step is to remove the suspect from membership. When removal is
not the next step, there will be another invocation of the same method
that takes the removal step.

(cherry picked from commit 2e0a893f0587bdcec560960a6b283b5465d5897f)
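The decision described in this commit message can be sketched as below. This is a minimal illustration under stated assumptions, not Geode's implementation: `SuspectConnectCheck`, `check`, `tryConnect`, `removalIsNextStep` and `retryIntervalMillis` are hypothetical names, and a single connection attempt is abstracted as a `BooleanSupplier` so the control flow can be read (and tested) without a network.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of "loop only if removal is the next step".
// None of these names are Geode APIs.
final class SuspectConnectCheck {

  private SuspectConnectCheck() {}

  /**
   * Returns true if the suspect answered a connection attempt.
   *
   * @param tryConnect one TCP connection attempt to the suspect (true = connected)
   * @param removalIsNextStep true when a failed check leads directly to removal
   * @param memberTimeoutMillis total retry budget when removal is the next step
   * @param retryIntervalMillis pause between failed attempts
   */
  static boolean check(BooleanSupplier tryConnect, boolean removalIsNextStep,
      long memberTimeoutMillis, long retryIntervalMillis) {
    if (!removalIsNextStep) {
      // Single attempt. If it fails, a later invocation with
      // removalIsNextStep == true does the looping before any removal.
      return tryConnect.getAsBoolean();
    }
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(memberTimeoutMillis);
    while (true) {
      if (tryConnect.getAsBoolean()) {
        return true;
      }
      long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMillis <= 0) {
        return false; // budget spent: the caller may now request removal
      }
      try {
        Thread.sleep(Math.min(retryIntervalMillis, remainingMillis));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

With `removalIsNextStep == false`, a failing probe costs exactly one attempt and this call never leads to removal; only the removal path spends the full `memberTimeoutMillis` budget retrying.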


> availability checks sometimes immediately initiate removal
> ----------------------------------------------------------
>
>                 Key: GEODE-6423
>                 URL: https://issues.apache.org/jira/browse/GEODE-6423
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>             Fix For: 1.10.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> If the network goes down, the JGroupsMessenger service initiates suspect
> processing when it tries to send messages. In 1.8 this seems to initiate
> immediate removal of the suspect:
> 1. An IOException while sending a UDP message initiates suspicion.
> 2. Suspect processing initiates a final check.
> 3. The final check fails immediately: it uses a timed Socket.connect(),
>    which fails immediately.
> 4. The member is declared dead.
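Step 3 can be reproduced outside Geode. The sketch below is an illustration under assumptions: it connects to a loopback port that has no listener, standing in for the unreachable member in the report (which was caused by a network outage, not a refused connection), and measures how long a timed `Socket.connect()` takes to fail. `ImmediateConnectFailure` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative only: a refused loopback connection stands in for the
// network failure in the report. Names are hypothetical.
final class ImmediateConnectFailure {

  private ImmediateConnectFailure() {}

  /** Loopback address of a port that had a listener a moment ago but no longer does. */
  static InetSocketAddress closedLoopbackPort() {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    try (ServerSocket probe = new ServerSocket(0, 1, loopback)) {
      return new InetSocketAddress(loopback, probe.getLocalPort());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** One timed connect; returns milliseconds until it failed, or -1 if it connected. */
  static long millisUntilConnectFails(InetSocketAddress target, int timeoutMillis) {
    long start = System.nanoTime();
    try (Socket socket = new Socket()) {
      socket.connect(target, timeoutMillis);
      return -1;
    } catch (IOException e) {
      // ConnectException ("Connection refused"), NoRouteToHostException, or a
      // SocketTimeoutException if the full timeout really did elapse.
      return (System.nanoTime() - start) / 1_000_000;
    }
  }
}
```

The timeout passed to `connect()` is only an upper bound on the wait, not a minimum, so on a typical host the call above fails well before its 5-second limit. A check built on a single such call cannot, by itself, honor a longer member timeout.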
> {noformat}
> [info 2019/02/13 17:44:59.366 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 3> tid=0xc2] received suspect message from myself for 
> 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: 
> Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 4> tid=0xc3] Performing final check for suspect member 
> 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 
> reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 5> tid=0xc4] Performing final check for suspect member 
> 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 reason=Unable to 
> send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 4> tid=0xc3] Failure detection is now watching 
> 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 5> tid=0xc4] Failure detection is now watching 
> 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 3> tid=0xc2] received suspect message from myself for 
> 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201: Unable to send 
> messages to this member via JGroups
> [info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 6> tid=0xc5] Performing final check for suspect member 
> 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201 reason=Unable to 
> send messages to this member via JGroups
> [info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 6> tid=0xc5] Failure detection is now watching 
> 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 5> tid=0xc4] Final check failed for member 
> 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 5> tid=0xc4] Requesting removal of suspect member 
> 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 4> tid=0xc3] Final check failed for member 
> 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 4> tid=0xc3] Requesting removal of suspect member 
> 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 4> tid=0xc3] This member is becoming the membership 
> coordinator with address 
> 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 6> tid=0xc5] Final check failed for member 
> 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
> [info 2019/02/13 17:44:59.373 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 6> tid=0xc5] Requesting removal of suspect member 
> 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
> [info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Failure 
> Detection thread 4> tid=0xc3] ViewCreator starting 
> on:192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Membership 
> View Creator> tid=0xc6] View Creator thread is starting
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership 
> View Creator> tid=0xc6] 
> 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 had a 
> weight of 3
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership 
> View Creator> tid=0xc6] 
> 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 had a weight of 10
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership 
> View Creator> tid=0xc6] preparing new view 
> View[192.168.130.167(perf157-130-167-server1:225263)<v1>:16200|10] members: 
> [192.168.130.167(perf157-130-167-server1:225263)<v1>:16200{lead}, 
> 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201] crashed: 
> [192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000, 
> 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202]
> [info 2019/02/13 17:45:03.627 CST perf157-130-167-server1 <unicast 
> receiver,perf157-130-167-62066> tid=0x21] received suspect message from 
> 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 for 
> 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: 
> Unable to send messages to this member via JGroups
> [info 2019/02/13 17:45:03.718 CST perf157-130-167-server1 <unicast 
> receiver,perf157-130-167-62066> tid=0x21] Membership received a request to 
> remove 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200 from 
> 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 
> reason=Unable to send messages to this member via JGroups
> [severe 2019/02/13 17:45:03.719 CST perf157-130-167-server1 <unicast 
> receiver,perf157-130-167-62066> tid=0x21] Membership service failure: Unable 
> to send messages to this member via JGroups
> org.apache.geode.ForcedDisconnectException: Unable to send messages to this 
> member via JGroups
> {noformat}
>  
> We expect the final check to respect the member-timeout setting.
>  
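One way to make a TCP-based final check respect a member timeout is to keep retrying until a deadline derived from that timeout, capping each connect at the time remaining. The following is a minimal sketch assuming a plain `java.net.Socket` probe; `TimeoutBoundFinalCheck`, `finalCheck` and `retryIntervalMillis` are hypothetical names, and the member timeout is passed as a millisecond argument rather than read from Geode's `member-timeout` configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Geode code: retry a TCP connect until the
// member-timeout budget is used up, even if each attempt fails instantly.
final class TimeoutBoundFinalCheck {

  private TimeoutBoundFinalCheck() {}

  static boolean finalCheck(InetSocketAddress suspect, long memberTimeoutMillis,
      long retryIntervalMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(memberTimeoutMillis);
    while (true) {
      long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMillis <= 0) {
        return false; // whole budget spent without a successful connect
      }
      try (Socket socket = new Socket()) {
        // Never wait longer than what is left of the budget (always >= 1 ms here,
        // so we never pass 0, which would mean "wait forever").
        socket.connect(suspect, (int) Math.min(remainingMillis, Integer.MAX_VALUE));
        return true;
      } catch (IOException e) {
        // Refused or unreachable errors can arrive immediately; retry below.
      }
      long pause = Math.min(retryIntervalMillis,
          TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime()));
      if (pause > 0) {
        try {
          Thread.sleep(pause);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
          return false;
        }
      }
    }
  }
}
```

A reachable member passes on the first attempt, so the retry cost is only paid when the suspect is really unreachable; an instantly failing connect no longer shortens the check below the configured timeout.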



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
