[ https://issues.apache.org/jira/browse/GEODE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen Nichols closed GEODE-6423.
-------------------------------

> availability checks sometimes immediately initiate removal
> ----------------------------------------------------------
>
> Key: GEODE-6423
> URL: https://issues.apache.org/jira/browse/GEODE-6423
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Bruce Schuchardt
> Assignee: Bruce Schuchardt
> Priority: Major
> Fix For: 1.9.0
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> If the network goes down the JGroupsMessenger service initiates suspect
> processing when it tries to send messages. In 1.8 this seems to initiate
> immediate removal of the suspect.
> ioexception sending udp message initiates suspicion
> suspect processing initiates a final check
> the final check fails immediately (it's using a timed Socket.connect() which fails immediately)
> the member is declared dead
> {noformat}
> [info 2019/02/13 17:44:59.366 CST perf157-130-167-server1 <Geode Failure Detection thread 3> tid=0xc2] received suspect message from myself for 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Performing final check for suspect member 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Performing final check for suspect member 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Failure detection is now watching 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Failure detection is now watching 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.368 CST perf157-130-167-server1 <Geode Failure Detection thread 3> tid=0xc2] received suspect message from myself for 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201: Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Performing final check for suspect member 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201 reason=Unable to send messages to this member via JGroups
> [info 2019/02/13 17:44:59.369 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Failure detection is now watching 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Final check failed for member 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 5> tid=0xc4] Requesting removal of suspect member 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Final check failed for member 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] Requesting removal of suspect member 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] This member is becoming the membership coordinator with address 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.371 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Final check failed for member 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
> [info 2019/02/13 17:44:59.373 CST perf157-130-167-server1 <Geode Failure Detection thread 6> tid=0xc5] Requesting removal of suspect member 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201
> [info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Failure Detection thread 4> tid=0xc3] ViewCreator starting on:192.168.130.167(perf157-130-167-server1:225263)<v1>:16200
> [info 2019/02/13 17:44:59.376 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] View Creator thread is starting
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 had a weight of 3
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 had a weight of 10
> [info 2019/02/13 17:44:59.377 CST perf157-130-167-server1 <Geode Membership View Creator> tid=0xc6] preparing new view View[192.168.130.167(perf157-130-167-server1:225263)<v1>:16200|10] members: [192.168.130.167(perf157-130-167-server1:225263)<v1>:16200{lead}, 192.168.130.167(perf157-130-167-server2:225522)<v2>:16201] crashed: [192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000, 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202]
> [info 2019/02/13 17:45:03.627 CST perf157-130-167-server1 <unicast receiver,perf157-130-167-62066> tid=0x21] received suspect message from 192.168.130.167(perf157-130-167-worker1:225794)<v3>:16202 for 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000: Unable to send messages to this member via JGroups
> [info 2019/02/13 17:45:03.718 CST perf157-130-167-server1 <unicast receiver,perf157-130-167-62066> tid=0x21] Membership received a request to remove 192.168.130.167(perf157-130-167-server1:225263)<v1>:16200 from 192.168.130.167(perf157-130-167-locator1:225065:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> [severe 2019/02/13 17:45:03.719 CST perf157-130-167-server1 <unicast receiver,perf157-130-167-62066> tid=0x21] Membership service failure: Unable to send messages to this member via JGroups
> org.apache.geode.ForcedDisconnectException: Unable to send messages to this member via JGroups
> {noformat}
>
> We expect the final check to respect the member-timeout setting.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
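
The expectation stated in the issue, that a final check should respect member-timeout even when Socket.connect() fails at once, can be illustrated with a minimal sketch. This is not the fix that went into Geode 1.9.0 and it does not use Geode's membership classes; the class FinalCheckSketch, the method finalCheck, and the constant RETRY_PAUSE_MS are hypothetical names chosen for illustration. The sketch assumes that an immediate IOException from the connect attempt is inconclusive, so it keeps retrying until the timeout has elapsed before reporting the member as unreachable.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical illustration only; not Geode code.
final class FinalCheckSketch {

  // Assumed pause between attempts so a fast-failing connect does not spin.
  private static final long RETRY_PAUSE_MS = 50;

  /**
   * Returns true if a TCP connection to the address succeeds before
   * memberTimeoutMs elapses. Returns false only once the full timeout has
   * passed, even if every individual connect attempt failed immediately.
   */
  static boolean finalCheck(InetSocketAddress address, long memberTimeoutMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + memberTimeoutMs * 1_000_000L;
    while (true) {
      long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
      if (remainingMs <= 0) {
        return false; // the whole member-timeout window was used up
      }
      try (Socket socket = new Socket()) {
        // The timed connect is bounded by whatever is left of the window.
        socket.connect(address, (int) Math.min(remainingMs, Integer.MAX_VALUE));
        return true;
      } catch (IOException e) {
        // A refused or unreachable connect can fail in well under a
        // millisecond; treat it as inconclusive and try again.
      }
      long pauseMs =
          Math.min(RETRY_PAUSE_MS, (deadline - System.nanoTime()) / 1_000_000L);
      if (pauseMs > 0) {
        Thread.sleep(pauseMs);
      }
    }
  }
}
```

A caller would pass the suspect member's address and the configured member-timeout in milliseconds, for example finalCheck(new InetSocketAddress(host, port), 5000), and request removal only when the method returns false. The timestamps in the log above show the opposite behavior: "Performing final check" at 17:44:59.368 and "Final check failed" at 17:44:59.371, about 3 ms apart.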