[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException

2021-10-26 Thread Dale Emery (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434502#comment-17434502
 ] 

Dale Emery commented on GEODE-9531:
---

I was curious about the warnings from the stat sampling thread, so I checked a 
bunch of runs with failures. Eight of those failure runs had swarms of those 
warnings. By "swarm" I mean that multiple tests issued that warning at about 
the same time (within a second or two). In all eight of those runs, the 
following tests were executing at the time of the first warning:
 # org.apache.geode.security.ClientAuthorizationCQDUnitTest testAllOpsWithFailover2
 # org.apache.geode.management.GfshRebalanceCommandCompatibilityTest whenCurrentVersionLocatorsExecuteRebalanceOnOldServersThenItMustSucceed
 # org.apache.geode.management.ConfigurationCompatibilityTest whenConfigurationIsExchangedBetweenMixedVersionLocatorsThenItShouldNotThrowExceptions
 # org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
 # org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterCurrentSiteMemberFailoverWithOldClient testSecondaryEventsNotReprocessedAfterCurrentSiteMemberFailoverWithOldClient
 # org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingOldSiteOneCurrentSiteTwo testEventProcessingOldSiteOneCurrentSiteTwo
 # org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneOldSiteTwo EventProcessingMixedSiteOneOldSiteTwo
 # org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo EventProcessingMixedSiteOneCurrentSiteTwo
 # org.apache.geode.cache.wan.WANRollingUpgradeCreateGatewaySenderMixedSiteOneCurrentSiteTwo CreateGatewaySenderMixedSiteOneCurrentSiteTwo
 # org.apache.geode.cache.lucene.RollingUpgradeReindexShouldBeSuccessfulWhenAllServersRollToCurrentVersion luceneReindexShouldBeSuccessfulWhenAllServersRollToCurrentVersion
 # org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion
 # org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion
 # org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated
 # org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver luceneQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver
 # org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled luceneQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

Perhaps one of these tests is doing something unusually CPU intensive.

Given that most tests succeeded even after emitting the warning, I may be able 
to prune this list of tests by analyzing "green" jobs that have those warnings.
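
The "swarm" criterion described above (warnings from multiple tests within a second or two of each other) could be sketched roughly as follows. This is only an illustration in Python, not the tooling used for the analysis: the function name {{find_swarms}}, the 2-second window, and the idea of feeding it (test name, timestamp) pairs are all assumptions, and extracting those pairs from the test logs is left out.

```python
from datetime import datetime, timedelta


def find_swarms(warnings, window_seconds=2.0, min_tests=2):
    """Group (test_name, timestamp) warning events into swarms.

    A swarm here is a run of events in which each event occurs within
    `window_seconds` of the previous one and at least `min_tests`
    distinct tests are involved. Both thresholds are assumptions.
    """
    events = sorted(warnings, key=lambda event: event[1])
    swarms = []
    current = []

    def close_run():
        # Keep the run only if more than one test contributed to it.
        if len({name for name, _ in current}) >= min_tests:
            swarms.append(list(current))

    for event in events:
        if current:
            gap = (event[1] - current[-1][1]).total_seconds()
            if gap > window_seconds:
                close_run()
                current.clear()
        current.append(event)
    if current:
        close_run()
    return swarms


def parse_geode_timestamp(text):
    """Parse a log timestamp such as '2021/05/25 16:58:13.131'."""
    return datetime.strptime(text, "%Y/%m/%d %H:%M:%S.%f")
```

The first event of each swarm gives the time of the "first warning", which can then be compared against the set of tests executing at that moment.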

> CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with 
> ForcedDisconnectException
> ---
>
> Key: GEODE-9531
> URL: https://issues.apache.org/jira/browse/GEODE-9531
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Donal Evans
>Assignee: Eric Shu
>Priority: Major
>  Labels: GeodeOperationAPI
>
> {noformat}
> org.apache.geode.internal.cache.TxCommitMessageBCClientToServerTxPartitionTest
>  > test[11] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.TxCommitMessageBCTestBase$$Lambda$55/2050040059.run
>  in VM 2 running on Host 1797ac7f43c4 with 5 VMs
> Caused by:
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> membership shutdown, caused by org.apache.geode.ForcedDisconnectException: 
> Member isn't responding to heartbeat requests
> Caused by:
> org.apache.geode.ForcedDisconnectException: Member isn't 
> responding to heartbeat requests
> java.lang.AssertionError: Suspicious strings were written to the log 
> during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> ---
> Found suspect string in 'dunit_suspect-vm2.log' at line 993
> [fatal 2021/05/25 16:58:13.700 GMT  
> tid=1349] Membership service failure: Member isn't responding to heartbeat 
> requests
> 
> org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException:
>  Member isn't responding to heartbeat requests
>   at 
> 

[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException

2021-10-25 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434042#comment-17434042
 ] 

Geode Integration commented on GEODE-9531:
--

Seen in [upgrade-test-openjdk11 
#299|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/upgrade-test-openjdk11/builds/299]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0610/test-results/upgradeTest/1634950673/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0610/test-artifacts/1634950673/upgradetestfiles-openjdk11-1.15.0-build.0610.tgz].

> CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with 
> ForcedDisconnectException
> ---
>
> Key: GEODE-9531
> URL: https://issues.apache.org/jira/browse/GEODE-9531
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Donal Evans
>Assignee: Eric Shu
>Priority: Major
>  Labels: GeodeOperationAPI
>
> {noformat}
> org.apache.geode.internal.cache.TxCommitMessageBCClientToServerTxPartitionTest
>  > test[11] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.TxCommitMessageBCTestBase$$Lambda$55/2050040059.run
>  in VM 2 running on Host 1797ac7f43c4 with 5 VMs
> Caused by:
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> membership shutdown, caused by org.apache.geode.ForcedDisconnectException: 
> Member isn't responding to heartbeat requests
> Caused by:
> org.apache.geode.ForcedDisconnectException: Member isn't 
> responding to heartbeat requests
> java.lang.AssertionError: Suspicious strings were written to the log 
> during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> ---
> Found suspect string in 'dunit_suspect-vm2.log' at line 993
> [fatal 2021/05/25 16:58:13.700 GMT  
> tid=1349] Membership service failure: Member isn't responding to heartbeat 
> requests
> 
> org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException:
>  Member isn't responding to heartbeat requests
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:1783)
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1122)
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processRemoveMemberMessage(GMSJoinLeave.java:725)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1366)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1302)
>   at org.jgroups.JChannel.invokeCallback(JChannel.java:816)
>   at org.jgroups.JChannel.up(JChannel.java:741)
>   at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
>   at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
>   at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
>   at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
>   at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
>   at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:72)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:70)
>   at org.jgroups.protocols.TP.passMessageUp(TP.java:1658)
>   at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876)
>   at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10)
>   at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789)
>   at org.jgroups.protocols.TP.receive(TP.java:1714)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:159)
>   at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701)
>   at java.lang.Thread.run(Thread.java:748)
> ---
> Found suspect string in 'dunit_suspect-vm2.log' at line 1041
> [error 2021/05/25 16:58:14.206 GMT  
> tid=135] Cache initialization for GemFireCache[id = 664332017; isClosing = 
> false; isShutDownAll = false; created = Tue May 25 16:57:54 GMT 2021; server 
> = false; copyOnRead = false; lockLease = 120; lockTimeout = 60] failed 
> because:

[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException

2021-10-13 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428440#comment-17428440
 ] 

Geode Integration commented on GEODE-9531:
--

Seen on support/1.13 in [upgrade-test-openjdk11 
#62|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-13-main/jobs/upgrade-test-openjdk11/builds/62]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.5-build.0606/test-results/upgradeTest/1634119201/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.5-build.0606/test-artifacts/1634119201/upgradetestfiles-openjdk11-1.13.5-build.0606.tgz].


[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException

2021-08-24 Thread Dale Emery (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404105#comment-17404105
 ] 

Dale Emery commented on GEODE-9531:
---

Looking at the test artifacts, I found:
* For some reason, too many tests were run concurrently. The job runs the 
{{upgradeTest}} task with {{-PdunitParallelForks=48}}, which sets a limit of 
48 concurrent tests, but Gradle somehow ran as many as 62 tests concurrently. 
There were 60 running at the time of this failure.
* This failure happened on the 1.14 support branch. That branch did not include 
my changes to upgrade Gradle and to run tests without Dockerizing them. So 
those changes don't explain why too many tests ran concurrently.
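
One way to count concurrent tests from the artifacts is a sweep over test start and end times. The sketch below is a hypothetical illustration in Python, not the method actually used here: the name {{max_concurrency}} and the (start, end) interval input are assumptions, and the intervals would have to be extracted from the test results first.

```python
def max_concurrency(intervals):
    """Return the peak number of overlapping (start, end) intervals.

    Start and end can be any comparable values (seconds, datetimes).
    """
    points = []
    for start, end in intervals:
        points.append((start, 1))   # a test begins
        points.append((end, -1))    # a test finishes
    # At equal times, process ends (-1) before starts (+1) so that
    # back-to-back tests are not counted as overlapping.
    points.sort(key=lambda point: (point[0], point[1]))

    running = 0
    peak = 0
    for _, delta in points:
        running += delta
        peak = max(peak, running)
    return peak


def exceeds_fork_limit(intervals, limit=48):
    """True if more tests overlapped than the configured fork limit."""
    return max_concurrency(intervals) > limit
```

With a limit of 48, a peak of 62 would make {{exceeds_fork_limit}} return true.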


[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException

2021-08-24 Thread Eric Shu (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404051#comment-17404051
 ] 

Eric Shu commented on GEODE-9531:
-

This is a resource issue: multiple VMs (in various tests running at the time) 
all went through the same suspect process.

For the failing test, the VM that failed had just started up and joined the distributed system (DS):
{noformat}
[vm2] [info 2021/05/25 16:57:54.603 GMT   
tid=0x87] DistributionManager 172.17.0.39(278):41003 started on 
localhost[43925]. There were 3 other DMs. others: 
[172.17.0.39(255):41002, 172.17.0.39(246):41001, 
1797ac7f43c4(107:locator):41000]  (took 1800 ms) 

[vm2] [info 2021/05/25 16:57:54.850 GMT   tid=0x556] Disabling 
statistic archival.

[vm2] [info 2021/05/25 16:57:54.896 GMT   
tid=0x87] No locator(s) found with cluster configuration service

[vm2] [info 2021/05/25 16:57:55.161 GMT   
tid=0x87] Initialized cache service 
org.apache.geode.cache.query.internal.QueryConfigurationServiceImpl

{noformat}

At that point, Geode failure detection kicked in and all VMs became suspects.
{noformat}
[vm1] [info 2021/05/25 16:58:08.119 GMT   
tid=0x76b] received suspect message from myself for 
1797ac7f43c4(107:locator):41000: Member isn't responding to heartbeat 
requests

[vm1] [info 2021/05/25 16:58:08.118 GMT   
tid=0x76a] received suspect message from myself for 
172.17.0.39(278):41003: Member isn't responding to heartbeat requests

[vm1] [info 2021/05/25 16:58:08.143 GMT   
tid=0x76c] received suspect message from myself for 
172.17.0.39(246):41001: Member isn't responding to heartbeat requests

[vm1] [info 2021/05/25 16:58:08.264 GMT   
tid=0x76a] Performing availability check for suspect member 
172.17.0.39(278):41003 reason=Member isn't responding to heartbeat 
requests

[vm1] [info 2021/05/25 16:58:08.266 GMT   
tid=0x76a] All other members are suspect at this point
{noformat}

The locator did not get enough CPU cycles either, but it managed to respond 
to the suspect process just in time.
{noformat}
[locator] [warn 2021/05/25 16:58:11.415 GMT   tid=0x37] 
Failure detection heartbeat-generation thread overslept by more than a full 
period. Asleep time: 15,705,239,045 nanoseconds. Period: 2,500,000,000 
nanoseconds.

[locator] [info 2021/05/25 16:58:11.541 GMT   tid=0x32] received suspect message from 
172.17.0.39(255):41002 for 172.17.0.39(278):41003: Member isn't 
responding to heartbeat requests
{noformat}

vm2 did not, so it was kicked out of the DS.
{noformat}
[vm2] [warn 2021/05/25 16:58:13.131 GMT   tid=0x556] Statistics 
sampling thread detected a wakeup delay of 16556 ms, indicating a possible 
resource issue. Check the GC, memory, and CPU statistics.

[vm2] [warn 2021/05/25 16:58:13.147 GMT   tid=0x54a] 
Failure detection heartbeat-generation thread overslept by more than a full 
period. Asleep time: 19,938,476,329 nanoseconds. Period: 2,500,000,000 
nanoseconds.

[vm1] [info 2021/05/25 16:58:13.351 GMT   
tid=0x76a] Availability check failed for member 172.17.0.39(278):41003

[vm1] [info 2021/05/25 16:58:13.351 GMT   
tid=0x76a] Requesting removal of suspect member 172.17.0.39(278):41003
{noformat}
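
To put the "overslept" warnings above in perspective, the asleep time can be expressed as a multiple of the heartbeat period. The following is a rough Python sketch, not part of Geode: the function name {{overslept_periods}} is made up, and the regular expression assumes the message wording shown in the log excerpts above.

```python
import re

# Matches the wording of the warning as it appears in the excerpts above.
_OVERSLEPT = re.compile(
    r"Asleep time: ([\d,]+) nanoseconds\. Period: ([\d,]+) nanoseconds\."
)


def overslept_periods(log_text):
    """Return (asleep_seconds, periods_slept) for each overslept warning."""
    # Log lines are wrapped in the excerpts, so collapse whitespace first.
    normalized = " ".join(log_text.split())
    results = []
    for asleep, period in _OVERSLEPT.findall(normalized):
        asleep_ns = int(asleep.replace(",", ""))
        period_ns = int(period.replace(",", ""))
        results.append((asleep_ns / 1e9, asleep_ns / period_ns))
    return results


# The vm2 warning quoted above (thread name omitted, as in the excerpt).
VM2_WARNING = """[vm2] [warn 2021/05/25 16:58:13.147 GMT   tid=0x54a]
Failure detection heartbeat-generation thread overslept by more than a full
period. Asleep time: 19,938,476,329 nanoseconds. Period: 2,500,000,000
nanoseconds."""

for seconds, periods in overslept_periods(VM2_WARNING):
    print(f"{seconds:.1f} s asleep, {periods:.1f} heartbeat periods")
    # prints "19.9 s asleep, 8.0 heartbeat periods"
```

By the same arithmetic, the locator's 15,705,239,045 nanoseconds is about 6.3 periods.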

I also tried to see whether other running tests were experiencing the same 
issue. At the time, the following tests were running concurrently:
org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled luceneQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled[from_v1.3.0, with reindex=true, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver luceneQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver[from_v1.3.0, with reindex=true, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated test[from_v1.4.0, with reindex=true, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion[from_v1.2.0, with reindex=false, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion[from_v1.2.0, with reindex=false, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeReindexShouldBeSuccessfulWhenAllServersRollToCurrentVersion luceneReindexShouldBeSuccessfulWhenAllServersRollToCurrentVersion[from_v1.3.0, with reindex=false, singleHopEnabled=true]
org.apache.geode.cache.wan.WANRollingUpgradeCreateGatewaySenderMixedSiteOneCurrentSiteTwo CreateGatewaySenderMixedSiteOneCurrentSiteTwo[from_v1.8.0]
org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo EventProcessingMixedSiteOneCurrentSiteTwo[from_v1.7.0]

[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException

2021-08-20 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402352#comment-17402352
 ] 

Geode Integration commented on GEODE-9531:
--

Seen on support/1.14 in [UpgradeTestOpenJDK8 
#77|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-14-main/jobs/UpgradeTestOpenJDK8/builds/77]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-support-1-14-main/1.14.0-build.0787/test-results/upgradeTest/1621966586/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-support-1-14-main/1.14.0-build.0787/test-artifacts/1621966586/upgradetestfiles-OpenJDK8-1.14.0-build.0787.tgz].

> CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with 
> ForcedDisconnectException
> ---
>
> Key: GEODE-9531
> URL: https://issues.apache.org/jira/browse/GEODE-9531
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Donal Evans
>Priority: Major
>  Labels: blocks-1.14.0
>
> {noformat}
> org.apache.geode.internal.cache.TxCommitMessageBCClientToServerTxPartitionTest
>  > test[11] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.TxCommitMessageBCTestBase$$Lambda$55/2050040059.run
>  in VM 2 running on Host 1797ac7f43c4 with 5 VMs
> Caused by:
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> membership shutdown, caused by org.apache.geode.ForcedDisconnectException: 
> Member isn't responding to heartbeat requests
> Caused by:
> org.apache.geode.ForcedDisconnectException: Member isn't 
> responding to heartbeat requests
> java.lang.AssertionError: Suspicious strings were written to the log 
> during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> ---
> Found suspect string in 'dunit_suspect-vm2.log' at line 993
> [fatal 2021/05/25 16:58:13.700 GMT  
> tid=1349] Membership service failure: Member isn't responding to heartbeat 
> requests
> 
> org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException:
>  Member isn't responding to heartbeat requests
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:1783)
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1122)
>   at 
> org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processRemoveMemberMessage(GMSJoinLeave.java:725)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1366)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1302)
>   at org.jgroups.JChannel.invokeCallback(JChannel.java:816)
>   at org.jgroups.JChannel.up(JChannel.java:741)
>   at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
>   at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
>   at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
>   at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
>   at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
>   at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:72)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:70)
>   at org.jgroups.protocols.TP.passMessageUp(TP.java:1658)
>   at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876)
>   at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10)
>   at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789)
>   at org.jgroups.protocols.TP.receive(TP.java:1714)
>   at 
> org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:159)
>   at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701)
>   at java.lang.Thread.run(Thread.java:748)
> ---
> Found suspect string in 'dunit_suspect-vm2.log' at line 1041
> [error 2021/05/25 16:58:14.206 GMT  
> tid=135] Cache initialization for GemFireCache[id = 664332017; isClosing = 
> false; isShutDownAll = false; created = Tue May 25 16:57:54 GMT 2021; server 
> = false; copyOnRead = false; lockLease = 120; lockTimeout = 60] failed 
> because:
>