[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException
[ https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434502#comment-17434502 ]

Dale Emery commented on GEODE-9531:
---

I was curious about the warnings from the stat sampling thread, so I checked a bunch of runs with failures. Eight of those failure runs had swarms of those warnings. By "swarm" I mean that multiple tests issued that warning at about the same time (within a second or two).

In all eight of those runs, the following tests were executing at the time of the first warning:
# org.apache.geode.security.ClientAuthorizationCQDUnitTest testAllOpsWithFailover2
# org.apache.geode.management.GfshRebalanceCommandCompatibilityTest whenCurrentVersionLocatorsExecuteRebalanceOnOldServersThenItMustSucceed
# org.apache.geode.management.ConfigurationCompatibilityTest whenConfigurationIsExchangedBetweenMixedVersionLocatorsThenItShouldNotThrowExceptions
# org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
# org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterCurrentSiteMemberFailoverWithOldClient testSecondaryEventsNotReprocessedAfterCurrentSiteMemberFailoverWithOldClient
# org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingOldSiteOneCurrentSiteTwo testEventProcessingOldSiteOneCurrentSiteTwo
# org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneOldSiteTwo EventProcessingMixedSiteOneOldSiteTwo
# org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo EventProcessingMixedSiteOneCurrentSiteTwo
# org.apache.geode.cache.wan.WANRollingUpgradeCreateGatewaySenderMixedSiteOneCurrentSiteTwo CreateGatewaySenderMixedSiteOneCurrentSiteTwo
# org.apache.geode.cache.lucene.RollingUpgradeReindexShouldBeSuccessfulWhenAllServersRollToCurrentVersion luceneReindexShouldBeSuccessfulWhenAllServersRollToCurrentVersion
# org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion
# org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion
# org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated
# org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver luceneQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver
# org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled luceneQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

Perhaps one of these tests is doing something unusually CPU intensive. Given that most tests succeeded even after emitting the warning, I may be able to prune this list of tests by analyzing "green" jobs that have those warnings.
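The comparison described above amounts to intersecting, across runs, the sets of tests that were executing when the first warning appeared. A minimal Python sketch of that set arithmetic follows. The run names and test names are hypothetical placeholders, and treating "green" runs as additional sets to intersect is only one possible reading of the pruning idea, not a description of the analysis actually performed.

```python
def common_tests(runs):
    """Return the tests present in every run's 'executing at first warning' set.

    `runs` maps a run name to an iterable of test names (hypothetical input shape).
    """
    sets = [set(tests) for tests in runs.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical data: short names stand in for the real test classes.
failed_runs = {
    "failed-run-1": {"TestA", "TestB", "TestC"},
    "failed-run-2": {"TestB", "TestC", "TestD"},
}
green_runs_with_warnings = {
    "green-run-1": {"TestC", "TestE"},
}

# Tests running at the first warning in every failed run.
suspects = common_tests(failed_runs)

# One possible pruning: keep only suspects that were also running at the
# first warning in every green run that showed the same warnings.
pruned = common_tests({**failed_runs, **green_runs_with_warnings})

print(sorted(suspects))
print(sorted(pruned))
```

An intersection only narrows the candidates; it does not show that any remaining test causes the CPU starvation.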
> CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException
> ---
>
> Key: GEODE-9531
> URL: https://issues.apache.org/jira/browse/GEODE-9531
> Project: Geode
> Issue Type: Bug
> Affects Versions: 1.14.0
> Reporter: Donal Evans
> Assignee: Eric Shu
> Priority: Major
> Labels: GeodeOperationAPI
>
> {noformat}
> org.apache.geode.internal.cache.TxCommitMessageBCClientToServerTxPartitionTest > test[11] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.internal.cache.TxCommitMessageBCTestBase$$Lambda$55/2050040059.run in VM 2 running on Host 1797ac7f43c4 with 5 VMs
> Caused by:
> org.apache.geode.distributed.DistributedSystemDisconnectedException: membership shutdown, caused by org.apache.geode.ForcedDisconnectException: Member isn't responding to heartbeat requests
> Caused by:
> org.apache.geode.ForcedDisconnectException: Member isn't responding to heartbeat requests
> java.lang.AssertionError: Suspicious strings were written to the log during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> ---
> Found suspect string in 'dunit_suspect-vm2.log' at line 993
> [fatal 2021/05/25 16:58:13.700 GMT tid=1349] Membership service failure: Member isn't responding to heartbeat requests
>
> org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException: Member isn't responding to heartbeat requests
> at
[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException
[ https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434042#comment-17434042 ]

Geode Integration commented on GEODE-9531:
--

Seen in [upgrade-test-openjdk11 #299|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/upgrade-test-openjdk11/builds/299] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0610/test-results/upgradeTest/1634950673/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0610/test-artifacts/1634950673/upgradetestfiles-openjdk11-1.15.0-build.0610.tgz].

> CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException
> ---
>
> Key: GEODE-9531
> URL: https://issues.apache.org/jira/browse/GEODE-9531
> Project: Geode
> Issue Type: Bug
> Affects Versions: 1.14.0
> Reporter: Donal Evans
> Assignee: Eric Shu
> Priority: Major
> Labels: GeodeOperationAPI
>
> {noformat}
> org.apache.geode.internal.cache.TxCommitMessageBCClientToServerTxPartitionTest > test[11] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.internal.cache.TxCommitMessageBCTestBase$$Lambda$55/2050040059.run in VM 2 running on Host 1797ac7f43c4 with 5 VMs
> Caused by:
> org.apache.geode.distributed.DistributedSystemDisconnectedException: membership shutdown, caused by org.apache.geode.ForcedDisconnectException: Member isn't responding to heartbeat requests
> Caused by:
> org.apache.geode.ForcedDisconnectException: Member isn't responding to heartbeat requests
> java.lang.AssertionError: Suspicious strings were written to the log during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> ---
> Found suspect string in 'dunit_suspect-vm2.log' at line 993
> [fatal 2021/05/25 16:58:13.700 GMT tid=1349] Membership service failure: Member isn't responding to heartbeat requests
>
> org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException: Member isn't responding to heartbeat requests
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:1783)
>   at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1122)
>   at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processRemoveMemberMessage(GMSJoinLeave.java:725)
>   at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1366)
>   at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1302)
>   at org.jgroups.JChannel.invokeCallback(JChannel.java:816)
>   at org.jgroups.JChannel.up(JChannel.java:741)
>   at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
>   at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
>   at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
>   at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
>   at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
>   at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
>   at org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:72)
>   at org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:70)
>   at org.jgroups.protocols.TP.passMessageUp(TP.java:1658)
>   at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876)
>   at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10)
>   at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789)
>   at org.jgroups.protocols.TP.receive(TP.java:1714)
>   at org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:159)
>   at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701)
>   at java.lang.Thread.run(Thread.java:748)
> ---
> Found suspect string in 'dunit_suspect-vm2.log' at line 1041
> [error 2021/05/25 16:58:14.206 GMT tid=135] Cache initialization for GemFireCache[id = 664332017; isClosing = false; isShutDownAll = false; created = Tue May 25 16:57:54 GMT 2021; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60] failed because:
[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException
[ https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428440#comment-17428440 ]

Geode Integration commented on GEODE-9531:
--

Seen on support/1.13 in [upgrade-test-openjdk11 #62|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-13-main/jobs/upgrade-test-openjdk11/builds/62] ... see [test results|http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.5-build.0606/test-results/upgradeTest/1634119201/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.5-build.0606/test-artifacts/1634119201/upgradetestfiles-openjdk11-1.13.5-build.0606.tgz].
[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException
[ https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404105#comment-17404105 ]

Dale Emery commented on GEODE-9531:
---

Looking at the test artifacts, I found:
* For some reason, too many tests were run concurrently. The job runs the {{upgradeTest}} task with {{-PdunitParallelForks=48}}, which sets a limit of 48 concurrent tests, but Gradle somehow ran as many as 62 tests concurrently. There were 60 running at the time of this failure.
* This failure happened on the 1.14 support branch. That branch did not include my changes to upgrade Gradle and to run tests without Dockerizing them. So those changes don't explain why too many tests ran concurrently.
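One way to check a concurrency figure like the one in this comment is a sweep over test start and end times. The Python sketch below assumes the (start, end) timestamps have already been extracted from the test results; that extraction step and the sample numbers are hypothetical. Only the limit of 48 comes from the comment.

```python
def peak_concurrency(intervals):
    """Return the maximum number of intervals that overlap at any instant.

    `intervals` is an iterable of (start, end) pairs with start <= end.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a test starts
        events.append((end, -1))    # a test ends
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back tests are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


DUNIT_PARALLEL_FORKS = 48  # limit named in the comment (-PdunitParallelForks=48)

# Hypothetical (start, end) times in seconds, not taken from the real artifacts.
sample = [(0, 10), (5, 15), (10, 20)]
peak = peak_concurrency(sample)
print(peak, "over limit" if peak > DUNIT_PARALLEL_FORKS else "within limit")
```

Processing ends before starts at the same timestamp is a deliberate choice here; reversing that order would count a test that starts exactly when another finishes as concurrent with it.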
[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException
[ https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404051#comment-17404051 ]

Eric Shu commented on GEODE-9531:
-

This is a resource issue: multiple VMs (in various tests running at the time) all went through the same suspect process.

In the failing test, the VM that failed had just started up and joined the distributed system (DS):
{noformat}
[vm2] [info 2021/05/25 16:57:54.603 GMT tid=0x87] DistributionManager 172.17.0.39(278):41003 started on localhost[43925]. There were 3 other DMs. others: [172.17.0.39(255):41002, 172.17.0.39(246):41001, 1797ac7f43c4(107:locator):41000] (took 1800 ms)
[vm2] [info 2021/05/25 16:57:54.850 GMT tid=0x556] Disabling statistic archival.
[vm2] [info 2021/05/25 16:57:54.896 GMT tid=0x87] No locator(s) found with cluster configuration service
[vm2] [info 2021/05/25 16:57:55.161 GMT tid=0x87] Initialized cache service org.apache.geode.cache.query.internal.QueryConfigurationServiceImpl
{noformat}

At that point all VMs were suspected as Geode failure detection kicked in.
{noformat}
[vm1] [info 2021/05/25 16:58:08.119 GMT tid=0x76b] received suspect message from myself for 1797ac7f43c4(107:locator):41000: Member isn't responding to heartbeat requests
[vm1] [info 2021/05/25 16:58:08.118 GMT tid=0x76a] received suspect message from myself for 172.17.0.39(278):41003: Member isn't responding to heartbeat requests
[vm1] [info 2021/05/25 16:58:08.143 GMT tid=0x76c] received suspect message from myself for 172.17.0.39(246):41001: Member isn't responding to heartbeat requests
[vm1] [info 2021/05/25 16:58:08.264 GMT tid=0x76a] Performing availability check for suspect member 172.17.0.39(278):41003 reason=Member isn't responding to heartbeat requests
[vm1] [info 2021/05/25 16:58:08.266 GMT tid=0x76a] All other members are suspect at this point
{noformat}

The locator did not get enough CPU cycles either, but managed to respond to the suspect process just in time.
{noformat}
[locator] [warn 2021/05/25 16:58:11.415 GMT tid=0x37] Failure detection heartbeat-generation thread overslept by more than a full period. Asleep time: 15,705,239,045 nanoseconds. Period: 2,500,000,000 nanoseconds.
[locator] [info 2021/05/25 16:58:11.541 GMT tid=0x32] received suspect message from 172.17.0.39(255):41002 for 172.17.0.39(278):41003: Member isn't responding to heartbeat requests
{noformat}

vm2 did not, so it was kicked out of the DS.
{noformat}
[vm2] [warn 2021/05/25 16:58:13.131 GMT tid=0x556] Statistics sampling thread detected a wakeup delay of 16556 ms, indicating a possible resource issue. Check the GC, memory, and CPU statistics.
[vm2] [warn 2021/05/25 16:58:13.147 GMT tid=0x54a] Failure detection heartbeat-generation thread overslept by more than a full period. Asleep time: 19,938,476,329 nanoseconds. Period: 2,500,000,000 nanoseconds.
[vm1] [info 2021/05/25 16:58:13.351 GMT tid=0x76a] Availability check failed for member 172.17.0.39(278):41003
[vm1] [info 2021/05/25 16:58:13.351 GMT tid=0x76a] Requesting removal of suspect member 172.17.0.39(278):41003
{noformat}

I also tried to see whether other tests running at the time experienced the same issue. The following tests were running concurrently:

org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled luceneQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled[from_v1.3.0, with reindex=true, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver luceneQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver[from_v1.3.0, with reindex=true, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated test[from_v1.4.0, with reindex=true, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion[from_v1.2.0, with reindex=false, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion[from_v1.2.0, with reindex=false, singleHopEnabled=true]
org.apache.geode.cache.lucene.RollingUpgradeReindexShouldBeSuccessfulWhenAllServersRollToCurrentVersion luceneReindexShouldBeSuccessfulWhenAllServersRollToCurrentVersion[from_v1.3.0, with reindex=false, singleHopEnabled=true]
org.apache.geode.cache.wan.WANRollingUpgradeCreateGatewaySenderMixedSiteOneCurrentSiteTwo CreateGatewaySenderMixedSiteOneCurrentSiteTwo[from_v1.8.0]
org.apache.geode.cache.wan.WANRollingUpgradeEventProcessingMixedSiteOneCurrentSiteTwo EventProcessingMixedSiteOneCurrentSiteTwo[from_v1.7.0]
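The "overslept" warnings quoted in this comment can be turned into a rough measure of how badly each member was starved: the asleep time divided by the heartbeat period. The Python sketch below does that for log lines in the shape shown above. The regular expression and the function name are assumptions made for illustration, written against the lines as they appear in this thread rather than against any documented Geode log format.

```python
import re

# Matches the "overslept" warning as it appears in this thread (assumed shape).
OVERSLEPT = re.compile(
    r"\[(?P<member>\w+)\] \[warn .*?"
    r"Asleep time: (?P<asleep>[\d,]+) nanoseconds\. "
    r"Period: (?P<period>[\d,]+) nanoseconds\."
)


def overslept_periods(line):
    """Return (member, asleep_time / period) for an overslept warning, else None."""
    match = OVERSLEPT.search(line)
    if match is None:
        return None
    asleep_ns = int(match.group("asleep").replace(",", ""))
    period_ns = int(match.group("period").replace(",", ""))
    return match.group("member"), asleep_ns / period_ns


# The two warnings quoted in the comment.
lines = [
    "[locator] [warn 2021/05/25 16:58:11.415 GMT tid=0x37] Failure detection "
    "heartbeat-generation thread overslept by more than a full period. "
    "Asleep time: 15,705,239,045 nanoseconds. Period: 2,500,000,000 nanoseconds.",
    "[vm2] [warn 2021/05/25 16:58:13.147 GMT tid=0x54a] Failure detection "
    "heartbeat-generation thread overslept by more than a full period. "
    "Asleep time: 19,938,476,329 nanoseconds. Period: 2,500,000,000 nanoseconds.",
]

for line in lines:
    result = overslept_periods(line)
    if result is not None:
        member, ratio = result
        print(f"{member}: asleep for {ratio:.2f} heartbeat periods")
```

For the two warnings quoted above, the ratio is about 6.3 periods for the locator and about 8.0 for vm2, which is consistent with vm2 being the member that was removed.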
[jira] [Commented] (GEODE-9531) CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with ForcedDisconnectException
[ https://issues.apache.org/jira/browse/GEODE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402352#comment-17402352 ] Geode Integration commented on GEODE-9531: -- Seen on support/1.14 in [UpgradeTestOpenJDK8 #77|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-14-main/jobs/UpgradeTestOpenJDK8/builds/77] ... see [test results|http://files.apachegeode-ci.info/builds/apache-support-1-14-main/1.14.0-build.0787/test-results/upgradeTest/1621966586/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-support-1-14-main/1.14.0-build.0787/test-artifacts/1621966586/upgradetestfiles-OpenJDK8-1.14.0-build.0787.tgz]. > CI Failure: TxCommitMessageBCClientToServerTxPartitionTest fails with > ForcedDisconnectException > --- > > Key: GEODE-9531 > URL: https://issues.apache.org/jira/browse/GEODE-9531 > Project: Geode > Issue Type: Bug >Affects Versions: 1.14.0 >Reporter: Donal Evans >Priority: Major > Labels: blocks-1.14.0 > > {noformat} > org.apache.geode.internal.cache.TxCommitMessageBCClientToServerTxPartitionTest > > test[11] FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.internal.cache.TxCommitMessageBCTestBase$$Lambda$55/2050040059.run > in VM 2 running on Host 1797ac7f43c4 with 5 VMs > Caused by: > org.apache.geode.distributed.DistributedSystemDisconnectedException: > membership shutdown, caused by org.apache.geode.ForcedDisconnectException: > Member isn't responding to heartbeat requests > Caused by: > org.apache.geode.ForcedDisconnectException: Member isn't > responding to heartbeat requests > java.lang.AssertionError: Suspicious strings were written to the log > during this run. > Fix the strings or use IgnoredException.addIgnoredException to ignore. 
> --- > Found suspect string in 'dunit_suspect-vm2.log' at line 993 > [fatal 2021/05/25 16:58:13.700 GMT > tid=1349] Membership service failure: Member isn't responding to heartbeat > requests > > org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException: > Member isn't responding to heartbeat requests > at > org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:1783) > at > org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1122) > at > org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processRemoveMemberMessage(GMSJoinLeave.java:725) > at > org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1366) > at > org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1302) > at org.jgroups.JChannel.invokeCallback(JChannel.java:816) > at org.jgroups.JChannel.up(JChannel.java:741) > at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030) > at org.jgroups.protocols.FRAG2.up(FRAG2.java:165) > at org.jgroups.protocols.FlowControl.up(FlowControl.java:390) > at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077) > at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792) > at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433) > at > org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:72) > at > org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:70) > at org.jgroups.protocols.TP.passMessageUp(TP.java:1658) > at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876) > at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10) > at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789) > at 
org.jgroups.protocols.TP.receive(TP.java:1714) > at > org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:159) > at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701) > at java.lang.Thread.run(Thread.java:748) > --- > Found suspect string in 'dunit_suspect-vm2.log' at line 1041 > [error 2021/05/25 16:58:14.206 GMT > tid=135] Cache initialization for GemFireCache[id = 664332017; isClosing = > false; isShutDownAll = false; created = Tue May 25 16:57:54 GMT 2021; server > = false; copyOnRead = false; lockLease = 120; lockTimeout = 60] failed > because: >