[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members
[ https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487839#comment-17487839 ] Geode Integration commented on GEODE-7739: -- Seen in [distributed-test-openjdk8 #972|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/972] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644086525/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644086525/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz]. > JMX managers may fail to federate mbeans for other members > -- > > Key: GEODE-7739 > URL: https://issues.apache.org/jira/browse/GEODE-7739 > Project: Geode > Issue Type: Bug > Components: jmx >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > JMX Manager may fail to federate one or more MXBeans for other members > because of a race condition during startup. When ManagementCacheListener is > first constructed, it is in a state that will ignore all callbacks because > the field readyForEvents is false. > > Debugging with JMXMBeanReconnectDUnitTest revealed this bug. > The test starts two locators with jmx manager configured and started. > Locator1 always has all of locator2's mbeans, but locator2 is intermittently > missing the personal mbeans of locator1. > I think this is caused by some sort of race condition in the code that > creates the monitoring regions for other members in locator2. > It's possible that the jmx manager that hits this bug might fail to have > mbeans for servers as well as other locators but I haven't seen a test case > for this scenario. > The exposure of this bug means that a user running more than one locator > might have a locator that is missing one or more mbeans for the cluster. > > Studying the JMX code also reveals the existence of *GEODE-8012*. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members
[ https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487840#comment-17487840 ] Geode Integration commented on GEODE-7739: -- Seen in [distributed-test-openjdk8 #928|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/928] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644047684/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644047684/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz]. > JMX managers may fail to federate mbeans for other members > -- > > Key: GEODE-7739 > URL: https://issues.apache.org/jira/browse/GEODE-7739 > Project: Geode > Issue Type: Bug > Components: jmx >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > JMX Manager may fail to federate one or more MXBeans for other members > because of a race condition during startup. When ManagementCacheListener is > first constructed, it is in a state that will ignore all callbacks because > the field readyForEvents is false. > > Debugging with JMXMBeanReconnectDUnitTest revealed this bug. > The test starts two locators with jmx manager configured and started. > Locator1 always has all of locator2's mbeans, but locator2 is intermittently > missing the personal mbeans of locator1. > I think this is caused by some sort of race condition in the code that > creates the monitoring regions for other members in locator2. > It's possible that the jmx manager that hits this bug might fail to have > mbeans for servers as well as other locators but I haven't seen a test case > for this scenario. > The exposure of this bug means that a user running more than one locator > might have a locator that is missing one or more mbeans for the cluster. > > Studying the JMX code also reveals the existence of *GEODE-8012*. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members
[ https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487838#comment-17487838 ] Geode Integration commented on GEODE-7739: -- Seen in [distributed-test-openjdk8 #917|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/917] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644039034/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644039034/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz]. > JMX managers may fail to federate mbeans for other members > -- > > Key: GEODE-7739 > URL: https://issues.apache.org/jira/browse/GEODE-7739 > Project: Geode > Issue Type: Bug > Components: jmx >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > JMX Manager may fail to federate one or more MXBeans for other members > because of a race condition during startup. When ManagementCacheListener is > first constructed, it is in a state that will ignore all callbacks because > the field readyForEvents is false. > > Debugging with JMXMBeanReconnectDUnitTest revealed this bug. > The test starts two locators with jmx manager configured and started. > Locator1 always has all of locator2's mbeans, but locator2 is intermittently > missing the personal mbeans of locator1. > I think this is caused by some sort of race condition in the code that > creates the monitoring regions for other members in locator2. > It's possible that the jmx manager that hits this bug might fail to have > mbeans for servers as well as other locators but I haven't seen a test case > for this scenario. > The exposure of this bug means that a user running more than one locator > might have a locator that is missing one or more mbeans for the cluster. > > Studying the JMX code also reveals the existence of *GEODE-8012*. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members
[ https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487837#comment-17487837 ] Geode Integration commented on GEODE-7739: -- Seen in [distributed-test-openjdk8 #913|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/913] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644038950/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644038950/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz]. > JMX managers may fail to federate mbeans for other members > -- > > Key: GEODE-7739 > URL: https://issues.apache.org/jira/browse/GEODE-7739 > Project: Geode > Issue Type: Bug > Components: jmx >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > JMX Manager may fail to federate one or more MXBeans for other members > because of a race condition during startup. When ManagementCacheListener is > first constructed, it is in a state that will ignore all callbacks because > the field readyForEvents is false. > > Debugging with JMXMBeanReconnectDUnitTest revealed this bug. > The test starts two locators with jmx manager configured and started. > Locator1 always has all of locator2's mbeans, but locator2 is intermittently > missing the personal mbeans of locator1. > I think this is caused by some sort of race condition in the code that > creates the monitoring regions for other members in locator2. > It's possible that the jmx manager that hits this bug might fail to have > mbeans for servers as well as other locators but I haven't seen a test case > for this scenario. > The exposure of this bug means that a user running more than one locator > might have a locator that is missing one or more mbeans for the cluster. > > Studying the JMX code also reveals the existence of *GEODE-8012*. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-7710) CI Failure: JMXMBeanReconnectDUnitTest aka JmxServerReconnectDistributedTest fails intermittently because one locator is missing the LockServiceMXBean
[ https://issues.apache.org/jira/browse/GEODE-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487835#comment-17487835 ] Geode Integration commented on GEODE-7710: -- Seen in [distributed-test-openjdk8 #928|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/928] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644047684/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644047684/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz]. > CI Failure: JMXMBeanReconnectDUnitTest aka JmxServerReconnectDistributedTest > fails intermittently because one locator is missing the LockServiceMXBean > -- > > Key: GEODE-7710 > URL: https://issues.apache.org/jira/browse/GEODE-7710 > Project: Geode > Issue Type: Bug > Components: tests >Affects Versions: 1.13.0, 1.14.0, 1.15.0 >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI, flaky, pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > This test fails due to GEODE-7739. Enter new failures against that ticket. > Multiple tests in JMXMBeanReconnectDUnitTest may fail an await due to one of > the locators missing the LockServiceMXBean for the cluster config service. > {noformat} > but could not find: > > <[GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator1]> > {noformat} > These test failures are caused by *GEODE-7739*. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-7710) CI Failure: JMXMBeanReconnectDUnitTest aka JmxServerReconnectDistributedTest fails intermittently because one locator is missing the LockServiceMXBean
[ https://issues.apache.org/jira/browse/GEODE-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487834#comment-17487834 ] Geode Integration commented on GEODE-7710: -- Seen in [distributed-test-openjdk8 #972|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/972] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644086525/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644086525/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz]. > CI Failure: JMXMBeanReconnectDUnitTest aka JmxServerReconnectDistributedTest > fails intermittently because one locator is missing the LockServiceMXBean > -- > > Key: GEODE-7710 > URL: https://issues.apache.org/jira/browse/GEODE-7710 > Project: Geode > Issue Type: Bug > Components: tests >Affects Versions: 1.13.0, 1.14.0, 1.15.0 >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI, flaky, pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > This test fails due to GEODE-7739. Enter new failures against that ticket. > Multiple tests in JMXMBeanReconnectDUnitTest may fail an await due to one of > the locators missing the LockServiceMXBean for the cluster config service. > {noformat} > but could not find: > > <[GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator1]> > {noformat} > These test failures are caused by *GEODE-7739*. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-10008) Avoid possible EntryDestroyedException error in wan-copy region command when entry destroyed in partitioned region
[ https://issues.apache.org/jira/browse/GEODE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487822#comment-17487822 ] Geode Integration commented on GEODE-10008: --- Seen in [distributed-test-openjdk8 #961|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/961] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644078587/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644078587/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz]. > Avoid possible EntryDestroyedException error in wan-copy region command when > entry destroyed in partitioned region > -- > > Key: GEODE-10008 > URL: https://issues.apache.org/jira/browse/GEODE-10008 > Project: Geode > Issue Type: Bug > Components: gfsh, wan >Reporter: Alberto Gomez >Assignee: Alberto Gomez >Priority: Major > Labels: needsTriage > > When the wan-copy region command is executed over a partitioned region, it is > possible that sometimes an EntryDestroyedException error is found if, while > reading the entries from the region, one of them is destroyed. > The error has been seen sometimes when running the following test: > WanCopyRegionCommandDUnitTest. > testSuccessfulExecutionWhileRunningOpsOnRegion(true, true). > > Log of the error: > Multiple Failures (2 failures) > org.opentest4j.AssertionFailedError: [ Member | > Status | Message > -- | -- | > --- > alberto-dell(421543):41010 | ERROR | Execution failed. Error: > org.apache.geode.cache.EntryDestroyedException: 26 > alberto-dell(421504):41009 | OK | Entries copied: 2,977 > alberto-dell(421598):41011 | OK | Entries copied: 3,042 > ] > expected: OK > but was: ERROR > java.lang.AssertionError: Suspicious strings were written to the log > during this run. > Fix the strings or use IgnoredException.addIgnoredException to ignore. > --- > Found suspect string in 'dunit_suspect-vm6.log' at line 1883 > [error 2022/02/01 08:58:59.046 CET tid=99] > Exception occurred attempting to wan-copy region > java.util.concurrent.ExecutionException: > org.apache.geode.cache.EntryDestroyedException: 26 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.geode.cache.wan.internal.WanCopyRegionFunctionService.execute(WanCopyRegionFunctionService.java:90) > at > org.apache.geode.management.internal.cli.functions.WanCopyRegionFunction.executeFunctionInService(WanCopyRegionFunction.java:163) > at > org.apache.geode.management.internal.cli.functions.WanCopyRegionFunction.executeFunction(WanCopyRegionFunction.java:157) > at > org.apache.geode.management.cli.CliFunction.execute(CliFunction.java:37) > at > org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:201) > at > org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:382) > at > org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:447) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:444) > at > org.apache.geode.distributed.internal.ClusterOperationExecutors.doFunctionExecutionThread(ClusterOperationExecutors.java:379) > at > org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:120) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.geode.cache.EntryDestroyedException: 26 > at > org.apache.geode.internal.cache.NonTXEntry.basicGetEntry(NonTXEntry.java:62) > at > org.apache.geode.internal.cache.NonTXEntry.getRegion(NonTXEntry.java:119) > at > org.apache.geode.internal.cache.NonTXEntry.hashCode(NonTXEntry.java:157) > at java.util.HashMap.hash(HashMap.java:340) > at java.util.HashMap.put(HashMap.java:613) > at java.util.HashSet.add(HashSet.java:220) > at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) > at java.util.Iterator.forEachRemaining(Iterator.java:116) >
[jira] [Commented] (GEODE-10008) Avoid possible EntryDestroyedException error in wan-copy region command when entry destroyed in partitioned region
[ https://issues.apache.org/jira/browse/GEODE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487821#comment-17487821 ] Geode Integration commented on GEODE-10008: --- Seen in [distributed-test-openjdk8 #973|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/973] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644088387/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644088387/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz]. > Avoid possible EntryDestroyedException error in wan-copy region command when > entry destroyed in partitioned region > -- > > Key: GEODE-10008 > URL: https://issues.apache.org/jira/browse/GEODE-10008 > Project: Geode > Issue Type: Bug > Components: gfsh, wan >Reporter: Alberto Gomez >Assignee: Alberto Gomez >Priority: Major > Labels: needsTriage > > When the wan-copy region command is executed over a partitioned region, it is > possible that sometimes an EntryDestroyedException error is found if, while > reading the entries from the region, one of them is destroyed. > The error has been seen sometimes when running the following test: > WanCopyRegionCommandDUnitTest. > testSuccessfulExecutionWhileRunningOpsOnRegion(true, true). > > Log of the error: > Multiple Failures (2 failures) > org.opentest4j.AssertionFailedError: [ Member | > Status | Message > -- | -- | > --- > alberto-dell(421543):41010 | ERROR | Execution failed. Error: > org.apache.geode.cache.EntryDestroyedException: 26 > alberto-dell(421504):41009 | OK | Entries copied: 2,977 > alberto-dell(421598):41011 | OK | Entries copied: 3,042 > ] > expected: OK > but was: ERROR > java.lang.AssertionError: Suspicious strings were written to the log > during this run. > Fix the strings or use IgnoredException.addIgnoredException to ignore. > --- > Found suspect string in 'dunit_suspect-vm6.log' at line 1883 > [error 2022/02/01 08:58:59.046 CET tid=99] > Exception occurred attempting to wan-copy region > java.util.concurrent.ExecutionException: > org.apache.geode.cache.EntryDestroyedException: 26 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.geode.cache.wan.internal.WanCopyRegionFunctionService.execute(WanCopyRegionFunctionService.java:90) > at > org.apache.geode.management.internal.cli.functions.WanCopyRegionFunction.executeFunctionInService(WanCopyRegionFunction.java:163) > at > org.apache.geode.management.internal.cli.functions.WanCopyRegionFunction.executeFunction(WanCopyRegionFunction.java:157) > at > org.apache.geode.management.cli.CliFunction.execute(CliFunction.java:37) > at > org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:201) > at > org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:382) > at > org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:447) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:444) > at > org.apache.geode.distributed.internal.ClusterOperationExecutors.doFunctionExecutionThread(ClusterOperationExecutors.java:379) > at > org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:120) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.geode.cache.EntryDestroyedException: 26 > at > org.apache.geode.internal.cache.NonTXEntry.basicGetEntry(NonTXEntry.java:62) > at > org.apache.geode.internal.cache.NonTXEntry.getRegion(NonTXEntry.java:119) > at > org.apache.geode.internal.cache.NonTXEntry.hashCode(NonTXEntry.java:157) > at java.util.HashMap.hash(HashMap.java:340) > at java.util.HashMap.put(HashMap.java:613) > at java.util.HashSet.add(HashSet.java:220) > at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) > at java.util.Iterator.forEachRemaining(Iterator.java:116) >