[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members

2022-02-06 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487839#comment-17487839
 ] 

Geode Integration commented on GEODE-7739:
--

Seen in [distributed-test-openjdk8 
#972|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/972]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644086525/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644086525/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz].

> JMX managers may fail to federate mbeans for other members
> --
>
> Key: GEODE-7739
> URL: https://issues.apache.org/jira/browse/GEODE-7739
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> JMX Manager may fail to federate one or more MXBeans for other members 
> because of a race condition during startup. When ManagementCacheListener is 
> first constructed, it is in a state that will ignore all callbacks because 
> the field readyForEvents is false.
> 
> Debugging with JMXMBeanReconnectDUnitTest revealed this bug.
> The test starts two locators with jmx manager configured and started. 
> Locator1 always has all of locator2's mbeans, but locator2 is intermittently 
> missing the personal mbeans of locator1. 
> I think this is caused by some sort of race condition in the code that 
> creates the monitoring regions for other members in locator2.
> It's possible that the jmx manager that hits this bug might fail to have 
> mbeans for servers as well as other locators but I haven't seen a test case 
> for this scenario.
> The exposure of this bug means that a user running more than one locator 
> might have a locator that is missing one or more mbeans for the cluster.
> 
> Studying the JMX code also reveals the existence of *GEODE-8012*.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members

2022-02-06 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487840#comment-17487840
 ] 

Geode Integration commented on GEODE-7739:
--

Seen in [distributed-test-openjdk8 
#928|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/928]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644047684/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644047684/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz].

> JMX managers may fail to federate mbeans for other members
> --
>
> Key: GEODE-7739
> URL: https://issues.apache.org/jira/browse/GEODE-7739
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> JMX Manager may fail to federate one or more MXBeans for other members 
> because of a race condition during startup. When ManagementCacheListener is 
> first constructed, it is in a state that will ignore all callbacks because 
> the field readyForEvents is false.
> 
> Debugging with JMXMBeanReconnectDUnitTest revealed this bug.
> The test starts two locators with jmx manager configured and started. 
> Locator1 always has all of locator2's mbeans, but locator2 is intermittently 
> missing the personal mbeans of locator1. 
> I think this is caused by some sort of race condition in the code that 
> creates the monitoring regions for other members in locator2.
> It's possible that the jmx manager that hits this bug might fail to have 
> mbeans for servers as well as other locators but I haven't seen a test case 
> for this scenario.
> The exposure of this bug means that a user running more than one locator 
> might have a locator that is missing one or more mbeans for the cluster.
> 
> Studying the JMX code also reveals the existence of *GEODE-8012*.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members

2022-02-06 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487838#comment-17487838
 ] 

Geode Integration commented on GEODE-7739:
--

Seen in [distributed-test-openjdk8 
#917|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/917]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644039034/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644039034/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz].

> JMX managers may fail to federate mbeans for other members
> --
>
> Key: GEODE-7739
> URL: https://issues.apache.org/jira/browse/GEODE-7739
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> JMX Manager may fail to federate one or more MXBeans for other members 
> because of a race condition during startup. When ManagementCacheListener is 
> first constructed, it is in a state that will ignore all callbacks because 
> the field readyForEvents is false.
> 
> Debugging with JMXMBeanReconnectDUnitTest revealed this bug.
> The test starts two locators with jmx manager configured and started. 
> Locator1 always has all of locator2's mbeans, but locator2 is intermittently 
> missing the personal mbeans of locator1. 
> I think this is caused by some sort of race condition in the code that 
> creates the monitoring regions for other members in locator2.
> It's possible that the jmx manager that hits this bug might fail to have 
> mbeans for servers as well as other locators but I haven't seen a test case 
> for this scenario.
> The exposure of this bug means that a user running more than one locator 
> might have a locator that is missing one or more mbeans for the cluster.
> 
> Studying the JMX code also reveals the existence of *GEODE-8012*.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members

2022-02-06 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487837#comment-17487837
 ] 

Geode Integration commented on GEODE-7739:
--

Seen in [distributed-test-openjdk8 
#913|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/913]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644038950/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644038950/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz].

> JMX managers may fail to federate mbeans for other members
> --
>
> Key: GEODE-7739
> URL: https://issues.apache.org/jira/browse/GEODE-7739
> Project: Geode
>  Issue Type: Bug
>  Components: jmx
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> JMX Manager may fail to federate one or more MXBeans for other members 
> because of a race condition during startup. When ManagementCacheListener is 
> first constructed, it is in a state that will ignore all callbacks because 
> the field readyForEvents is false.
> 
> Debugging with JMXMBeanReconnectDUnitTest revealed this bug.
> The test starts two locators with jmx manager configured and started. 
> Locator1 always has all of locator2's mbeans, but locator2 is intermittently 
> missing the personal mbeans of locator1. 
> I think this is caused by some sort of race condition in the code that 
> creates the monitoring regions for other members in locator2.
> It's possible that the jmx manager that hits this bug might fail to have 
> mbeans for servers as well as other locators but I haven't seen a test case 
> for this scenario.
> The exposure of this bug means that a user running more than one locator 
> might have a locator that is missing one or more mbeans for the cluster.
> 
> Studying the JMX code also reveals the existence of *GEODE-8012*.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-7710) CI Failure: JMXMBeanReconnectDUnitTest aka JmxServerReconnectDistributedTest fails intermittently because one locator is missing the LockServiceMXBean

2022-02-06 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487835#comment-17487835
 ] 

Geode Integration commented on GEODE-7710:
--

Seen in [distributed-test-openjdk8 
#928|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/928]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644047684/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644047684/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz].

> CI Failure: JMXMBeanReconnectDUnitTest aka JmxServerReconnectDistributedTest 
> fails intermittently because one locator is missing the LockServiceMXBean
> --
>
> Key: GEODE-7710
> URL: https://issues.apache.org/jira/browse/GEODE-7710
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI, flaky, pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> This test fails due to GEODE-7739. Enter new failures against that ticket.
> Multiple tests in JMXMBeanReconnectDUnitTest may fail an await due to one of 
> the locators missing the LockServiceMXBean for the cluster config service.
> {noformat}
> but could not find:
>  
> <[GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator1]>
> {noformat}
> These test failures are caused by *GEODE-7739*.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-7710) CI Failure: JMXMBeanReconnectDUnitTest aka JmxServerReconnectDistributedTest fails intermittently because one locator is missing the LockServiceMXBean

2022-02-06 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487834#comment-17487834
 ] 

Geode Integration commented on GEODE-7710:
--

Seen in [distributed-test-openjdk8 
#972|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/972]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644086525/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644086525/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz].

> CI Failure: JMXMBeanReconnectDUnitTest aka JmxServerReconnectDistributedTest 
> fails intermittently because one locator is missing the LockServiceMXBean
> --
>
> Key: GEODE-7710
> URL: https://issues.apache.org/jira/browse/GEODE-7710
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI, flaky, pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> This test fails due to GEODE-7739. Enter new failures against that ticket.
> Multiple tests in JMXMBeanReconnectDUnitTest may fail an await due to one of 
> the locators missing the LockServiceMXBean for the cluster config service.
> {noformat}
> but could not find:
>  
> <[GemFire:service=LockService,name=__CLUSTER_CONFIG_LS,type=Member,member=locator1]>
> {noformat}
> These test failures are caused by *GEODE-7739*.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-10008) Avoid possible EntryDestroyedException error in wan-copy region command when entry destroyed in partitioned region

2022-02-06 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487822#comment-17487822
 ] 

Geode Integration commented on GEODE-10008:
---

Seen in [distributed-test-openjdk8 
#961|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/961]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644078587/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644078587/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz].

> Avoid possible EntryDestroyedException error in wan-copy region command when 
> entry destroyed in partitioned region
> --
>
> Key: GEODE-10008
> URL: https://issues.apache.org/jira/browse/GEODE-10008
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, wan
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: needsTriage
>
> When the wan-copy region command is executed over a partitioned region, it is 
> possible that sometimes an EntryDestroyedException error is found if, while 
> reading the entries from the region, one of them is destroyed.
> The error has been seen sometimes when running the following test:
> WanCopyRegionCommandDUnitTest.
> testSuccessfulExecutionWhileRunningOpsOnRegion(true, true).
>  
> Log of the error:
> Multiple Failures (2 failures)
>     org.opentest4j.AssertionFailedError: [            Member             | 
> Status | Message
> -- | -- | 
> ---
> alberto-dell(421543):41010 | ERROR  | Execution failed. Error: 
> org.apache.geode.cache.EntryDestroyedException: 26
> alberto-dell(421504):41009 | OK     | Entries copied: 2,977
> alberto-dell(421598):41011 | OK     | Entries copied: 3,042
> ] 
> expected: OK
>  but was: ERROR
>     java.lang.AssertionError: Suspicious strings were written to the log 
> during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> ---
> Found suspect string in 'dunit_suspect-vm6.log' at line 1883
> [error 2022/02/01 08:58:59.046 CET  tid=99] 
> Exception occurred attempting to wan-copy region
> java.util.concurrent.ExecutionException: 
> org.apache.geode.cache.EntryDestroyedException: 26
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at 
> org.apache.geode.cache.wan.internal.WanCopyRegionFunctionService.execute(WanCopyRegionFunctionService.java:90)
>     at 
> org.apache.geode.management.internal.cli.functions.WanCopyRegionFunction.executeFunctionInService(WanCopyRegionFunction.java:163)
>     at 
> org.apache.geode.management.internal.cli.functions.WanCopyRegionFunction.executeFunction(WanCopyRegionFunction.java:157)
>     at 
> org.apache.geode.management.cli.CliFunction.execute(CliFunction.java:37)
>     at 
> org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:201)
>     at 
> org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:382)
>     at 
> org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:447)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at 
> org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:444)
>     at 
> org.apache.geode.distributed.internal.ClusterOperationExecutors.doFunctionExecutionThread(ClusterOperationExecutors.java:379)
>     at 
> org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:120)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.geode.cache.EntryDestroyedException: 26
>     at 
> org.apache.geode.internal.cache.NonTXEntry.basicGetEntry(NonTXEntry.java:62)
>     at 
> org.apache.geode.internal.cache.NonTXEntry.getRegion(NonTXEntry.java:119)
>     at 
> org.apache.geode.internal.cache.NonTXEntry.hashCode(NonTXEntry.java:157)
>     at java.util.HashMap.hash(HashMap.java:340)
>     at java.util.HashMap.put(HashMap.java:613)
>     at java.util.HashSet.add(HashSet.java:220)
>     at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
>     at java.util.Iterator.forEachRemaining(Iterator.java:116)
>     

[jira] [Commented] (GEODE-10008) Avoid possible EntryDestroyedException error in wan-copy region command when entry destroyed in partitioned region

2022-02-06 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487821#comment-17487821
 ] 

Geode Integration commented on GEODE-10008:
---

Seen in [distributed-test-openjdk8 
#973|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/973]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-results/distributedTest/1644088387/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0035/test-artifacts/1644088387/distributedtestfiles-openjdk8-1.16.0-build.0035.tgz].

> Avoid possible EntryDestroyedException error in wan-copy region command when 
> entry destroyed in partitioned region
> --
>
> Key: GEODE-10008
> URL: https://issues.apache.org/jira/browse/GEODE-10008
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh, wan
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: needsTriage
>
> When the wan-copy region command is executed over a partitioned region, it is 
> possible that sometimes an EntryDestroyedException error is found if, while 
> reading the entries from the region, one of them is destroyed.
> The error has been seen sometimes when running the following test:
> WanCopyRegionCommandDUnitTest.
> testSuccessfulExecutionWhileRunningOpsOnRegion(true, true).
>  
> Log of the error:
> Multiple Failures (2 failures)
>     org.opentest4j.AssertionFailedError: [            Member             | 
> Status | Message
> -- | -- | 
> ---
> alberto-dell(421543):41010 | ERROR  | Execution failed. Error: 
> org.apache.geode.cache.EntryDestroyedException: 26
> alberto-dell(421504):41009 | OK     | Entries copied: 2,977
> alberto-dell(421598):41011 | OK     | Entries copied: 3,042
> ] 
> expected: OK
>  but was: ERROR
>     java.lang.AssertionError: Suspicious strings were written to the log 
> during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> ---
> Found suspect string in 'dunit_suspect-vm6.log' at line 1883
> [error 2022/02/01 08:58:59.046 CET  tid=99] 
> Exception occurred attempting to wan-copy region
> java.util.concurrent.ExecutionException: 
> org.apache.geode.cache.EntryDestroyedException: 26
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at 
> org.apache.geode.cache.wan.internal.WanCopyRegionFunctionService.execute(WanCopyRegionFunctionService.java:90)
>     at 
> org.apache.geode.management.internal.cli.functions.WanCopyRegionFunction.executeFunctionInService(WanCopyRegionFunction.java:163)
>     at 
> org.apache.geode.management.internal.cli.functions.WanCopyRegionFunction.executeFunction(WanCopyRegionFunction.java:157)
>     at 
> org.apache.geode.management.cli.CliFunction.execute(CliFunction.java:37)
>     at 
> org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:201)
>     at 
> org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:382)
>     at 
> org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:447)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at 
> org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:444)
>     at 
> org.apache.geode.distributed.internal.ClusterOperationExecutors.doFunctionExecutionThread(ClusterOperationExecutors.java:379)
>     at 
> org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:120)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.geode.cache.EntryDestroyedException: 26
>     at 
> org.apache.geode.internal.cache.NonTXEntry.basicGetEntry(NonTXEntry.java:62)
>     at 
> org.apache.geode.internal.cache.NonTXEntry.getRegion(NonTXEntry.java:119)
>     at 
> org.apache.geode.internal.cache.NonTXEntry.hashCode(NonTXEntry.java:157)
>     at java.util.HashMap.hash(HashMap.java:340)
>     at java.util.HashMap.put(HashMap.java:613)
>     at java.util.HashSet.add(HashSet.java:220)
>     at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
>     at java.util.Iterator.forEachRemaining(Iterator.java:116)
>