[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members

ASF GitHub Bot (Jira) Fri, 11 Dec 2020 16:08:08 -0800


    [ 
https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248262#comment-17248262
 ]


ASF GitHub Bot commented on GEODE-7739:
---------------------------------------

jdeppe-pivotal commented on a change in pull request #5778:
URL: https://github.com/apache/geode/pull/5778#discussion_r541439516



##########
File path: 
geode-core/src/main/java/org/apache/geode/management/internal/FederatingManager.java
##########
@@ -380,8 +384,10 @@ void addMemberArtifacts(InternalDistributedMember member) {
         return;
       }
 
-      try {
+      FederatingManagerCancelCriterion cancelCriterion =

Review comment:
       I'm not very familiar with how `CancelCriterion` is used, so my 
understanding of what's required here may be naive. My understanding was 
simply that we didn't want to block indefinitely any threads `await`ing in 
either of the two listeners in question here. That situation would arise if 
something failed in the `FederatingManager` before it could get to 
calling `readyForEvents`. Hence the use of the `StoppableCountDownLatch`.
   
   Since what's calling into these listeners is a callback using an 
asynchronous thread, I don't understand why it is the responsibility of these 
listeners to go through all of this ceremony to tell some random component that 
the cache is closing. Surely that component should have its own means of 
determining that the cache is closing? Why is it not enough for the thread to 
be released and throw an exception?
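
   To make the question concrete, here is a minimal sketch of the simpler 
behavior described above: a waiting thread is released with an exception once 
cancellation is observed. `CancellableLatch`, its polling interval, and the 
`BooleanSupplier` cancellation check are hypothetical names used only for 
illustration; this is not Geode's `StoppableCountDownLatch` or 
`CancelCriterion` API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a latch whose waiters are released on cancellation. */
final class CancellableLatch {
  private final CountDownLatch latch = new CountDownLatch(1);
  private final BooleanSupplier cancelled; // assumed hook, e.g. "is the cache closing?"
  private final long pollMillis;

  CancellableLatch(BooleanSupplier cancelled, long pollMillis) {
    this.cancelled = cancelled;
    this.pollMillis = pollMillis;
  }

  /** Called once the owner is ready to deliver events. */
  void countDown() {
    latch.countDown();
  }

  /**
   * Blocks until countDown() has been called. If the wait times out and
   * cancellation is observed, throws instead of blocking forever.
   */
  void await() throws InterruptedException {
    while (!latch.await(pollMillis, TimeUnit.MILLISECONDS)) {
      if (cancelled.getAsBoolean()) {
        throw new IllegalStateException("cancelled before readyForEvents");
      }
    }
  }
}
```

   With this shape, a listener callback that calls `await()` either proceeds 
once the latch is released or fails with an exception; it does not need to 
notify any other component about the shutdown.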




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JMX managers may fail to federate mbeans for other members
> ----------------------------------------------------------
>
>                 Key: GEODE-7739
>                 URL: https://issues.apache.org/jira/browse/GEODE-7739
>             Project: Geode
>          Issue Type: Bug
>          Components: jmx
>            Reporter: Kirk Lund
>            Assignee: Kirk Lund
>            Priority: Major
>              Labels: GeodeOperationAPI, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> JMX Manager may fail to federate one or more MXBeans for other members 
> because of a race condition during startup. When ManagementCacheListener is 
> first constructed, it is in a state that will ignore all callbacks because 
> the field readyForEvents is false.
> ----
> Debugging with JMXMBeanReconnectDUnitTest revealed this bug.
> The test starts two locators with jmx manager configured and started. 
> Locator1 always has all of locator2's mbeans, but locator2 is intermittently 
> missing the personal mbeans of locator1. 
> I think this is caused by some sort of race condition in the code that 
> creates the monitoring regions for other members in locator2.
> It's possible that the jmx manager that hits this bug might fail to have 
> mbeans for servers as well as other locators but I haven't seen a test case 
> for this scenario.
> The exposure of this bug means that a user running more than one locator 
> might have a locator that is missing one or more mbeans for the cluster.
> ----
> Studying the JMX code also reveals the existence of *GEODE-8012*.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
