[ https://issues.apache.org/jira/browse/HBASE-24361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112573#comment-17112573 ]
Nick Dimiduk commented on HBASE-24361:
--------------------------------------

What's committed here is still not perfect, but much better. With these changes, my cluster would lose only 1-2 region servers per hour of {{serverKilling}} monkey. Without the patch, the cluster would be completely dead within 30 minutes.

Cloudera Manager appears to lose track of process status, even with the accommodations made here. More work will be needed to make this viable for long-running chaos monkey tests.

> Make `RESTApiClusterManager` more resilient
> -------------------------------------------
>
>                 Key: HBASE-24361
>                 URL: https://issues.apache.org/jira/browse/HBASE-24361
>             Project: HBase
>          Issue Type: Test
>          Components: integration tests
>    Affects Versions: 2.3.0
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> The Cloudera Manager API client in {{RESTApiClusterManager}} appears to
> assume that API calls sent to CM for process commands block on command
> completion. However, these commands are "asynchronous," queuing work in the
> background for execution. Update the client to track command submission and
> block on completion of that commandId. This allows this {{ClusterManager}} to
> conform to the expectations of the {{Actions}} that invoke it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
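
For readers unfamiliar with the pattern in the issue description (submit an asynchronous command, keep its commandId, block until that command completes), here is a minimal sketch in Python. It is not the code committed for HBASE-24361 and it does not call the real Cloudera Manager API. The names {{wait_for_command}}, {{run_blocking}}, {{submit}} and {{fetch_status}} are hypothetical, and the status record with {{active}} and {{success}} flags is an assumed shape chosen for illustration.

```python
import time
from typing import Callable, Dict


def wait_for_command(
    command_id: int,
    fetch_status: Callable[[int], Dict[str, object]],
    timeout_s: float = 300.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Poll a previously submitted command until it is no longer active.

    fetch_status is a hypothetical callable that returns a dict with the
    assumed keys "active" (still running) and "success" (final outcome).
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(command_id)
        if not status.get("active", False):
            # The command has finished; surface a failure instead of hiding it.
            if not status.get("success", False):
                raise RuntimeError(f"command {command_id} failed: {status}")
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"command {command_id} still active after {timeout_s}s"
            )
        sleep(poll_interval_s)


def run_blocking(
    submit: Callable[[], int],
    fetch_status: Callable[[int], Dict[str, object]],
    **wait_kwargs,
) -> Dict[str, object]:
    """Submit an asynchronous command, then block on its completion.

    submit is a hypothetical callable that queues the work and returns
    the id of the queued command.
    """
    command_id = submit()
    return wait_for_command(command_id, fetch_status, **wait_kwargs)
```

Passing {{sleep}} and {{clock}} as parameters is only a convenience so the loop can be exercised without real delays. A caller that wraps a real REST client would supply {{submit}} and {{fetch_status}} and would probably also want to retry transient HTTP errors, which this sketch does not attempt.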