[ https://issues.apache.org/jira/browse/HBASE-12852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dima Spivak reassigned HBASE-12852:
-----------------------------------

Assignee: (was: Dima Spivak)

Tests from hbase-it that use ChaosMonkey don't fail if SSH commands fail
------------------------------------------------------------------------

Key: HBASE-12852
URL: https://issues.apache.org/jira/browse/HBASE-12852
Project: HBase
Issue Type: Bug
Components: integration tests
Affects Versions: 0.98.6
Reporter: Dima Spivak
Priority: Major

I've just started rolling up my sleeves and playing around with hbase-it (at the moment, only on 0.98.6), but I wanted to begin filing JIRAs for issues as I encounter them so that I don't forget about them. First up: tests run with ChaosMonkey don't seem to fail when ChaosMonkey itself fails to work. For example, while running IntegrationTestIngest with a slowDeterministic CM, I forgot to set up SSH properly and saw the following:

{code}
15/01/14 07:36:53 WARN hbase.ClusterManager: Remote command: ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal failed at attempt 4. Retrying until maxAttempts: 5. Exception: stderr: Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
, stdout:
15/01/14 07:36:53 INFO util.RetryCounter: Sleeping 16000ms before retry #4...
15/01/14 07:36:53 INFO zookeeper.ZooKeeper: Session: 0x14ae74d7bac006b closed
15/01/14 07:36:53 INFO policies.Policy: Sleeping for: 59541
15/01/14 07:36:53 INFO zookeeper.ClientCnxn: EventThread shut down
Failed to write keys: 0
Key range: [150000..159999]
Batch updates: false
Percent of keys to update: 60
Updater threads: 10
Ignore nonce conflicts: true
Regions per server: 5
15/01/14 07:36:56 INFO util.LoadTestTool: Starting to mutate data...
Starting to mutate data...
15/01/14 07:36:57 INFO policies.Policy: Sleeping for: 88816
15/01/14 07:37:01 INFO util.MultiThreadedAction: [U:10] Keys=471, cols=5.7 K, time=00:00:05 Overall: [keys/s= 94, latency=102 ms] Current: [keys/s=94, latency=102 ms], wroteUpTo=149999
15/01/14 07:37:06 INFO util.MultiThreadedAction: [U:10] Keys=908, cols=11.0 K, time=00:00:10 Overall: [keys/s= 90, latency=90 ms] Current: [keys/s=87, latency=77 ms], wroteUpTo=149999
15/01/14 07:37:09 INFO hbase.ClusterManager: Executing remote command: ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal
15/01/14 07:37:09 INFO util.Shell: Executing full command [/usr/bin/ssh node-5.internal "ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL"]
15/01/14 07:37:09 WARN policies.Policy: Exception occured during performing action: ExitCodeException exitCode=255: stderr: Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
, stdout:
	at org.apache.hadoop.hbase.HBaseClusterManager.exec(HBaseClusterManager.java:208)
	at org.apache.hadoop.hbase.HBaseClusterManager.execWithRetries(HBaseClusterManager.java:223)
	at org.apache.hadoop.hbase.HBaseClusterManager.signal(HBaseClusterManager.java:268)
	at org.apache.hadoop.hbase.ClusterManager.kill(ClusterManager.java:97)
	at org.apache.hadoop.hbase.DistributedHBaseCluster.killRegionServer(DistributedHBaseCluster.java:110)
	at org.apache.hadoop.hbase.chaos.actions.Action.killRs(Action.java:84)
	at org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:50)
	at org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
	at org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:50)
	at org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
	at org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
	at java.lang.Thread.run(Thread.java:745)
{code}

It seems to me that tests should fail in these cases rather than just log a warning. Was this just an oversight, [~enis] and [~ndimiduk], or is it by design?
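The stack trace suggests that the SSH failure (exitCode=255) propagates from the kill action up to the chaos policy thread, where it is logged as a WARN and then dropped, so the test driver never learns about it. Below is a minimal sketch of the alternative the report asks for: a policy that still logs each failure but also records it, so the test can fail once the workload finishes. All names here (`ChaosAction`, `RecordingPolicy`, `assertNoChaosFailures`) are hypothetical and are not HBase APIs; this is an illustration under those assumptions, not the actual ChaosMonkey code.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical sketch (not HBase code): a chaos policy that records action
 * failures instead of only logging them, so the test driver can fail afterwards.
 */
public class FailFastPolicySketch {

  /** Hypothetical stand-in for a chaos action such as "kill a RegionServer over SSH". */
  interface ChaosAction {
    void perform() throws Exception;
  }

  /** Runs each action once; failures are logged AND kept for later inspection. */
  static final class RecordingPolicy implements Runnable {
    private final List<ChaosAction> actions;
    // Thread-safe because the policy runs on its own thread while the test thread reads it.
    private final List<Exception> failures = new CopyOnWriteArrayList<>();

    RecordingPolicy(List<ChaosAction> actions) {
      this.actions = actions;
    }

    @Override
    public void run() {
      for (ChaosAction action : actions) {
        try {
          action.perform();
        } catch (Exception e) {
          // Mirrors the WARN line in the log above, but keeps the exception too.
          System.err.println("WARN Exception during performing action: " + e);
          failures.add(e);
        }
      }
    }

    List<Exception> getFailures() {
      return Collections.unmodifiableList(failures);
    }
  }

  /** Hypothetical hook a test could call after the workload: fail if any action failed. */
  static void assertNoChaosFailures(RecordingPolicy policy) {
    List<Exception> failures = policy.getFailures();
    if (!failures.isEmpty()) {
      AssertionError err = new AssertionError(failures.size() + " chaos action(s) failed");
      for (Exception e : failures) {
        err.addSuppressed(e); // keep the original causes (e.g. SSH exit code 255)
      }
      throw err;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ChaosAction ok = () -> { };
    ChaosAction sshFails = () -> {
      throw new IOException("exitCode=255: Permission denied (publickey,password).");
    };
    RecordingPolicy policy = new RecordingPolicy(List.of(ok, sshFails));
    Thread chaosThread = new Thread(policy);
    chaosThread.start();
    chaosThread.join();
    try {
      assertNoChaosFailures(policy);
      System.out.println("no chaos failures");
    } catch (AssertionError e) {
      System.out.println("test fails: " + e.getMessage());
    }
  }
}
```

Running `main` prints the WARN line to stderr and then `test fails: 1 chaos action(s) failed`. Attaching the original exceptions with `addSuppressed` is one way to keep the SSH error visible in the test report; a real fix might instead abort the test immediately, which trades off letting the workload finish.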