[ 
https://issues.apache.org/jira/browse/HBASE-12852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dima Spivak reassigned HBASE-12852:
-----------------------------------

    Assignee:     (was: Dima Spivak)

> Tests from hbase-it that use ChaosMonkey don't fail if SSH commands fail
> ------------------------------------------------------------------------
>
>                 Key: HBASE-12852
>                 URL: https://issues.apache.org/jira/browse/HBASE-12852
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 0.98.6
>            Reporter: Dima Spivak
>            Priority: Major
>
> I've just started rolling my sleeves up and playing about with hbase-it (at 
> the moment, only on 0.98.6), but wanted to begin filing JIRAs for issues I 
> encounter so that I don't forget to get to them. First up: tests run with 
> ChaosMonkey don't seem to fail when ChaosMonkey itself fails to work. As an 
> example, while running IntegrationTestIngest with a 
> slowDeterministic CM, I forgot to set up SSH properly and saw the following:
> {code}
> 15/01/14 07:36:53 WARN hbase.ClusterManager: Remote command: ps aux | grep 
> proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s 
> SIGKILL , hostname:node-5.internal failed at attempt 4. Retrying until 
> maxAttempts: 5. Exception: stderr: Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,password).
> , stdout: 
> 15/01/14 07:36:53 INFO util.RetryCounter: Sleeping 16000ms before retry #4...
> 15/01/14 07:36:53 INFO zookeeper.ZooKeeper: Session: 0x14ae74d7bac006b closed
> 15/01/14 07:36:53 INFO policies.Policy: Sleeping for: 59541
> 15/01/14 07:36:53 INFO zookeeper.ClientCnxn: EventThread shut down
> Failed to write keys: 0
> Key range: [150000..159999]
> Batch updates: false
> Percent of keys to update: 60
> Updater threads: 10
> Ignore nonce conflicts: true
> Regions per server: 5
> 15/01/14 07:36:56 INFO util.LoadTestTool: Starting to mutate data...
> Starting to mutate data...
> 15/01/14 07:36:57 INFO policies.Policy: Sleeping for: 88816
> 15/01/14 07:37:01 INFO util.MultiThreadedAction: [U:10] Keys=471, cols=5.7 K, 
> time=00:00:05 Overall: [keys/s= 94, latency=102 ms] Current: [keys/s=94, 
> latency=102 ms], wroteUpTo=149999
> 15/01/14 07:37:06 INFO util.MultiThreadedAction: [U:10] Keys=908, cols=11.0 
> K, time=00:00:10 Overall: [keys/s= 90, latency=90 ms] Current: [keys/s=87, 
> latency=77 ms], wroteUpTo=149999
> 15/01/14 07:37:09 INFO hbase.ClusterManager: Executing remote command: ps aux 
> | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs 
> kill -s SIGKILL , hostname:node-5.internal
> 15/01/14 07:37:09 INFO util.Shell: Executing full command [/usr/bin/ssh  
> node-5.internal "ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | 
> cut -d ' ' -f2 | xargs kill -s SIGKILL"]
> 15/01/14 07:37:09 WARN policies.Policy: Exception occured during performing 
> action: ExitCodeException exitCode=255: stderr: Permission denied, please try 
> again.
> Permission denied, please try again.
> Permission denied (publickey,password).
> , stdout: 
>       at 
> org.apache.hadoop.hbase.HBaseClusterManager.exec(HBaseClusterManager.java:208)
>       at 
> org.apache.hadoop.hbase.HBaseClusterManager.execWithRetries(HBaseClusterManager.java:223)
>       at 
> org.apache.hadoop.hbase.HBaseClusterManager.signal(HBaseClusterManager.java:268)
>       at org.apache.hadoop.hbase.ClusterManager.kill(ClusterManager.java:97)
>       at 
> org.apache.hadoop.hbase.DistributedHBaseCluster.killRegionServer(DistributedHBaseCluster.java:110)
>       at org.apache.hadoop.hbase.chaos.actions.Action.killRs(Action.java:84)
>       at 
> org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:50)
>       at 
> org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
>       at 
> org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:50)
>       at 
> org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
>       at 
> org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> Seems to me that tests should fail in these instances rather than just toss a 
> warning. Was this just an oversight, [~enis] and [~ndimiduk], or is this by 
> design?
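
One way the behaviour asked for above could look, as a minimal sketch only: a tracker that records chaos action failures instead of just logging them, so the test can fail at the end of the run. ChaosFailureTracker, ChaosAction, and the maxTolerated threshold are hypothetical names invented for this illustration. They are not part of hbase-it, and this is not how Policy currently handles exceptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types exist in hbase-it. */
public class ChaosFailureTracker {

  /** Stand-in for a chaos action, e.g. killing a region server over SSH. */
  public interface ChaosAction {
    void perform() throws Exception;
  }

  private final List<Exception> failures = new ArrayList<>();
  private final int maxTolerated; // assumed knob: failures allowed before the test fails

  public ChaosFailureTracker(int maxTolerated) {
    this.maxTolerated = maxTolerated;
  }

  /** Runs the action and records any exception rather than only logging it. */
  public void run(ChaosAction action) {
    try {
      action.perform();
    } catch (Exception e) {
      synchronized (failures) {
        failures.add(e);
      }
    }
  }

  /** Number of actions that have failed so far. */
  public int failureCount() {
    synchronized (failures) {
      return failures.size();
    }
  }

  /** Meant to be called by the test once the monkey stops; throws if too many actions failed. */
  public void assertHealthy() {
    synchronized (failures) {
      if (failures.size() > maxTolerated) {
        IllegalStateException ex = new IllegalStateException(
            failures.size() + " chaos action(s) failed; tolerated " + maxTolerated);
        for (Exception f : failures) {
          ex.addSuppressed(f); // keep the original causes, e.g. the SSH stderr
        }
        throw ex;
      }
    }
  }
}
```

With maxTolerated set to 0, a single "Permission denied" from SSH would fail the run. A higher value would tolerate occasional transient failures, which may matter if some failures are expected by design.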



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
