[ https://issues.apache.org/jira/browse/HBASE-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844754#comment-13844754 ]
Elliott Clark edited comment on HBASE-10103 at 12/10/13 10:29 PM:
------------------------------------------------------------------

I'm seeing TestNodeHealthCheckChore hang (or run slower than the timeout) on trunk on my jenkins box (twice in a row). Running just TestNodeHealthCheckChore locally passes though.

Here's the code stack that I saw one of the two times:
{code}
"pool-1-thread-1" prio=10 tid=0x00007f1f106d1800 nid=0x57b1 runnable [0x00007f1f14676000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:220)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	- locked <0x00000000fbac8c28> (a java.io.BufferedInputStream)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
	- locked <0x00000000fb8a6e00> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(InputStreamReader.java:167)
	at java.io.BufferedReader.fill(BufferedReader.java:136)
	at java.io.BufferedReader.read1(BufferedReader.java:187)
	at java.io.BufferedReader.read(BufferedReader.java:261)
	- locked <0x00000000fb8a6e00> (a java.io.InputStreamReader)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:602)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:446)
	at org.apache.hadoop.util.Shell.run(Shell.java:379)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
	at org.apache.hadoop.hbase.HealthChecker.checkHealth(HealthChecker.java:76)
	at org.apache.hadoop.hbase.TestNodeHealthCheckChore.healthCheckerTest(TestNodeHealthCheckChore.java:88)
	at org.apache.hadoop.hbase.TestNodeHealthCheckChore.testHealthCheckerTimeout(TestNodeHealthCheckChore.java:74)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
{code}
When I reverted this I got past the test. Does the timeout need to be adjusted or does this need to be moved into a medium test?

was (Author: eclark):
I'm seeing TestNodeHealthCheckChore hang (or run slower than the timeout) on trunk on my jenkins box (twice in a row).
Here's the code stack that I saw one of the two times:
{code}
"pool-1-thread-1" prio=10 tid=0x00007f1f106d1800 nid=0x57b1 runnable [0x00007f1f14676000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:220)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	- locked <0x00000000fbac8c28> (a java.io.BufferedInputStream)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
	- locked <0x00000000fb8a6e00> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(InputStreamReader.java:167)
	at java.io.BufferedReader.fill(BufferedReader.java:136)
	at java.io.BufferedReader.read1(BufferedReader.java:187)
	at java.io.BufferedReader.read(BufferedReader.java:261)
	- locked <0x00000000fb8a6e00> (a java.io.InputStreamReader)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:602)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:446)
	at org.apache.hadoop.util.Shell.run(Shell.java:379)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
	at org.apache.hadoop.hbase.HealthChecker.checkHealth(HealthChecker.java:76)
	at org.apache.hadoop.hbase.TestNodeHealthCheckChore.healthCheckerTest(TestNodeHealthCheckChore.java:88)
	at org.apache.hadoop.hbase.TestNodeHealthCheckChore.testHealthCheckerTimeout(TestNodeHealthCheckChore.java:74)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
{code}
When I reverted this I got past the test. Does the timeout need to be adjusted or does this need to be moved into a medium test?

> TestNodeHealthCheckChore#testRSHealthChore: Stoppable must have been stopped
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-10103
>                 URL: https://issues.apache.org/jira/browse/HBASE-10103
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 0.98.0, 0.99.0
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 0.98.0, 0.96.1, 0.99.0
>
>         Attachments: 10103.patch, 10103.patch, 10103.patch
>
>
> {noformat}
> Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 623.639 sec <<< FAILURE!
> testRSHealthChore(org.apache.hadoop.hbase.TestNodeHealthCheckChore)  Time elapsed: 0.001 sec  <<< FAILURE!
> java.lang.AssertionError: Stoppable must have been stopped.
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.apache.hadoop.hbase.TestNodeHealthCheckChore.testRSHealthChore(TestNodeHealthCheckChore.java:108)
> {noformat}
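
For readers following the thread dump: the test thread is sitting in BufferedReader.read, called from Shell$ShellCommandExecutor.parseExecResult, so it is reading the health script's output when the dump was taken. The sketch below is not Hadoop's Shell code and not a diagnosis of this issue. It only illustrates, under stated assumptions, one way a caller can bound its wait on a child process even if the output pipe never reaches end-of-stream: drain the output on a separate daemon thread and wait on the process with a deadline. The names TimedCommand, Result and run are hypothetical, the commands assume a Unix-like `sh`, and Process.waitFor(long, TimeUnit) requires Java 8, which is newer than the JDK shown in the trace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: run a command without letting output reads block the caller. */
public class TimedCommand {

  /** Outcome of one run: whether the deadline passed, and whatever output was captured. */
  public static final class Result {
    public final boolean timedOut;
    public final String output;

    Result(boolean timedOut, String output) {
      this.timedOut = timedOut;
      this.output = output;
    }
  }

  public static Result run(List<String> command, long timeoutMillis)
      throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command).redirectErrorStream(true).start();

    // StringBuffer is synchronized, so the caller can safely read it after the join below.
    final StringBuffer output = new StringBuffer();
    Thread reader = new Thread(() -> {
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
        char[] buf = new char[512];
        int n;
        while ((n = in.read(buf)) != -1) {
          output.append(buf, 0, n);
        }
      } catch (IOException ignored) {
        // The stream may be closed underneath us once the process is destroyed.
      }
    });
    reader.setDaemon(true); // a reader stuck on an open pipe must not keep the JVM alive
    reader.start();

    // Bound the wait on the process itself rather than on end-of-stream.
    boolean finished = process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
    if (!finished) {
      process.destroyForcibly();
    }
    // Give the reader a short grace period; do not wait indefinitely for EOF.
    reader.join(1000);
    return new Result(!finished, output.toString());
  }

  public static void main(String[] args) throws Exception {
    Result r = run(Arrays.asList("sh", "-c", "sleep 30"), 200);
    System.out.println("timedOut=" + r.timedOut);
  }
}
```

Whether anything like this applies to HealthChecker depends on how Shell enforces its own timeout, which the stack above does not show; the other option raised in the comment, moving the test to the medium category, would not need any code change of this kind.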