[ https://issues.apache.org/jira/browse/HBASE-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Yuan Jiang updated HBASE-13732:
---------------------------------------
    Status: Patch Available  (was: Open)

> TestHBaseFsck#testParallelWithRetriesHbck fails intermittently
> --------------------------------------------------------------
>
>                 Key: HBASE-13732
>                 URL: https://issues.apache.org/jira/browse/HBASE-13732
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck, test
>    Affects Versions: 1.1.0, 2.0.0, 1.2.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>            Priority: Minor
>             Fix For: 2.0.0, 1.2.0, 1.1.1
>
>         Attachments: HBASE-13732.patch
>
>
> TestHBaseFsck#testParallelWithRetriesHbck failed intermittently (especially
> in Windows environment) with "java.io.IOException: Duplicate hbck - Abort"
> {noformat}
> java.util.concurrent.ExecutionException: java.io.IOException: Duplicate hbck - Abort
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>     at org.apache.hadoop.hbase.util.TestHBaseFsck.testParallelWithRetriesHbck(TestHBaseFsck.java:644)
> Caused by: java.io.IOException: Duplicate hbck - Abort
>     at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:484)
>     at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:53)
>     at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:43)
>     at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:38)
>     at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:635)
>     at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:628)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
> {noformat}
> HBASE-13591 tried to address this issue. It improved the pass rate in the
> Linux environment (after the fix, I could not reproduce the failure on my
> machine); however, the test still failed intermittently in the Windows
> environment during testing of the 1.1 release.
> Looking at the code, it uses the ExponentialBackoffPolicy between retries
> (a 200ms sleep after the first failed attempt to acquire the lock in ZK,
> then 400ms, then 800ms, etc.). Therefore, even if the first hbck run
> completes, the second hbck run can still fail because of the long sleep
> time between its retries.
> The proposed fix is to use ExponentialBackoffPolicyWithLimit and cap the
> maximum sleep time at some small number (e.g. 5 seconds; it should be
> configurable). This would make the test more robust.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
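
To illustrate the difference between plain exponential backoff and backoff with a limit, here is a minimal standalone sketch. It is not the HBase retry-policy code: the class `CappedBackoff`, its method `sleepMs`, and the constructor parameters are hypothetical names chosen for this example. Only the 200ms starting sleep and the 5 second cap come from the description above.

```java
// Hypothetical illustration only; this is NOT the HBase
// ExponentialBackoffPolicyWithLimit implementation.
final class CappedBackoff {
    private final long baseSleepMs; // sleep after the first failed attempt
    private final long maxSleepMs;  // upper bound on any single sleep

    CappedBackoff(long baseSleepMs, long maxSleepMs) {
        if (baseSleepMs <= 0 || maxSleepMs < baseSleepMs) {
            throw new IllegalArgumentException("need 0 < baseSleepMs <= maxSleepMs");
        }
        this.baseSleepMs = baseSleepMs;
        this.maxSleepMs = maxSleepMs;
    }

    /**
     * Sleep time before the next retry, given how many attempts have
     * already failed (1 = first failure). Doubles each time, never
     * exceeding maxSleepMs.
     */
    long sleepMs(int failedAttempts) {
        if (failedAttempts < 1) {
            throw new IllegalArgumentException("failedAttempts must be >= 1");
        }
        long sleep = baseSleepMs;
        // Stop doubling once the cap is reached, so long retry chains
        // neither overflow nor grow without bound.
        for (int i = 1; i < failedAttempts && sleep < maxSleepMs; i++) {
            sleep *= 2;
        }
        return Math.min(sleep, maxSleepMs);
    }

    public static void main(String[] args) {
        // 200ms start and 5s cap, as suggested in the issue description.
        CappedBackoff policy = new CappedBackoff(200, 5000);
        for (int attempt = 1; attempt <= 8; attempt++) {
            System.out.println("after failure " + attempt + ": sleep "
                + policy.sleepMs(attempt) + "ms");
        }
    }
}
```

With these values the sleeps are 200, 400, 800, 1600 and 3200ms, and every later retry waits 5000ms instead of 6400ms, 12800ms and so on. A waiting hbck therefore rechecks the lock at least every 5 seconds once the cap is reached.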