[ https://issues.apache.org/jira/browse/HBASE-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570220#comment-14570220 ]

Hudson commented on HBASE-13732:
--------------------------------

SUCCESS: Integrated in HBase-1.2 #129 (See [https://builds.apache.org/job/HBase-1.2/129/])
HBASE-13732 TestHBaseFsck#testParallelWithRetriesHbck fails intermittently (Stephen Yuan Jiang, ADDENDUM for failing tests) (enis: rev 4e5535a156e885d1a8400346288abffe5294c869)
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
HBASE-13732 TestHBaseFsck#testParallelWithRetriesHbck fails intermittently (Stephen Yuan Jiang, ADDENDUM for failing tests) (enis: rev 41bfe40cf8cfd603671dc2075397a622c78b41af)
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


> TestHBaseFsck#testParallelWithRetriesHbck fails intermittently
> --------------------------------------------------------------
>
>                 Key: HBASE-13732
>                 URL: https://issues.apache.org/jira/browse/HBASE-13732
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck, test
>    Affects Versions: 2.0.0, 1.1.0, 1.2.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>            Priority: Minor
>             Fix For: 2.0.0, 1.2.0, 1.1.1
>
>         Attachments: HBASE-13732-addendum-branch-1.patch,
>                      HBASE-13732-addendum-master.patch, HBASE-13732.patch
>
>
> TestHBaseFsck#testParallelWithRetriesHbck failed intermittently (especially
> in the Windows environment) with "java.io.IOException: Duplicate hbck - Abort":
> {noformat}
> java.util.concurrent.ExecutionException: java.io.IOException: Duplicate hbck - Abort
>       at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>       at org.apache.hadoop.hbase.util.TestHBaseFsck.testParallelWithRetriesHbck(TestHBaseFsck.java:644)
> Caused by: java.io.IOException: Duplicate hbck - Abort
>       at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:484)
>       at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:53)
>       at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:43)
>       at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.doFsck(HbckTestingUtil.java:38)
>       at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:635)
>       at org.apache.hadoop.hbase.util.TestHBaseFsck$2RunHbck.call(TestHBaseFsck.java:628)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:722)
> {noformat}
> HBASE-13591 tried to address this issue. It did improve the pass rate in the
> Linux environment (after the fix, I could not reproduce the failure on my
> machine); however, the test still failed intermittently in the Windows
> environment during testing of the 1.1 release.
> Looking at the code, it uses the ExponentialBackoffPolicy between retries
> (a 200ms sleep after the first failed attempt to acquire the lock in ZK,
> then 400ms, then 800ms, etc.). Therefore, even if the first hbck run
> completes, the second hbck run would still fail because of the long sleep time.
> The proposal to fix the problem is to use ExponentialBackoffPolicyWithLimit
> and cap the max sleep time at some small number (e.g. 5 seconds; it should be
> configurable). This would make the test more robust.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
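
To illustrate the backoff behavior described in the issue, here is a minimal Java sketch of an exponential sleep schedule with and without a cap. It is not the HBase implementation: the class and method names (`CappedBackoffSketch`, `exponentialSleepMillis`, `cappedSleepMillis`) are hypothetical, and the sketch assumes the sleep doubles on each retry starting from the 200ms mentioned in the description, with the 5-second limit used as the example cap.

```java
// Hypothetical illustration only; this does not reproduce HBase's
// ExponentialBackoffPolicy or ExponentialBackoffPolicyWithLimit classes.
final class CappedBackoffSketch {

    /**
     * Uncapped exponential backoff: baseMillis * 2^attempt, where attempt is
     * 0-based (attempt 0 is the sleep after the first failed try).
     */
    static long exponentialSleepMillis(int attempt, long baseMillis) {
        // Bound the shift so the result stays within long range for a base
        // in the hundreds of milliseconds, even for large attempt counts.
        int shift = Math.min(Math.max(attempt, 0), 30);
        return baseMillis * (1L << shift);
    }

    /** Same schedule, but never sleeping longer than maxMillis. */
    static long cappedSleepMillis(int attempt, long baseMillis, long maxMillis) {
        return Math.min(exponentialSleepMillis(attempt, baseMillis), maxMillis);
    }

    public static void main(String[] args) {
        long baseMillis = 200L;  // first sleep, as stated in the issue description
        long maxMillis = 5000L;  // example cap from the proposal; assumed configurable
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.printf("attempt %d: uncapped=%d ms, capped=%d ms%n",
                attempt,
                exponentialSleepMillis(attempt, baseMillis),
                cappedSleepMillis(attempt, baseMillis, maxMillis));
        }
    }
}
```

With these assumed values the uncapped sleep passes 5 seconds on the sixth retry (6400ms) and keeps doubling, while the capped variant stays at 5000ms, so a waiting hbck run rechecks the lock at a bounded interval.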