[ https://issues.apache.org/jira/browse/HBASE-20414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-20414: --------------------------- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks for the review, Chia-ping and Ram. > TestLockProcedure#testMultipleLocks may fail on slow machine > ------------------------------------------------------------ > > Key: HBASE-20414 > URL: https://issues.apache.org/jira/browse/HBASE-20414 > Project: HBase > Issue Type: Test > Reporter: Ted Yu > Assignee: Ted Yu > Priority: Major > Fix For: 3.0.0 > > Attachments: 20414.v1.txt, 20414.v2.txt > > > Here was recent failure : > https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/172/testReport/junit/org.apache.hadoop.hbase.master.locking/TestLockProcedure/health_checks___yetus_jdk8_hadoop2_checks___testMultipleLocks/ > {code} > java.lang.AssertionError: expected:<LOCKED> but was:<UNLOCKED> > at > org.apache.hadoop.hbase.master.locking.TestLockProcedure.sendHeartbeatAndCheckLocked(TestLockProcedure.java:221) > at > org.apache.hadoop.hbase.master.locking.TestLockProcedure.testMultipleLocks(TestLockProcedure.java:311) > {code} > In the test output, we can see this: > {code} > 2018-04-13 20:19:54,230 DEBUG [Time-limited test] > locking.TestLockProcedure(225): Proc id 22 : LOCKED. > ... > 2018-04-13 20:19:55,529 DEBUG [Time-limited test] > procedure2.ProcedureExecutor(865): Stored pid=26, state=RUNNABLE; > org.apache.hadoop.hbase.master.locking.LockProcedure > regions=a7f9adfd047350eabb199a39564ba4db,c1e297609590b707233a2f9c8bb51fa1, > type=EXCLUSIVE > 2018-04-13 20:19:56,230 DEBUG [ProcExecTimeout] locking.LockProcedure(207): > Timeout failure ProcedureEvent for pid=22, state=WAITING_TIMEOUT; > org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, > type=EXCLUSIVE, ready=false, [pid=22, state=WAITING_TIMEOUT; > org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, > type=EXCLUSIVE] > {code} > After the pid=26 log, the code does this (1 second wait): > {code} > // Assert tables & region locks are waiting because of namespace lock. > Thread.sleep(HEARTBEAT_TIMEOUT / 2); > {code} > On a slow machine (in the case above), there was only 730 msec between the > queueing of regionsLock2 and the WAITING_TIMEOUT state of the nsLock. The 1 > second wait was too long, leading to assertion failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)