[ https://issues.apache.org/jira/browse/HBASE-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356350#comment-15356350 ]
Ted Yu commented on HBASE-16144:
--------------------------------

{code}
+public class ReplicationZKLockCleanerChore extends ScheduledChore {
{code}
Add an InterfaceAudience annotation to the class.

{code}
+      String[] array = rsServerNameZnode.split("/");
+      String znode = array[array.length - 1];
{code}
Should array.length be checked before accessing the array?

{code}
+          if (s != null && System.currentTimeMillis() - s.getMtime() > TTL) {
{code}
Use EnvironmentEdge instead of System.currentTimeMillis().

{code}
+    } catch (InterruptedException e) {
+      LOG.warn("zk operation interrupted", e);
{code}
Restore the interrupt status.

> Replication queue's lock will live forever if RS acquiring the lock has dead
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-16144
>                 URL: https://issues.apache.org/jira/browse/HBASE-16144
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.1, 1.1.5, 0.98.20
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>         Attachments: HBASE-16144-v1.patch
>
>
> By default, we use a multi operation when we claimQueues from ZK. But if
> we set hbase.zookeeper.useMulti=false, we add a lock first, then copy the
> nodes, and finally clean up the old queue and the lock.
> However, if the RS acquiring the lock crashes before claimQueues is done, the
> lock will always be there and other RSs can never claim the queue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
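For illustration only, here is a minimal sketch of the three code-level points raised in the comment: checking the array length after split, reading time through an injectable clock rather than System.currentTimeMillis(), and restoring the interrupt status after catching InterruptedException. The class name, method names, the `ZkOp` interface, and the `LongSupplier` clock are hypothetical stand-ins and are not taken from the patch. In HBase the clock would presumably come from EnvironmentEdgeManager; a plain supplier is used here so the sketch compiles without HBase on the classpath.

```java
import java.util.function.LongSupplier;

// Hypothetical helper class; none of these names come from HBASE-16144-v1.patch.
class LockCleanerSketch {

    /** Stand-in for a ZooKeeper call that may be interrupted. */
    interface ZkOp {
        void run() throws InterruptedException;
    }

    /** Returns the last component of a znode path, or null if there is none. */
    static String lastZnodeComponent(String path) {
        if (path == null) {
            return null;
        }
        String[] parts = path.split("/");
        // "/".split("/") yields an empty array, so guard before indexing.
        if (parts.length == 0) {
            return null;
        }
        String last = parts[parts.length - 1];
        return last.isEmpty() ? null : last;
    }

    /** TTL check against an injectable clock, so tests can control time. */
    static boolean isExpired(long mtimeMillis, long ttlMillis, LongSupplier clock) {
        return clock.getAsLong() - mtimeMillis > ttlMillis;
    }

    /** Runs the operation; on interruption, restores the flag and reports failure. */
    static boolean runRestoringInterrupt(ZkOp op) {
        try {
            op.run();
            return true;
        } catch (InterruptedException e) {
            // Re-set the flag so callers further up the stack can still see it.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(lastZnodeComponent("/hbase/replication/rs/host,16020,1"));
        System.out.println(isExpired(0L, 10L, System::currentTimeMillis));
    }
}
```

Passing the clock as a parameter is only one way to make the TTL check testable; the point of the review comment is that time should not be read directly from System.currentTimeMillis().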