[ https://issues.apache.org/jira/browse/HBASE-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345164#comment-14345164 ]
Ted Yu commented on HBASE-13146:
--------------------------------

lgtm

> Race Condition in ScheduledChore and ChoreService
> -------------------------------------------------
>
>                 Key: HBASE-13146
>                 URL: https://issues.apache.org/jira/browse/HBASE-13146
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0, 1.1.0
>            Reporter: zhangduo
>            Assignee: zhangduo
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: HBASE-13146.patch
>
>
> Here are my findings from addressing HBASE-13145.
> {code:title=ChoreService.java}
>   public synchronized boolean scheduleChore(ScheduledChore chore) {
>     ...
>     ScheduledFuture<?> future =
>         scheduler.scheduleAtFixedRate(chore, chore.getInitialDelay(), chore.getPeriod(),
>           chore.getTimeUnit());
>     chore.setChoreServicer(this);
>     ...
>   }
> {code}
> So we schedule the chore first, and only then set its chore servicer. For
> CompactionChecker the initialDelay is 0, so the chore can run before its
> chore servicer has been set. Now see this:
> {code:title=ScheduledChore.java}
>   public void run() {
>     ...
>     else if (stopper.isStopped() || !isScheduled()) {
>       cancel(false);
>       cleanup();
>       if (LOG.isInfoEnabled()) LOG.info("Chore: " + getName() + " was stopped");
>     }
>     ...
>   }
>   ...
>   public synchronized boolean isScheduled() {
>     return choreServicer != null && choreServicer.isChoreScheduled(this);
>   }
> {code}
> So isScheduled() can return false, and we start to cancel the chore. If you
> insert a sleep between scheduling the chore and setting the chore servicer,
> you will always get the log 'Chore: CompactionChecker was stopped'. But the
> chore is not always actually cancelled, because of how the cancel method is
> implemented.
> {code:title=ScheduledChore.java}
>   public synchronized void cancel(boolean mayInterruptIfRunning) {
>     if (isScheduled()) choreServicer.cancelChore(this, mayInterruptIfRunning);
>     choreServicer = null;
>   }
> {code}
> So if you insert a sleep before cancel (remember to use a larger sleep time
> here), you can make the test always fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
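
The ordering problem described in the issue can be reproduced outside HBase. The sketch below is not HBase code and is not necessarily what HBASE-13146.patch does: `ChoreRaceSketch`, `SketchChore` and `firstRunSawUnscheduled` are hypothetical names, the servicer is a plain `Object`, and a `CountDownLatch` stands in for the sleep mentioned above so the result is deterministic. It only shows that a task scheduled with an initial delay of 0 can observe a missing owner when the owner is set after scheduling, and that setting the owner first avoids this.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class ChoreRaceSketch {

  /** Hypothetical, simplified stand-in for ScheduledChore. */
  static final class SketchChore implements Runnable {
    // Stands in for the choreServicer field; null means "not owned yet".
    private volatile Object servicer;
    final AtomicBoolean sawUnscheduled = new AtomicBoolean(false);
    final CountDownLatch firstRun = new CountDownLatch(1);

    void setServicer(Object s) { servicer = s; }

    boolean isScheduled() { return servicer != null; }

    @Override
    public void run() {
      // The real chore would call cancel(false) here; we only record it.
      if (!isScheduled()) sawUnscheduled.set(true);
      firstRun.countDown();
    }
  }

  /** Returns true if the first run found the chore without a servicer. */
  static boolean firstRunSawUnscheduled(boolean setServicerFirst) {
    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    SketchChore chore = new SketchChore();
    Object servicer = new Object();
    try {
      if (setServicerFirst) {
        // Assumed reordering: the owner is visible before the task can run.
        chore.setServicer(servicer);
        scheduler.scheduleAtFixedRate(chore, 0, 1, TimeUnit.SECONDS);
        chore.firstRun.await();
      } else {
        // Order from the issue: schedule with initial delay 0, then set owner.
        scheduler.scheduleAtFixedRate(chore, 0, 1, TimeUnit.SECONDS);
        chore.firstRun.await(); // plays the role of the inserted sleep
        chore.setServicer(servicer);
      }
      return chore.sawUnscheduled.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for first run", e);
    } finally {
      scheduler.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("schedule, then set servicer: " + firstRunSawUnscheduled(false));
    System.out.println("set servicer, then schedule: " + firstRunSawUnscheduled(true));
  }
}
```

In the first ordering the first run always sees no servicer, which corresponds to the 'was stopped' log line in the issue; in the second it never does. The real isScheduled() also consults choreServicer.isChoreScheduled(this), which this sketch leaves out.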