[ https://issues.apache.org/jira/browse/SENTRY-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441571#comment-16441571 ]
Na Li commented on SENTRY-2203:
-------------------------------

The log below shows the call stack from the failed attempt to release the leader lock. It also shows that the release failed because CuratorFrameworkImpl was not in the started state.
{code}
2018-04-08 04:34:31,760 INFO sentry.org.apache.curator.framework.imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting  <-- CuratorFrameworkImpl is closed
2018-04-08 04:34:31,762 INFO org.apache.sentry.provider.db.service.persistent.LeaderStatusMonitor: LeaderStatusMonitor: interrupted
2018-04-08 04:34:31,762 INFO org.apache.sentry.service.thrift.SentryService: Attempting to stop sentry thrift service...
2018-04-08 04:34:31,762 INFO org.apache.sentry.provider.db.service.persistent.LeaderStatusMonitor: LeaderStatusMonitor: becoming standby
2018-04-08 04:34:31,762 INFO org.apache.sentry.service.thrift.SentryService: Attempting to stop sentry web service...
2018-04-08 04:34:31,762 ERROR sentry.org.apache.curator.framework.recipes.leader.LeaderSelector: The leader threw an exception
java.lang.IllegalStateException: instance must be started before calling this method
    at com.google.common.base.Preconditions.checkState(Preconditions.java:145)  <-- CuratorFrameworkImpl is not in started state
    at sentry.org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:359)
    at sentry.org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:339)
    at sentry.org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:123)
    at sentry.org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154)
    at sentry.org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:427)
    at sentry.org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:444)
    at sentry.org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64)
    at sentry.org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245)
    at sentry.org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}

d) CuratorFrameworkImpl code
{code}
public DeleteBuilder delete() {
    // Throws IllegalStateException once the client has been closed.
    Preconditions.checkState(this.getState() == CuratorFrameworkState.STARTED, "instance must be started before calling this method");
    return new DeleteBuilderImpl(this);
}
{code}

> Leader Lock is not released when Sentry service shuts down
> ----------------------------------------------------------
>
>                 Key: SENTRY-2203
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2203
>             Project: Sentry
>          Issue Type: Bug
>          Components: Sentry
>    Affects Versions: 2.1.0
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Critical
>         Attachments: SENTRY-2203.001.patch
>
>
> In our testing of Sentry HA, we found that after restarting the Sentry
> service without restarting the ZooKeeper service, it is possible that none
> of the Sentry servers is elected as leader to sync with HMS.
> What happened was:
> 1) When a leader is elected, the Sentry server host holds the leader lock.
> The lock is identified by the mutexPath. All Sentry servers in a cluster use
> the same mutexPath.
> 2) When the Sentry service is shut down, the HAContext is shut down, so the
> CuratorFrameworkImpl it contains is shut down as well, but the leader lock is
> still held by the Sentry server host.
> 3) When the interrupt from the shutdown reaches the leader election thread,
> releasing the leader lock fails because CuratorFrameworkImpl is no longer in
> the started state.
> 4) When the Sentry server restarts, acquiring the leader lock fails because
> the lock was never released, so no active Sentry server is leader.
> 5) If the leader lock is released before CuratorFrameworkImpl is shut down,
> this issue does not happen. If ZooKeeper is restarted after the Sentry
> service restarts, this issue does not happen either.
> To fix this issue:
> Sentry LeaderStatusMonitor can deactivate the leader, releasing the leader
> lock, when it is closed. This guarantees that the leader lock is released
> before CuratorFrameworkImpl is shut down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
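
The ordering problem in steps 2) to 5) of the quoted description, and the effect of the proposed fix, can be modeled with a small self-contained sketch. Everything in it is an assumption made for illustration: `FakeClient` is a hypothetical stand-in that mimics only the started-state check shown in the CuratorFrameworkImpl snippet, and `lockLeakedAfterShutdown` is a made-up helper. It is not Sentry or Curator code, and it does not model ZooKeeper sessions or ephemeral nodes.

```java
// Toy model of the shutdown-ordering issue; not Sentry or Curator code.
final class ShutdownOrderSketch {

    /** Hypothetical stand-in for the client: node operations only work while started. */
    static final class FakeClient {
        private boolean started = true;
        private boolean lockNodeExists = false;

        void createLockNode() { checkStarted(); lockNodeExists = true; }
        void deleteLockNode() { checkStarted(); lockNodeExists = false; }
        void close() { started = false; }
        boolean lockNodeExists() { return lockNodeExists; }

        private void checkStarted() {
            if (!started) {
                throw new IllegalStateException("instance must be started before calling this method");
            }
        }
    }

    /** Returns true if the lock node is left behind after the simulated shutdown. */
    static boolean lockLeakedAfterShutdown(boolean releaseBeforeClose) {
        FakeClient client = new FakeClient();
        client.createLockNode();            // this server becomes leader
        if (releaseBeforeClose) {
            client.deleteLockNode();        // proposed order: give up leadership first
            client.close();
        } else {
            client.close();                 // observed order: client is closed first
            try {
                client.deleteLockNode();    // release now fails, as in the log
            } catch (IllegalStateException e) {
                // corresponds to "The leader threw an exception" in the log
            }
        }
        return client.lockNodeExists();
    }

    public static void main(String[] args) {
        System.out.println("close first, lock leaked: " + lockLeakedAfterShutdown(false));
        System.out.println("release first, lock leaked: " + lockLeakedAfterShutdown(true));
    }
}
```

Running `main` prints `close first, lock leaked: true` followed by `release first, lock leaked: false`, which matches the difference between the failing shutdown order and the order the fix is meant to guarantee.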