[
https://issues.apache.org/jira/browse/SENTRY-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Na Li updated SENTRY-2491:
--------------------------
Attachment: SENTRY-2491.002.patch
> Sentry High availability unit tests run into deadlock sometimes
> ---------------------------------------------------------------
>
> Key: SENTRY-2491
> URL: https://issues.apache.org/jira/browse/SENTRY-2491
> Project: Sentry
> Issue Type: Bug
> Components: Sentry
> Affects Versions: 2.2.0
> Reporter: Na Li
> Assignee: Na Li
> Priority: Major
> Attachments: SENTRY-2491.001.patch, SENTRY-2491.002.patch
>
>
> In sentry unit tests, we don't create schema before running a test. Instead,
> we use dataNucleus to create sentry tables when they are accessed. This
> creates potential deadlock when running test for Sentry HA setup.
> For example, the following shows the event sequence that causes deadlock
> 1) thread_1 gets shared lock of SYSTABLES in order to read table
> SENTRY_HMS_NOTIFICATION_ID
> 2) thread_2 gets shared lock of SYSTABLES in order to read table
> SENTRY_HMS_NOTIFICATION_ID
> 3) thread_1 tries to get execution lock to create table
> SENTRY_HMS_NOTIFICATION_ID,
> and wait for execution lock because thread_2 got shared lock already.
> 4) thread_2 tries to get execution lock to create table
> SENTRY_HMS_NOTIFICATION_ID,
> and wait for execution lock because thread_1 got shared lock already.
> The solution is to let the instances of sentry service start with delay.
> Specifically,
> let HMS follower threads separate as far as possible, i.e., half of the
> interval.
>
> This deadlock only exists in unit tests, and does not exist in production
> because schema is created before starting Sentry services. Therefore, there
> is no table creation after service starts.
> Example of the log message when such deadlock happens
> {code}
> 2019-01-04 18:32:46,332 (pool-13-thread-1) [INFO -
> org.apache.sentry.hdfs.DBUpdateForwarder.getAllUpdatesFrom(DBUpdateForwarder.java:140)]
> (org.apache.sentry.hdfs.PermImageRetriever) Requested sequence number 0 is
> less than 0 or requested deltas for that sequence number are not available.
> Fetch a full update
> 2019-01-04 18:32:49,346 (hms-follower) [DEBUG -
> org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)]
> HMSFollower entered state STARTED
> 2019-01-04 18:32:50,091 (hms-follower) [DEBUG -
> org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)]
> HMSFollower entered state STARTED
> 2019-01-04 18:33:00,286 (store-cleaner) [ERROR -
> org.datanucleus.util.Log4JLogger.error(Log4JLogger.java:115)] Error thrown
> executing CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
> NOTIFICATION_ID BIGINT NOT NULL
> ) : A lock could not be obtained due to a deadlock, cycle of locks and
> waiters is:
> Lock : ROW, SYSTABLES, (1,3)
> Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
> NOTIFICATION_ID BIGINT NOT NULL
> )
> Granted XID : {419, S}
> Lock : ROW, SYSTABLES, (1,3)
> Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
> NOTIFICATION_ID BIGINT NOT NULL
> )
> Granted XID : {419, S} , {436, S}
> . The selected victim is XID : 436.
> java.sql.SQLTransactionRollbackException: A lock could not be obtained due to
> a deadlock, cycle of locks and waiters is:
> Lock : ROW, SYSTABLES, (1,3)
> Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
> NOTIFICATION_ID BIGINT NOT NULL
> )
> Granted XID : {419, S}
> Lock : ROW, SYSTABLES, (1,3)
> Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
> NOTIFICATION_ID BIGINT NOT NULL
> )
> Granted XID : {419, S} , {436, S}
> . The selected victim is XID : 436.
> at
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown
> Source)
> at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
> Source)
> at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown
> Source)
> at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown
> Source)
> at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
> at
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
> at
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
> at
> org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:546)
> at
> org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:609)
> at
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
> at
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
> at
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
> at
> ...[truncated 36339941 chars]...
> eImpl(RetryClientInvocationHandler.java:94)] Calling getAllUpdatesFrom
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)