[ 
https://issues.apache.org/jira/browse/SENTRY-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li updated SENTRY-2491:
--------------------------
    Description: 
In sentry unit tests, we don't create schema before running a test. Instead, we 
use dataNucleus to create sentry tables when they are accessed. This creates 
potential deadlock when running test for Sentry HA setup. 

For example, the following shows the event sequence that causes deadlock
1) thread_1 gets shared lock of SYSTABLES in order to read table 
SENTRY_HMS_NOTIFICATION_ID
2) thread_2 gets shared lock of SYSTABLES in order to read table 
SENTRY_HMS_NOTIFICATION_ID
3) thread_1 tries to get execution lock to create table 
SENTRY_HMS_NOTIFICATION_ID,
   and wait for execution lock because thread_2 got shared lock already.
4) thread_2 tries to get execution lock to create table 
SENTRY_HMS_NOTIFICATION_ID,
   and wait for execution lock because thread_1 got shared lock already.

The solution is to let the instances of sentry service start with delay. 
Specifically,
let HMS follower threads separate as far as possible, i.e., half of the 
interval.
      
This deadlock only exists in unit tests, and does not exist in production 
because schema is created before starting Sentry services. Therefore, there is 
no table creation after service starts.

Example of the log message when such deadlock happens
{code}
2019-01-04 18:32:46,332 (pool-13-thread-1) [INFO - 
org.apache.sentry.hdfs.DBUpdateForwarder.getAllUpdatesFrom(DBUpdateForwarder.java:140)]
 (org.apache.sentry.hdfs.PermImageRetriever) Requested sequence number 0 is 
less than 0 or requested deltas for that sequence number are not available. 
Fetch a full update
2019-01-04 18:32:49,346 (hms-follower) [DEBUG - 
org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)]
 HMSFollower entered state STARTED
2019-01-04 18:32:50,091 (hms-follower) [DEBUG - 
org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)]
 HMSFollower entered state STARTED
2019-01-04 18:33:00,286 (store-cleaner) [ERROR - 
org.datanucleus.util.Log4JLogger.error(Log4JLogger.java:115)] Error thrown 
executing CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
(
    NOTIFICATION_ID BIGINT NOT NULL
) : A lock could not be obtained due to a deadlock, cycle of locks and waiters 
is:
Lock : ROW, SYSTABLES, (1,3)
  Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
(
    NOTIFICATION_ID BIGINT NOT NULL
)
  Granted XID : {419, S} 
Lock : ROW, SYSTABLES, (1,3)
  Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
(
    NOTIFICATION_ID BIGINT NOT NULL
)
  Granted XID : {419, S} , {436, S} 
. The selected victim is XID : 436.
java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a 
deadlock, cycle of locks and waiters is:
Lock : ROW, SYSTABLES, (1,3)
  Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
(
    NOTIFICATION_ID BIGINT NOT NULL
)
  Granted XID : {419, S} 
Lock : ROW, SYSTABLES, (1,3)
  Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
(
    NOTIFICATION_ID BIGINT NOT NULL
)
  Granted XID : {419, S} , {436, S} 
. The selected victim is XID : 436.
        at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source)
        at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
Source)
        at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
Source)
        at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
        at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
        at 
org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
        at 
org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
        at 
org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:546)
        at 
org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:609)
        at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
        at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
        at 
org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
        at 
...[truncated 36339941 chars]...
eImpl(RetryClientInvocationHandler.java:94)] Calling getAllUpdatesFrom
{code}

  was:
In sentry unit tests, we don't create schema before running a test. Instead, we 
use dataNucleus to create sentry tables when they are accessed. This creates 
potential deadlock when running test for Sentry HA setup. 

For example, the following shows the event sequence that causes deadlock
1) thread_1 gets shared lock of SYSTABLES in order to read table 
SENTRY_HMS_NOTIFICATION_ID
2) thread_2 gets shared lock of SYSTABLES in order to read table 
SENTRY_HMS_NOTIFICATION_ID
3) thread_1 tries to get execution lock to create table 
SENTRY_HMS_NOTIFICATION_ID,
   and wait for execution lock because thread_2 got shared lock already.
4) thread_2 tries to get execution lock to create table 
SENTRY_HMS_NOTIFICATION_ID,
   and wait for execution lock because thread_1 got shared lock already.

The solution is to let the instances of sentry service start with delay. 
Specifically,
let HMS follower threads separate as far as possible, i.e., half of the 
interval.
      
This deadlock only exists in unit tests, and does not exist in production 
because schema is created before starting Sentry services. Therefore, there is 
no table creation after service starts.


> Sentry High availability unit tests run into deadlock sometimes
> ---------------------------------------------------------------
>
>                 Key: SENTRY-2491
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2491
>             Project: Sentry
>          Issue Type: Bug
>          Components: Sentry
>    Affects Versions: 2.2.0
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Major
>         Attachments: SENTRY-2491.001.patch
>
>
> In sentry unit tests, we don't create schema before running a test. Instead, 
> we use dataNucleus to create sentry tables when they are accessed. This 
> creates potential deadlock when running test for Sentry HA setup. 
> For example, the following shows the event sequence that causes deadlock
> 1) thread_1 gets shared lock of SYSTABLES in order to read table 
> SENTRY_HMS_NOTIFICATION_ID
> 2) thread_2 gets shared lock of SYSTABLES in order to read table 
> SENTRY_HMS_NOTIFICATION_ID
> 3) thread_1 tries to get execution lock to create table 
> SENTRY_HMS_NOTIFICATION_ID,
>    and wait for execution lock because thread_2 got shared lock already.
> 4) thread_2 tries to get execution lock to create table 
> SENTRY_HMS_NOTIFICATION_ID,
>    and wait for execution lock because thread_1 got shared lock already.
> The solution is to let the instances of sentry service start with delay. 
> Specifically,
> let HMS follower threads separate as far as possible, i.e., half of the 
> interval.
>       
> This deadlock only exists in unit tests, and does not exist in production 
> because schema is created before starting Sentry services. Therefore, there 
> is no table creation after service starts.
> Example of the log message when such deadlock happens
> {code}
> 2019-01-04 18:32:46,332 (pool-13-thread-1) [INFO - 
> org.apache.sentry.hdfs.DBUpdateForwarder.getAllUpdatesFrom(DBUpdateForwarder.java:140)]
>  (org.apache.sentry.hdfs.PermImageRetriever) Requested sequence number 0 is 
> less than 0 or requested deltas for that sequence number are not available. 
> Fetch a full update
> 2019-01-04 18:32:49,346 (hms-follower) [DEBUG - 
> org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)]
>  HMSFollower entered state STARTED
> 2019-01-04 18:32:50,091 (hms-follower) [DEBUG - 
> org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)]
>  HMSFollower entered state STARTED
> 2019-01-04 18:33:00,286 (store-cleaner) [ERROR - 
> org.datanucleus.util.Log4JLogger.error(Log4JLogger.java:115)] Error thrown 
> executing CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> ) : A lock could not be obtained due to a deadlock, cycle of locks and 
> waiters is:
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} 
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} , {436, S} 
> . The selected victim is XID : 436.
> java.sql.SQLTransactionRollbackException: A lock could not be obtained due to 
> a deadlock, cycle of locks and waiters is:
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} 
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} , {436, S} 
> . The selected victim is XID : 436.
>       at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:546)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:609)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
>       at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>       at 
> ...[truncated 36339941 chars]...
> eImpl(RetryClientInvocationHandler.java:94)] Calling getAllUpdatesFrom
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to