[ 
https://issues.apache.org/jira/browse/SENTRY-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754059#comment-16754059
 ] 

Na Li commented on SENTRY-2491:
-------------------------------

The failure was not due to test failure. Instead it is because failed to 
execute maven plugin. It is not related to my code change. Re-attach the patch 
again 
{code}
03:56:24 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on 
project sentry-tests-hive: There was a timeout or other error in the fork -> 
[Help 1]
03:56:24 [ERROR] 
03:56:24 [ERROR] To see the full stack trace of the errors, re-run Maven with 
the -e switch.
03:56:24 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
03:56:24 [ERROR] 
03:56:24 [ERROR] For more information about the errors and possible solutions, 
please read the following articles:
03:56:24 [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
03:56:24 [ERROR] 
03:56:24 [ERROR] After correcting the problems, you can resume the build with 
the command
03:56:24 [ERROR]   mvn <goals> -rf :sentry-tests-hive
{code}

> Sentry High availability unit tests run into deadlock sometimes
> ---------------------------------------------------------------
>
>                 Key: SENTRY-2491
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2491
>             Project: Sentry
>          Issue Type: Bug
>          Components: Sentry
>    Affects Versions: 2.2.0
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Major
>         Attachments: SENTRY-2491.001.patch, SENTRY-2491.002.patch, 
> SENTRY-2491.002.patch, SENTRY-2491.002.patch, SENTRY-2491.002.patch
>
>
> In sentry unit tests, we don't create schema before running a test. Instead, 
> we use dataNucleus to create sentry tables when they are accessed. This 
> creates potential deadlock when running test for Sentry HA setup. 
> For example, the following shows the event sequence that causes deadlock
> 1) thread_1 gets shared lock of SYSTABLES in order to read table 
> SENTRY_HMS_NOTIFICATION_ID
> 2) thread_2 gets shared lock of SYSTABLES in order to read table 
> SENTRY_HMS_NOTIFICATION_ID
> 3) thread_1 tries to get execution lock to create table 
> SENTRY_HMS_NOTIFICATION_ID,
>    and wait for execution lock because thread_2 got shared lock already.
> 4) thread_2 tries to get execution lock to create table 
> SENTRY_HMS_NOTIFICATION_ID,
>    and wait for execution lock because thread_1 got shared lock already.
> The solution is to let the instances of sentry service start with delay. 
> Specifically,
> let HMS follower threads separate as far as possible, i.e., half of the 
> interval.
>       
> This deadlock only exists in unit tests, and does not exist in production 
> because schema is created before starting Sentry services. Therefore, there 
> is no table creation after service starts.
> Example of the log message when such deadlock happens
> {code}
> 2019-01-04 18:32:46,332 (pool-13-thread-1) [INFO - 
> org.apache.sentry.hdfs.DBUpdateForwarder.getAllUpdatesFrom(DBUpdateForwarder.java:140)]
>  (org.apache.sentry.hdfs.PermImageRetriever) Requested sequence number 0 is 
> less than 0 or requested deltas for that sequence number are not available. 
> Fetch a full update
> 2019-01-04 18:32:49,346 (hms-follower) [DEBUG - 
> org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)]
>  HMSFollower entered state STARTED
> 2019-01-04 18:32:50,091 (hms-follower) [DEBUG - 
> org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)]
>  HMSFollower entered state STARTED
> 2019-01-04 18:33:00,286 (store-cleaner) [ERROR - 
> org.datanucleus.util.Log4JLogger.error(Log4JLogger.java:115)] Error thrown 
> executing CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> ) : A lock could not be obtained due to a deadlock, cycle of locks and 
> waiters is:
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} 
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} , {436, S} 
> . The selected victim is XID : 436.
> java.sql.SQLTransactionRollbackException: A lock could not be obtained due to 
> a deadlock, cycle of locks and waiters is:
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} 
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} , {436, S} 
> . The selected victim is XID : 436.
>       at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>       at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:546)
>       at 
> org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:609)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
>       at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
>       at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>       at 
> ...[truncated 36339941 chars]...
> eImpl(RetryClientInvocationHandler.java:94)] Calling getAllUpdatesFrom
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to