[ https://issues.apache.org/jira/browse/SENTRY-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754233#comment-16754233 ]
Hadoop QA commented on SENTRY-2491: ----------------------------------- Here are the results of testing the latest attachment https://issues.apache.org/jira/secure/attachment/12956584/SENTRY-2491.002.patch against master. {color:red}Overall:{color} -1 due to an error {color:red}ERROR:{color} mvn test exited 1 Console output: https://builds.apache.org/job/PreCommit-SENTRY-Build/4342/console This message is automatically generated. > Sentry High availability unit tests run into deadlock sometimes > --------------------------------------------------------------- > > Key: SENTRY-2491 > URL: https://issues.apache.org/jira/browse/SENTRY-2491 > Project: Sentry > Issue Type: Bug > Components: Sentry > Affects Versions: 2.2.0 > Reporter: Na Li > Assignee: Na Li > Priority: Major > Attachments: SENTRY-2491.001.patch, SENTRY-2491.002.patch, > SENTRY-2491.002.patch, SENTRY-2491.002.patch, SENTRY-2491.002.patch, > SENTRY-2491.003.patch > > > In sentry unit tests, we don't create schema before running a test. Instead, > we use dataNucleus to create sentry tables when they are accessed. This > creates potential deadlock when running test for Sentry HA setup. > For example, the following shows the event sequence that causes deadlock > 1) thread_1 gets shared lock of SYSTABLES in order to read table > SENTRY_HMS_NOTIFICATION_ID > 2) thread_2 gets shared lock of SYSTABLES in order to read table > SENTRY_HMS_NOTIFICATION_ID > 3) thread_1 tries to get execution lock to create table > SENTRY_HMS_NOTIFICATION_ID, > and wait for execution lock because thread_2 got shared lock already. > 4) thread_2 tries to get execution lock to create table > SENTRY_HMS_NOTIFICATION_ID, > and wait for execution lock because thread_1 got shared lock already. > The solution is to let the instances of sentry service start with delay. > Specifically, > let HMS follower threads separate as far as possible, i.e., half of the > interval. > > This deadlock only exists in unit tests, and does not exist in production > because schema is created before starting Sentry services. Therefore, there > is no table creation after service starts. > Example of the log message when such deadlock happens > {code} > 2019-01-04 18:32:46,332 (pool-13-thread-1) [INFO - > org.apache.sentry.hdfs.DBUpdateForwarder.getAllUpdatesFrom(DBUpdateForwarder.java:140)] > (org.apache.sentry.hdfs.PermImageRetriever) Requested sequence number 0 is > less than 0 or requested deltas for that sequence number are not available. > Fetch a full update > 2019-01-04 18:32:49,346 (hms-follower) [DEBUG - > org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)] > HMSFollower entered state STARTED > 2019-01-04 18:32:50,091 (hms-follower) [DEBUG - > org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)] > HMSFollower entered state STARTED > 2019-01-04 18:33:00,286 (store-cleaner) [ERROR - > org.datanucleus.util.Log4JLogger.error(Log4JLogger.java:115)] Error thrown > executing CREATE TABLE SENTRY_HMS_NOTIFICATION_ID > ( > NOTIFICATION_ID BIGINT NOT NULL > ) : A lock could not be obtained due to a deadlock, cycle of locks and > waiters is: > Lock : ROW, SYSTABLES, (1,3) > Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID > ( > NOTIFICATION_ID BIGINT NOT NULL > ) > Granted XID : {419, S} > Lock : ROW, SYSTABLES, (1,3) > Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID > ( > NOTIFICATION_ID BIGINT NOT NULL > ) > Granted XID : {419, S} , {436, S} > . The selected victim is XID : 436. > java.sql.SQLTransactionRollbackException: A lock could not be obtained due to > a deadlock, cycle of locks and waiters is: > Lock : ROW, SYSTABLES, (1,3) > Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID > ( > NOTIFICATION_ID BIGINT NOT NULL > ) > Granted XID : {419, S} > Lock : ROW, SYSTABLES, (1,3) > Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID > ( > NOTIFICATION_ID BIGINT NOT NULL > ) > Granted XID : {419, S} , {436, S} > . The selected victim is XID : 436. > at > org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) > at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown > Source) > at > org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown > Source) > at > org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown > Source) > at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown > Source) > at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown > Source) > at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown > Source) > at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source) > at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source) > at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254) > at > org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879) > at > org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830) > at > org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:546) > at > org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:609) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385) > at > org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896) > at > org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119) > at > ...[truncated 36339941 chars]... > eImpl(RetryClientInvocationHandler.java:94)] Calling getAllUpdatesFrom > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)