[GitHub] hive pull request #453: Fix the leak of lock during concurrent partition dro...

guangyy Wed, 24 Oct 2018 15:38:51 -0700

GitHub user guangyy opened a pull request:

    https://github.com/apache/hive/pull/453


    Fix the leak of lock during concurrent partition drop

    We have seen a leaked lock on the Hive metastore DB that caused all
    PARTITION insertions to fail with a timeout while waiting for the lock,
    until the metastore service was restarted.
    
    A transaction dump on the DB shows a thread in the Sleep state that
    potentially holds the lock:
                        trx_id: 33603171058
                     trx_state: RUNNING
                   trx_started: 2018-10-23 06:43:22
         trx_requested_lock_id: NULL
              trx_wait_started: NULL
                    trx_weight: 70298
           trx_mysql_thread_id: 275402202
                     trx_query: NULL
           trx_operation_state: NULL
             trx_tables_in_use: 0
             trx_tables_locked: 0
              trx_lock_structs: 21286
         trx_lock_memory_bytes: 2881064
               trx_rows_locked: 98810
             trx_rows_modified: 49012
       trx_concurrency_tickets: 0
           trx_isolation_level: READ COMMITTED
             trx_unique_checks: 1
        trx_foreign_key_checks: 1
    trx_last_foreign_key_error: NULL
     trx_adaptive_hash_latched: 0
     trx_adaptive_hash_timeout: 0
              trx_is_read_only: 0
    trx_autocommit_non_locking: 0
                            ID: 275402202
                          USER: metastore_gold
                          HOST: 10.37.182.82:36684
                            DB: metastoregold
                       COMMAND: Sleep
                          TIME: 1
                         STATE:
                          INFO: NULL
                      duration: 1316
    
    Using the HOST IP, we traced the connection back to the Hive metastore
    instance and found the following exception:
    2018-10-23 06:43:22,805 WARN DataNucleus.Persistence: Exception thrown by StateManager.isLoaded
    No such database row
    org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row
            at org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:357)
            at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:324)
            at org.datanucleus.state.AbstractStateManager.loadFieldsFromDatastore(AbstractStateManager.java:1120)
            at org.datanucleus.state.JDOStateManager.loadSpecifiedFields(JDOStateManager.java:2916)
            at org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3219)
    
    The problem is that the caller expects NULL when the partition does not
    exist. Instead, the convertToPart function throws an exception, which
    leads to the leaked lock.
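
The failure mode described above can be illustrated with a minimal Java sketch. This is not the patch in this pull request. The class PartitionLookupSketch, the FakeTransaction and RowNotFoundException types, and the getPartitionOrNull method are hypothetical stand-ins that only model the idea: a missing row is translated to null, and the transaction is always closed so that no lock outlives the call. Only the name convertToPart is taken from the description, and its body here is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionLookupSketch {

    /** Hypothetical stand-in for NucleusObjectNotFoundException. */
    static class RowNotFoundException extends RuntimeException {
        RowNotFoundException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a metastore transaction that holds row locks while open. */
    static class FakeTransaction {
        boolean open = true;
        boolean committed = false;

        void commit() {
            committed = true;
            open = false;
        }

        void rollback() {
            open = false;
        }
    }

    // In-memory "table" of partition name -> location (assumption for the sketch).
    private final Map<String, String> rows = new HashMap<>();

    // Exposed so a caller can check that the last transaction was closed.
    FakeTransaction lastTransaction;

    void addPartition(String name, String location) {
        rows.put(name, location);
    }

    // Models a conversion step that throws when the row has been dropped concurrently.
    private String convertToPart(String name) {
        String location = rows.get(name);
        if (location == null) {
            throw new RowNotFoundException("No such database row");
        }
        return name + "@" + location;
    }

    /** Returns the partition, or null if it no longer exists; never leaves the transaction open. */
    String getPartitionOrNull(String name) {
        FakeTransaction tx = new FakeTransaction();
        lastTransaction = tx;
        boolean success = false;
        try {
            String part;
            try {
                part = convertToPart(name);
            } catch (RowNotFoundException e) {
                // The caller expects null for a missing partition, not an exception.
                part = null;
            }
            tx.commit();
            success = true;
            return part;
        } finally {
            if (!success) {
                // Any unexpected failure still releases the locks held by the transaction.
                tx.rollback();
            }
        }
    }

    public static void main(String[] args) {
        PartitionLookupSketch store = new PartitionLookupSketch();
        store.addPartition("ds=2018-10-23", "/warehouse/t/ds=2018-10-23");
        System.out.println(store.getPartitionOrNull("ds=2018-10-23"));
        System.out.println(store.getPartitionOrNull("ds=2018-10-24"));
        System.out.println("transaction still open: " + store.lastTransaction.open);
    }
}
```

The design choice in the sketch is to handle the two concerns separately: the inner catch gives the caller the NULL it expects, while the outer finally guarantees that the transaction is closed on every other path.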

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guangyy/hive guang--fix-db-lock-leak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #453
    
----
commit 7f6ace1146c32b7e6b8f175cc9c18489119c7613
Author: Guang Yang <guang.yang@...>
Date:   2018-10-24T22:28:09Z

    Fix the leak of lock during concurrent partition drop
    

----

