[ https://issues.apache.org/jira/browse/HAWQ-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104902#comment-15104902 ]

ASF GitHub Bot commented on HAWQ-347:
-------------------------------------

GitHub user liming01 opened a pull request:

    https://github.com/apache/incubator-hawq/pull/276

    HAWQ-347. Fixed dead lock between pg_filespace lock and PersistentObj…

    …Lock
    
    In HAWQ 1.x, the segment on the master node uses local disk, while segments
    on the other segment nodes use HDFS. In HAWQ 2.0, however, all segments use
    HDFS, so now only the pg_system filespace uses local disk.
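
The patch itself is not reproduced in this message. As a rough illustration only, and assuming the fix follows from the statement above, the "is this filespace shared?" question could be answered from the filespace OID alone, without opening the pg_filespace catalog (which is where is_filespace_shared blocks in the quoted backtrace). The sketch is in Python rather than HAWQ's C, and PG_SYSTEM_FILESPACE_OID and its value are hypothetical placeholders, not taken from the HAWQ source:

```python
# Hypothetical placeholder for the OID of the pg_system filespace.
# The real value lives in the HAWQ catalog headers and is not shown in this message.
PG_SYSTEM_FILESPACE_OID = 1


def is_filespace_shared(fsoid: int) -> bool:
    """Decide whether a filespace is on shared (HDFS) storage.

    Assumption from the PR description: in HAWQ 2.0 only pg_system is local,
    so every other filespace is shared. No catalog relation is opened here,
    hence no heavyweight lock on pg_filespace is requested.
    """
    return fsoid != PG_SYSTEM_FILESPACE_OID


if __name__ == "__main__":
    # 16384 is the filespace OID that appears in the quoted backtrace.
    print(is_filespace_shared(16384))
    print(is_filespace_shared(PG_SYSTEM_FILESPACE_OID))
```

A check of this shape takes no lock at all, so it cannot take part in a lock cycle. Whether the actual patch does this is not confirmed by the text here.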

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liming01/incubator-hawq mli/deadlock_fs_pt

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #276
    
----
commit 049ef2b4f3d718983144d1f503c1c657c891a194
Author: Ming LI <m...@pivotal.io>
Date:   2016-01-18T07:59:51Z

    HAWQ-347. Fixed dead lock between pg_filespace lock and PersistentObjLock

----


> Dead Lock Issue when concurrent use catalog and persistent table.
> -----------------------------------------------------------------
>
>                 Key: HAWQ-347
>                 URL: https://issues.apache.org/jira/browse/HAWQ-347
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Transaction
>            Reporter: Dong Li
>            Assignee: Lei Chang
>
> ------
> prepare 
> ------
> {code}
> create tablespace myts filespace dfs_system;
> {code}
> ----------
> Process 1  : use ctrl_c to abort the create transaction
> -----------
> {code}
> create FILESPACE fsinvalid ON hdfs ('localhost:10086/hawq/fsinvalid');
> ctrl_C
> {code}
> ---------
> Process 2 : during the ctrl_c period, drop a tablespace 
> ---------
> {code}
> drop TABLESPACE myts;
> {code}
> ************
> The lock state:
> ****************
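> 1. abort (create filespace)
> holds lock: pg_filespace lock (AccessExclusiveLock on relation 5009)
> needs lock: PersistentObjLock (persistent table lock)
> 2. commit (drop tablespace)
> holds lock: PersistentObjLock (persistent table lock)
> needs lock: pg_filespace lock (AccessShareLock on relation 5009)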
> ***********
> Debug information
> ***********
> 1.debug information for  abort (create Filespace)  process
> (lldb) bt
> * thread #1: tid = 0x9781a3, 0x99364f7a libsystem_kernel.dylib`semop + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   * frame #0: 0x99364f7a libsystem_kernel.dylib`semop + 10
>     frame #1: 0x003eb14a postgres`PGSemaphoreLock(sema=0x18548cf8, interruptOK='\0') + 301 at pg_sema.c:437
>     frame #2: 0x0045972c postgres`LWLockAcquire(lockid=PersistentObjLock, mode=LW_EXCLUSIVE) + 799 at lwlock.c:557
>     frame #3: 0x006cd3a9 postgres`PersistentFilespace_Dropped(fsObjName=0x048f6598, persistentTid=0x048f65bc, persistentSerialNum=3) + 411 at cdbpersistentfilespace.c:1079
>     frame #4: 0x006e03b4 postgres`PersistentFileSysObj_DropObject(fsObjName=0x048f6598, relStorageMgr=PersistentFileSysRelStorageMgr_None, relationName=0x00000000, persistentTid=0x048f65bc, persistentSerialNum=3, ignoreNonExistence='\x01', debugPrint='\0', debugPrintLevel=14) + 2170 at cdbpersistentfilesysobj.c:2090
>     frame #5: 0x006e04b8 postgres`PersistentFileSysObj_EndXactDrop(fsObjName=0x048f6598, relStorageMgr=PersistentFileSysRelStorageMgr_None, relationName=0x00000000, persistentTid=0x048f65bc, persistentSerialNum=3, ignoreNonExistence='\x01') + 113 at cdbpersistentfilesysobj.c:2123
>     frame #6: 0x004603c2 postgres`smgrDoDeleteActions(list=0x00b40844, listCount=0x00b40848, forCommit='\0') + 1444 at smgr.c:1527
>     frame #7: 0x00461258 postgres`AtEOXact_smgr(forCommit='\0') + 226 at smgr.c:2111
>     frame #8: 0x00072abf postgres`AbortTransaction + 774 at xact.c:2867
>     frame #9: 0x0007343b postgres`AbortCurrentTransaction + 183 at xact.c:3289
>     frame #10: 0x0046ade6 postgres`PostgresMain(argc=4, argv=0x0484b480, username=0x0484baa8) + 6294 at postgres.c:4487
>     frame #11: 0x0040c1e9 postgres`BackendRun(port=0x03835c30) + 1008 at postmaster.c:5875
>     frame #12: 0x0040b583 postgres`BackendStartup(port=0x03835c30) + 381 at postmaster.c:5468
>     frame #13: 0x0040463d postgres`ServerLoop + 1111 at postmaster.c:2147
>     frame #14: 0x004033ca postgres`PostmasterMain(argc=9, argv=0x03834620) + 5127 at postmaster.c:1439
>     frame #15: 0x0031aa6b postgres`main(argc=9, argv=0x03834620) + 1007 at main.c:226
>     frame #16: 0x0000201d postgres`_start + 212
>     frame #17: 0x00001f48 postgres`start + 40
> --------------------------------------------
> 2.debug information for commit(drop tablespace) process
> * thread #1: tid = 0x97829a, 0x99364f7a libsystem_kernel.dylib`semop + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   * frame #0: 0x99364f7a libsystem_kernel.dylib`semop + 10
>     frame #1: 0x003eb14a postgres`PGSemaphoreLock(sema=0x185492e0, interruptOK='\x01') + 301 at pg_sema.c:437
>     frame #2: 0x00456615 postgres`ProcSleep(locallock=0x04847cc8, lockMethodTable=0x0097c804) + 736 at proc.c:983
>     frame #3: 0x00451f4b postgres`WaitOnLock(locallock=0x04847cc8, owner=0x0484d15c) + 428 at lock.c:1351
>     frame #4: 0x004515a5 postgres`LockAcquire(locktag=0xbfffe788, lockmode=1, sessionLock='\0', dontWait='\0') + 6297 at lock.c:995
>     frame #5: 0x0044e98f postgres`LockRelationOid(relid=5009, lockmode=1) + 58 at lmgr.c:102
>     frame #6: 0x00039686 postgres`relation_open(relationId=5009, lockmode=1) + 105 at heapam.c:880
>     frame #7: 0x0003a01f postgres`heap_open(relationId=5009, lockmode=1) + 31 at heapam.c:1286
>     frame #8: 0x0020444c postgres`is_filespace_shared(fsoid=16384) + 29 at filespace.c:737
>     frame #9: 0x0009b637 postgres`emit_mmxlog_fs_record(type=MM_OBJ_TABLESPACE, filespace=16384, tablespace=204983, database=0, relfilenode=0, persistentTid=0x0509564c, persistentSerialNum=2, segnum=0, flags=' ', beginLoc=0xbfffed90) + 475 at xlog_mm.c:498
>     frame #10: 0x0009ba23 postgres`mmxlog_log_remove_tablespace(tablespace=204983, persistentTid=0x0509564c, persistentSerialNum=2) + 189 at xlog_mm.c:584
>     frame #11: 0x006dc13a postgres`PersistentTablespace_DroppedVerifiedActionCallback(fsObjName=0x05095628, persistentTid=0x0509564c, persistentSerialNum=2, verifyExpectedResult=PersistentFileSysObjVerifyExpectedResult_StateChangeNeeded) + 97 at cdbpersistenttablespace.c:1079
>     frame #12: 0x006def9a postgres`PersistentFileSysObj_StateChange(fsObjName=0x05095628, persistentTid=0x0509564c, persistentSerialNum=2, nextState=PersistentFileSysState_Free, retryPossible='\0', flushToXLog='\0', oldState=0xbfffef0c, verifiedActionCallback=0x006dc0d9) + 920 at cdbpersistentfilesysobj.c:1432
>     frame #13: 0x006dc485 postgres`PersistentTablespace_Dropped(fsObjName=0x05095628, persistentTid=0x0509564c, persistentSerialNum=2) + 771 at cdbpersistenttablespace.c:1142
>     frame #14: 0x006e02da postgres`PersistentFileSysObj_DropObject(fsObjName=0x05095628, relStorageMgr=PersistentFileSysRelStorageMgr_None, relationName=0x00000000, persistentTid=0x0509564c, persistentSerialNum=2, ignoreNonExistence='\0', debugPrint='\0', debugPrintLevel=14) + 1952 at cdbpersistentfilesysobj.c:2064
>     frame #15: 0x006e04b8 postgres`PersistentFileSysObj_EndXactDrop(fsObjName=0x05095628, relStorageMgr=PersistentFileSysRelStorageMgr_None, relationName=0x00000000, persistentTid=0x0509564c, persistentSerialNum=2, ignoreNonExistence='\0') + 113 at cdbpersistentfilesysobj.c:2123
>     frame #16: 0x004603c2 postgres`smgrDoDeleteActions(list=0x00b40844, listCount=0x00b40848, forCommit='\x01') + 1444 at smgr.c:1527
>     frame #17: 0x00461258 postgres`AtEOXact_smgr(forCommit='\x01') + 226 at smgr.c:2111
>     frame #18: 0x000720f6 postgres`CommitTransaction + 749 at xact.c:2427
>     frame #19: 0x00072fb6 postgres`CommitTransactionCommand + 310 at xact.c:3056
>     frame #20: 0x004686b8 postgres`finish_xact_command + 116 at postgres.c:3123
>     frame #21: 0x00465b6a postgres`exec_simple_query(query_string=0x0507c42c, seqServerHost=0x00000000, seqServerPort=-1) + 2137 at postgres.c:1780
>     frame #22: 0x0046b46c postgres`PostgresMain(argc=4, argv=0x0484b480, username=0x0484baa8) + 7964 at postgres.c:4711
>     frame #23: 0x0040c1e9 postgres`BackendRun(port=0x03911ea0) + 1008 at postmaster.c:5875
>     frame #24: 0x0040b583 postgres`BackendStartup(port=0x03911ea0) + 381 at postmaster.c:5468
>     frame #25: 0x0040463d postgres`ServerLoop + 1111 at postmaster.c:2147
>     frame #26: 0x004033ca postgres`PostmasterMain(argc=9, argv=0x03834620) + 5127 at postmaster.c:1439
>     frame #27: 0x0031aa6b postgres`main(argc=9, argv=0x03834620) + 1007 at main.c:226
>     frame #28: 0x0000201d postgres`_start + 212
>     frame #29: 0x00001f48 postgres`start + 40
> -----------------------
> Other information
> -----------------------
> intern=# select * from pg_stat_activity ;
>  datid  |  datname   | procpid | sess_id | usesysid | usename |                             current_query                              | waiting |          query_start          |         backend_start         | client_addr | client_port | application_name |          xact_start           | waiting_resource
> --------+------------+---------+---------+----------+---------+------------------------------------------------------------------------+---------+-------------------------------+-------------------------------+-------------+-------------+------------------+-------------------------------+------------------
>   16387 | postgres   |   83974 |      65 |       10 | intern  | drop TABLESPACE myts;                                                  | t       | 2016-01-15 18:48:18.910256+08 | 2016-01-15 18:47:57.794331+08 |             |          -1 | psql             | 2016-01-15 18:48:18.910256+08 | f
>  205198 | dbindifffs |   83863 |      64 |       10 | intern  | create FILESPACE fsinvalid ON hdfs ('localhost:10086/hawq/fsinvalid'); | f       | 2016-01-15 18:47:26.849302+08 | 2016-01-15 18:47:26.790389+08 |             |          -1 | psql             | 2016-01-15 18:47:26.849302+08 | f
>   16534 | intern     |   89599 |      66 |       10 | intern  | select * from pg_stat_activity ;                                       | f       | 2016-01-15 19:25:14.432401+08 | 2016-01-15 19:13:54.205517+08 |             |          -1 | psql             | 2016-01-15 19:25:14.432401+08 | f
> (3 rows)
> intern=# select * from pg_locks ;
>    locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |        mode         | granted | mppsessionid | mppiswriter | gp_segment_id
> ---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+---------------------+---------+--------------+-------------+---------------
>  relation      |        0 |     5009 |      |       |               |         |       |          |           0 | 83863 | AccessExclusiveLock | t       |           64 | f           |        -10000
>  transactionid |          |          |      |       |          2133 |         |       |          |        2133 | 89599 | ExclusiveLock       | t       |           66 | f           |        -10000
>  relation      |        0 |     1213 |      |       |               |         |       |          |           0 | 83974 | RowExclusiveLock    | t       |           65 | f           |        -10000
>  transactionid |          |          |      |       |          2118 |         |       |          |           0 | 83863 | ExclusiveLock       | t       |           64 | f           |        -10000
>  relation      |        0 |     5009 |      |       |               |         |       |          |           0 | 83974 | AccessShareLock     | f       |           65 | f           |        -10000
>  relation      |    16534 |    10336 |      |       |               |         |       |          |        2133 | 89599 | AccessShareLock     | t       |           66 | f           |        -10000
>  transactionid |          |          |      |       |          2124 |         |       |          |           0 | 83974 | ExclusiveLock       | t       |           65 | f           |        -10000
> (7 rows)
> intern=# select * from pg_class where oid=5009;
>    relname    | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relaosegrelid | relaosegidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid |   relacl    | reloptions
> --------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+-------------+------------
>  pg_filespace |           11 |    6438 |       10 |     0 |        5009 |          1664 |        1 |         1 |             0 |             0 |             0 |             0 | t           | t           | r       | h          |        4 |         0 |           0 |        0 |        0 |       0 | t          | f          | f           | f              |          903 | {=r/intern} |
> (1 row)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
