At the end of SetupLockInTable(), there is a check that throws the "lock already held" error. Because the lock's nRequested and requested[lockmode] counters are bumped before that error is raised, and nothing ever decrements them again in this situation, the shared lock structure stays inconsistent until the cluster is restarted or reset.
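For reference, the relevant tail of SetupLockInTable() in src/backend/storage/lmgr/lock.c currently looks roughly like this (paraphrased from memory, trimmed to the lines that matter):

	/*
	 * lock->nRequested and lock->requested[] count the total number of
	 * requests, whether granted or waiting, so increment those immediately.
	 */
	lock->nRequested++;
	lock->requested[lockmode]++;
	Assert((lock->nRequested > 0) && (lock->requested[lockmode] > 0));

	/*
	 * We shouldn't already hold the desired lock; else locallock table is
	 * broken.
	 */
	if (proclock->holdMask & LOCKBIT_ON(lockmode))
		elog(ERROR, "lock %s on object %u/%u/%u is already held",
			 lockMethodTable->lockModeNames[lockmode],
			 lock->tag.locktag_field1, lock->tag.locktag_field2,
			 lock->tag.locktag_field3);

	return proclock;

The elog(ERROR) longjmps out without undoing the increments, and nothing on the error path touches this shared LOCK again, so nRequested stays above the real request count forever and CleanUpLock() never sees nRequested == 0.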
The inconsistency is:

* nRequested never drops back to zero, so the lock object is never garbage-collected
* if a waitMask is set for this lock, it is never cleared, so a new proc will block waiting for a lock that has zero holders (which looks weird in the pg_locks view)

I think moving the "lock already held" error in SetupLockInTable() to before the bump of nRequested and requested[lockmode] will fix it (the similar code in lock_twophase_recover() may need the same fix); a rough sketch of the reordering is at the bottom of this mail.

To recreate the inconsistency:

1. create backend 1 and lock table a, leaving it idle in transaction
2. terminate backend 1, hacked so that it skips LockReleaseAll()
3. create backend 2 and lock table a; it will wait for the lock to be released
4. reuse backend 1 (the same proc) to lock table a again; this triggers the "lock already held" error
5. quit both backend 1 and backend 2
6. create backend 3 and lock table a; it will wait because of the lock's waitMask
7. check the pg_locks view

--
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com
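Something like the following is what I have in mind (an untested sketch against the code quoted above; the error check is simply moved ahead of the counter bump, so erroring out leaves the shared lock untouched):

	/*
	 * We shouldn't already hold the desired lock; else locallock table is
	 * broken.  Check this before touching the shared counters, so that
	 * erroring out here leaves the LOCK consistent.
	 */
	if (proclock->holdMask & LOCKBIT_ON(lockmode))
		elog(ERROR, "lock %s on object %u/%u/%u is already held",
			 lockMethodTable->lockModeNames[lockmode],
			 lock->tag.locktag_field1, lock->tag.locktag_field2,
			 lock->tag.locktag_field3);

	/*
	 * lock->nRequested and lock->requested[] count the total number of
	 * requests, whether granted or waiting, so increment those immediately.
	 */
	lock->nRequested++;
	lock->requested[lockmode]++;
	Assert((lock->nRequested > 0) && (lock->requested[lockmode] > 0));

	return proclock;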