I've tracked down the real root cause. The fix is very simple. Please check the attached one-liner patch.

The cause is that temporary relations are truncated at commit unconditionally, regardless of whether the transaction accessed them. The following sequence of steps results in the hang:

1. A session creates a temporary table with ON COMMIT DELETE ROWS. It adds the temporary table to the list of relations that should be truncated at transaction commit.

2. The session receives a sinval catchup signal (SIGUSR1) from another session. It starts a transaction and processes sinval messages in the SIGUSR1 signal handler. No WAL is output while processing the sinval messages.

3. When the transaction commits, the list of temporary relations is checked to see which of them need to be truncated.

4. The temporary table created in step 1 is truncated. Truncating a relation requires acquiring an Access Exclusive lock on it. When hot standby is used, acquiring an Access Exclusive lock generates a WAL record (RM_STANDBY_ID, XLOG_STANDBY_LOCK).

5. Because it wrote WAL, the transaction waits on a latch for a reply from the synchronous standby. The wait never returns: waking the latch requires SIGUSR1 to be handled, but the SIGUSR1 handler from step 2 is still in progress.
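
To illustrate step 5 in isolation, here is a minimal standalone POSIX sketch. It is not PostgreSQL code, and all names in it (on_sigusr1, run_nested_signal_demo, the counters) are made up for the demonstration. It assumes the handler is installed with sigaction() without SA_NODEFER, so SIGUSR1 stays blocked while its own handler runs. A second SIGUSR1 raised inside the handler is only held pending and cannot be handled until the first invocation returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t depth = 0;          /* handlers currently running */
static volatile sig_atomic_t max_depth = 0;      /* deepest nesting observed */
static volatile sig_atomic_t calls = 0;          /* total handler invocations */
static volatile sig_atomic_t pending_inside = 0; /* second SIGUSR1 held pending? */

/* Hypothetical handler; stands in for the sinval catchup handler of step 2. */
static void on_sigusr1(int signo)
{
    (void) signo;
    depth++;
    calls++;
    if (depth > max_depth)
        max_depth = depth;

    if (calls == 1)
    {
        sigset_t pend;

        /* Stands in for the wakeup that the wait in step 5 depends on. */
        raise(SIGUSR1);

        /* The signal is blocked here, so it is pending, not handled. */
        sigemptyset(&pend);
        sigpending(&pend);
        pending_inside = (sigismember(&pend, SIGUSR1) == 1);
    }

    depth--;
}

/* Hypothetical driver: installs the handler and delivers one SIGUSR1. */
static int run_nested_signal_demo(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_NODEFER: SIGUSR1 blocked in handler */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    depth = max_depth = calls = pending_inside = 0;
    return raise(SIGUSR1);
}
```

In this sketch the first handler invocation returns, so the pending signal is eventually handled. In the scenario above the handler instead waits for that very signal, so it never returns.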


The correct behavior is for the transaction not to truncate the temporary table in step 4, because the transaction didn't use the temporary table.
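
The difference between the two behaviors can be sketched with a small self-contained model. This is only an illustration under my own assumptions, not the attached patch or the PostgreSQL source: the types, the function names, and the accessed_temp_rel flag are all hypothetical, and WAL output is reduced to a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these are PostgreSQL definitions. */
typedef enum { ON_COMMIT_NOOP, ON_COMMIT_DELETE_ROWS } OnCommitAction;

typedef struct
{
    const char     *relname;
    OnCommitAction  action;
} TempRel;

typedef struct
{
    bool accessed_temp_rel; /* did this transaction touch any temp table? */
    int  wal_records;       /* WAL records written by this transaction */
} Xact;

/*
 * Model of step 4: truncation takes an Access Exclusive lock, which under
 * hot standby writes one WAL record.
 */
static void truncate_rel(Xact *xact, const TempRel *rel)
{
    (void) rel;
    xact->wal_records++;
}

/*
 * Model of the commit-time check in step 3.  With skip_if_untouched set,
 * ON COMMIT DELETE ROWS tables are left alone when the transaction never
 * accessed a temp table (they must still be empty).  Returns true if the
 * transaction wrote WAL and so would have to wait for the synchronous
 * standby.
 */
static bool precommit_on_commit_actions(Xact *xact, const TempRel *rels,
                                        size_t nrels, bool skip_if_untouched)
{
    for (size_t i = 0; i < nrels; i++)
    {
        if (rels[i].action != ON_COMMIT_DELETE_ROWS)
            continue;
        if (skip_if_untouched && !xact->accessed_temp_rel)
            continue;
        truncate_rel(xact, &rels[i]);
    }
    return xact->wal_records > 0;
}
```

With skip_if_untouched set to false, the sinval-only transaction from step 2 writes WAL and has to wait. With it set to true, the same transaction writes nothing and commits without waiting, while a transaction that did use the temporary table still truncates it.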

I confirmed that the fix is already in 9.3 and 9.5devel, so I just copied the code fragment from 9.5devel to 9.2.9. The attached patch is for 9.2.9. I didn't check 9.4 and other versions. Why wasn't the fix applied to 9.2?

Finally, I found a very easy way to reproduce the problem:

1. On terminal session 1, start psql and run:
 CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS;
Leave the psql session open.

2. On terminal session 2, run:
 pgbench -c8 -t500 -s1 -n -f test.sql dbname
[test.sql]
CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS;
DROP TABLE t;

3. On the psql session on terminal session 1, run any SQL statement. It doesn't reply. The backend is stuck at SyncRepWaitForLSN().

Regards
MauMau

Attachment: sinval_catchup_hang_v3.patch

