From: "Robert Haas" <robertmh...@gmail.com>
I think the problem here is that it actually is possible for one
session to access the temporary objects of another session:
Now, we could prohibit that specific thing.  But at the very least, it
has to be possible for one session to drop another session's temporary
objects, because autovacuum does it eventually, and superusers will
want to do it sooner to shut autovacuum up.  So it's hard to reason
about whether and to what extent it's safe to not send sinval messages
for temporary objects.

I was a bit surprised to learn that one session can access the data of another session's temporary tables. That implementation may be complicating the situation by requiring extra sinval messages.


I think you might be approaching this problem from the wrong end,
though.  The question in my mind is: why does the
StartTransactionCommand() / CommitTransactionCommand() pair in
ProcessCatchupEvent() end up writing a commit record?  The obvious
possibility that occurs to me is that maybe rereading the invalidated
catalog entries causes a HOT prune, and maybe there ought to be some
way for a transaction that has only done HOT pruning to commit
asynchronously, just as we already do for transactions that only
modify temporary tables.  Or, failing that, maybe there's a way to
suppress synchronous commit for this particular transaction.

By inserting elog() in XLogInsert(), I was able to find out which WAL record is written in the transaction started in ProcessCatchupEvent(). The record was (RM_STANDBY_ID, XLOG_STANDBY_LOCK).

The cause of the hang is now clear.  It happens as follows:

1. When a transaction that used a temporary table created with ON COMMIT DELETE ROWS commits, the sinval catchup signal (SIGUSR1) is issued from smgrtruncate(). This is because the temporary table is truncated at transaction end.

2. Another session, which is waiting for a client request, receives SIGUSR1. It calls ProcessCatchupEvent().

3. ProcessCatchupEvent() calls StartTransactionCommand(), emits the XLOG_STANDBY_LOCK WAL record, and then calls CommitTransactionCommand().

4. CommitTransactionCommand() then calls SyncRepWaitForLSN(), which in turn waits on the latch.

5. But WaitLatch() never returns, because the session is still running inside the SIGUSR1 handler entered in step 2, and WaitLatch() needs SIGUSR1 to be delivered in order to wake up.
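
To illustrate step 5, here is a minimal standalone sketch. It is not PostgreSQL code; the names on_sigusr1(), run_demo() and the counters are made up for this example. It assumes the handler is installed with sigaction() and without SA_NODEFER, so SIGUSR1 stays blocked while its own handler runs. A second SIGUSR1 raised from inside the handler is only marked pending and is delivered after the handler returns, which is why a wait inside the handler that depends on SIGUSR1 cannot be woken.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Counters written by the handler; all names are illustrative only. */
volatile sig_atomic_t handler_calls = 0;
volatile sig_atomic_t handler_depth = 0;
volatile sig_atomic_t max_depth = 0;
volatile sig_atomic_t pending_in_handler = 0;

static void
on_sigusr1(int signo)
{
    sigset_t pending;

    (void) signo;
    handler_calls++;
    handler_depth++;
    if (handler_depth > max_depth)
        max_depth = handler_depth;

    if (handler_calls == 1)
    {
        /* Stand-in for the wake-up signal arriving while we are in the handler. */
        raise(SIGUSR1);

        /* It is not delivered here; it is only recorded as pending. */
        sigemptyset(&pending);
        sigpending(&pending);
        pending_in_handler = (sigismember(&pending, SIGUSR1) == 1);
    }

    handler_depth--;
}

/* Installs the handler and sends the first SIGUSR1; returns 0 on success. */
int
run_demo(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* assumption: no SA_NODEFER */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    return raise(SIGUSR1);
}
```

Calling run_demo() from a small main() in a single-threaded program, I would expect handler_calls to be 2, max_depth to be 1 (the handler never nests), and pending_in_handler to be 1. If the handler had blocked waiting for that second signal instead of returning, it would have waited forever.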

I think there is a problem with the latch and SIGUSR1 mechanism. How can we fix this problem?

Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
