On 26.05.2012 12:21, Erik Rijkers wrote:
But when that if-block is added the client crashes after a while (sometimes almost immediately; it never survives longer than 20 minutes):

2012-05-26 10:44:22.617 CEST 10274 ERROR:  could not fsync file "base/21268/32807": No such file or directory
2012-05-26 10:44:28.465 CEST 10274 ERROR:  could not fsync file "base/21268/32867": No such file or directory
2012-05-26 10:44:28.587 CEST 10270 FATAL:  could not open file "base/21268/32994": No such file or directory
2012-05-26 10:44:28.588 CEST 10270 CONTEXT:  writing block 2508 of relation base/21268/32994
         xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 tuples
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741)
2012-05-26 10:44:31.131 CEST 10269 LOG:  startup process (PID 10270) was terminated by signal 6: Aborted
2012-05-26 10:44:31.131 CEST 10269 LOG:  terminating any other active server processes


Crazy scenario, I'll admit, but surely this shouldn't be able to crash the client?

Thanks for the report. I was able to reproduce this with that script, and I think I see what's going on now.

There's something wrong with the way AccessExclusiveLocks work on a standby. I did "begin; truncate foo; -- leave the xact open" on the master, and waited until the xlog records were shipped to the standby. Then I did this on the standby:

testdb=# begin;
BEGIN
testdb=# select * from foo;
 id
----
(0 rows)

testdb=# select locktype, database, relation, virtualtransaction, pid, mode, granted, fastpath from pg_locks where locktype='relation' and relation='foo'::regclass;
 locktype | database | relation | virtualtransaction |  pid  |        mode         | granted | fastpath
----------+----------+----------+--------------------+-------+---------------------+---------+----------
 relation |    16384 |    27332 | 2/78               | 24984 | AccessShareLock     | t       | t
 relation |    16384 |    27332 | 1/0                | 24344 | AccessExclusiveLock | t       | f
(2 rows)

The "select * from foo" query should have blocked, because the transaction in the master is holding an AccessExclusiveLock on the table.
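
To spell out the expected behaviour, here is a toy model of the conflict check, covering only the two lock modes that show up in the pg_locks output above. It is a rough sketch, not the actual lock manager code; the names CONFLICTS and must_wait are made up for this illustration.

```python
# Illustrative model only; this is not PostgreSQL source code.
# Subset of the documented table-level lock conflict rules:
# ACCESS EXCLUSIVE conflicts with every lock mode, while
# ACCESS SHARE conflicts only with ACCESS EXCLUSIVE.
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "AccessExclusiveLock"},
}


def must_wait(requested, held_modes):
    """Return True if `requested` conflicts with a mode held by another process."""
    return any(held in CONFLICTS[requested] for held in held_modes)


# The standby situation: the startup process already holds AccessExclusiveLock
# on foo, and a backend then requests AccessShareLock for "select * from foo".
print(must_wait("AccessShareLock", ["AccessExclusiveLock"]))
```

Under these rules the check reports a conflict, so the select would be expected to wait rather than have both locks show up as granted.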

The process holding the AccessExclusiveLock is the startup process. It's holding the lock on behalf of the transaction in the master. But something's wrong, and the AccessExclusiveLock doesn't stop a regular backend from acquiring the AccessShareLock on the table. I suspect the fast-path locking patch, because this works on 9.1.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
