[BUGS] bgwriter interfering with consistent view of system tables?

2004-10-04 Thread Sean Chittenden
When making lots of DDL changes to a database (I believe this includes 
temp tables too), delayed flushing of dirty buffers from the system 
catalogs is causing a severe problem with maintaining a consistent view 
of the structure of the database.  For these examples, I'd create a 
quick Makefile to aid in testing.

printf testing_delay:  Makefile.bug
printf \tpsql -c 'DROP DATABASE mydb' template1  Makefile.bug
printf \tpsql -c 'CREATE DATABASE mydb' template1  Makefile.bug
To reproduce and test this bug, issue `make -f Makefile.bug`.
With the following config settings:
# - Background writer -
bgwriter_delay = 5000   # 10-5000 milliseconds
bgwriter_percent = 1# 0-100% of dirty buffers
bgwriter_maxpages = 1   # 1-1000 buffers max at once
it is *very* easy to reproduce this problem (note, there is a bug in 
the default config, the min percent is 1, no 0 as the comment 
suggests).  With the default settings, it has been harder to spot on my 
laptop.  I believe that higher end systems with higher values will trip 
over this problem less frequently.

With the settings set:
% make -f Makefile.bug
psql -c DROP DATABASE mydb template1
DROP DATABASE
psql -c CREATE DATABASE mydb template1
ERROR:  source database template1 is being accessed by other users
*** Error code 1
The problem being, I've disconnected from template1 already, but the 
database hasn't flushed this to disk so the parent postmaster process 
isn't aware of the disconnection, so when I connect to the backend 
again, the newly created child has an inconsistent view of the current 
connections which prevents me from creating a new database (maybe the 
old backend is still around cleaning up and really hasn't exited, I'm 
not sure).

I think the same phenomena used to exist with temp tables across 
connections that reconnected to a backend with the same backend # (ie, 
connect to backend 123, create a temp table, disconnect, reconnect and 
get backend 123, recreate the same temp table and you'll get an 
error... though I can't reproduce the temp table error right now, 
yay!).

Anyway, Tom/Jan, this code seems to be your areas of expertise, could 
either of you take a look? -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [BUGS] bgwriter interfering with consistent view of system tables?

2004-10-04 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes:
 When making lots of DDL changes to a database (I believe this includes 
 temp tables too), delayed flushing of dirty buffers from the system 
 catalogs is causing a severe problem with maintaining a consistent view 
 of the structure of the database.

This analysis is completely bogus.

 % make -f Makefile.bug
 psql -c DROP DATABASE mydb template1
 DROP DATABASE
 psql -c CREATE DATABASE mydb template1
 ERROR:  source database template1 is being accessed by other users

It's always been possible for this to happen, primarily because libpq
doesn't wait around for the connected backend to exit.  If the kernel
prefers to schedule other processes then the old backend may still be
alive when the new one tries to do CREATE DATABASE.  There is nothing
stopping the old one from exiting, it's just that the kernel hasn't
given the old backend any cycles at all.

There's been some discussion of making PQfinish() wait to observe
connection closure, which would guarantee that the backend has exited
in the non-SSL-connection case.  It's not clear how well it would work
in the SSL case, though.  In any case it's a bit of a band-aid solution.
I think the real solution is to find a way to not need the accessed by
other users interlock for CREATE DATABASE.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [BUGS] bgwriter interfering with consistent view of system tables?

2004-10-04 Thread Sean Chittenden
When making lots of DDL changes to a database (I believe this includes
temp tables too), delayed flushing of dirty buffers from the system
catalogs is causing a severe problem with maintaining a consistent 
view
of the structure of the database.
This analysis is completely bogus.
That doesn't surprise me at all: I couldn't think of what else it 
would've been.

% make -f Makefile.bug
psql -c DROP DATABASE mydb template1
DROP DATABASE
psql -c CREATE DATABASE mydb template1
ERROR:  source database template1 is being accessed by other users
It's always been possible for this to happen, primarily because libpq
doesn't wait around for the connected backend to exit.  If the kernel
prefers to schedule other processes then the old backend may still be
alive when the new one tries to do CREATE DATABASE.  There is nothing
stopping the old one from exiting, it's just that the kernel hasn't
given the old backend any cycles at all.
There's been some discussion of making PQfinish() wait to observe
connection closure, which would guarantee that the backend has exited
in the non-SSL-connection case.  It's not clear how well it would work
in the SSL case, though.  In any case it's a bit of a band-aid 
solution.
I think the real solution is to find a way to not need the accessed by
other users interlock for CREATE DATABASE.
*shrug*  It'd be good from a security stand point to wait if there is 
any chance the connection could be resurrected via a man-in-the-middle 
attack.  As it stands, this isn't a real important bug given that the 
SQL is programatically created and it's trivial to throw in some kind 
of a sleep... still, it did bother me.  I figured locks on tables were 
stored in stuffed into some kind of a refcount in shared memory segment 
and that the time needed to decrease the refcount would be 
insignificant or done as soon as the client signaled their intention to 
disconnect, not controlled by wait*(2) and the listening postmaster.

-sc
--
Sean Chittenden
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org