Re: [GENERAL] vacuum issues under load?

2010-01-18 Thread Scott Marlowe
On Mon, Jan 18, 2010 at 12:28 AM, Martijn van Oosterhout
klep...@svana.org wrote:
 On Fri, Jan 15, 2010 at 08:41:38AM -0700, Scott Marlowe wrote:
 With slony 2.0.3 or so, I had occasional complete lockups of my
 database that I didn't have time to troubleshoot as it was a live
 cluster and I had to restart slony and the databases to get them back
 up and running.

 With slony 2.0.2 I had similar problems. A CREATE INDEX on the slave
 caused slony on the slave to block while inserting into the table which
 eventually blocked the server during the log switch (TRUNCATE) which
 eventually blocked everything else.

You could have used create index interactively.

 It occurs to me that the slony daemon should be able to get the
 TRUNCATE command to abort if it takes too long.

No, I don't want it stopping my truncates.

The problem I ran into was on a db with no creating indexes, no
truncates nothing like that going on.  It runs fine for a week or two,
then hangs hard.  No updates, no selects, nothing works.  There are no
locks in the way that I can see, just hung database.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum issues under load?

2010-01-18 Thread Scott Marlowe
On Mon, Jan 18, 2010 at 6:35 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 You could have used create index interactively.

s/interactively/concurrently/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum issues under load?

2010-01-18 Thread Scott Marlowe
On Mon, Jan 18, 2010 at 6:35 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Jan 18, 2010 at 12:28 AM, Martijn van Oosterhout
 klep...@svana.org wrote:
 On Fri, Jan 15, 2010 at 08:41:38AM -0700, Scott Marlowe wrote:
 With slony 2.0.3 or so, I had occasional complete lockups of my
 database that I didn't have time to troubleshoot as it was a live
 cluster and I had to restart slony and the databases to get them back
 up and running.

 With slony 2.0.2 I had similar problems. A CREATE INDEX on the slave
 caused slony on the slave to block while inserting into the table which
 eventually blocked the server during the log switch (TRUNCATE) which
 eventually blocked everything else.

 You could have used create index interactively.

 It occurs to me that the slony daemon should be able to get the
 TRUNCATE command to abort if it takes too long.

 No, I don't want it stopping my truncates.

 The problem I ran into was on a db with no creating indexes, no
 truncates nothing like that going on.  It runs fine for a week or two,
 then hangs hard.  No updates, no selects, nothing works.  There are no
 locks in the way that I can see, just hung database.

Wait a minute, I'm describing the scenario we had with 2.0.3, where it
would just hang.

The vacuum issue is different.  Vacuum starts, slony goes to lock the
table for ddl, waits on lock from vacuum, then everything backs up
behind slony.  With large numbers of tables, the problem gets much
worse much faster.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum issues under load?

2010-01-17 Thread Martijn van Oosterhout
On Fri, Jan 15, 2010 at 08:41:38AM -0700, Scott Marlowe wrote:
 With slony 2.0.3 or so, I had occasional complete lockups of my
 database that I didn't have time to troubleshoot as it was a live
 cluster and I had to restart slony and the databases to get them back
 up and running.

With slony 2.0.2 I had similar problems. A CREATE INDEX on the slave
caused slony on the slave to block while inserting into the table which
eventually blocked the server during the log switch (TRUNCATE) which
eventually blocked everything else.

It occurs to me that the slony daemon should be able to get the
TRUNCATE command to abort if it takes too long.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [GENERAL] vacuum issues under load?

2010-01-15 Thread Tom Lane
Ben Chobot be...@silentmedia.com writes:
 We have recently discovered a problem with our slony-1 cluster of 8.1.19 
 installs. Specifically, we are unable to vacuum a table on the master 
 node; vacuum always hangs on the same index of the same table. If we do a 
 slony switchover and make the other node the master, then *it* will become 
 unable to vacuum that index. Vacuum on the slave always works quickly and 
 without issue. Vacuum does not hang anywhere else.

 When we tried to strace the vacuuming backend, it appeared as if it was 
 trying to acquire a lock, but pg_lock showed nothing unexpected for that 
 index.

Try attaching to the process with gdb and getting a stack trace.

Also ask on the slony lists if anyone's seen anything like this.  I
don't know a reason for slony to have any effect like that, but there's
got to be *something* weird going on.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum issues under load?

2010-01-15 Thread Ben Chobot

On Fri, 15 Jan 2010, Tom Lane wrote:


Ben Chobot be...@silentmedia.com writes:

We have recently discovered a problem with our slony-1 cluster of 8.1.19
installs. Specifically, we are unable to vacuum a table on the master
node; vacuum always hangs on the same index of the same table. If we do a
slony switchover and make the other node the master, then *it* will become
unable to vacuum that index. Vacuum on the slave always works quickly and
without issue. Vacuum does not hang anywhere else.



When we tried to strace the vacuuming backend, it appeared as if it was
trying to acquire a lock, but pg_lock showed nothing unexpected for that
index.


Try attaching to the process with gdb and getting a stack trace.


#0  0x7f59a5d78ca7 in semop () from /lib/libc.so.6
#1  0x00546656 in PGSemaphoreLock ()
#2  0x005618ee in LockBufferForCleanup ()
#3  0x0045c8e2 in btbulkdelete ()
#4  0x005eb179 in FunctionCall3 ()
#5  0x004ed81a in ?? ()
#6  0x004ee132 in lazy_vacuum_rel ()
#7  0x004ec0e4 in ?? ()
#8  0x004ecdeb in vacuum ()
#9  0x00577837 in ?? ()
#10 0x00578da1 in PortalRun ()
#11 0x00574b2b in ?? ()
#12 0x0057611e in PostgresMain ()
#13 0x0054e66b in ?? ()
#14 0x0054f5d1 in PostmasterMain ()
#15 0x005128de in main ()



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum issues under load?

2010-01-15 Thread Tom Lane
Ben Chobot be...@silentmedia.com writes:
 On Fri, 15 Jan 2010, Tom Lane wrote:
 Try attaching to the process with gdb and getting a stack trace.

 #0  0x7f59a5d78ca7 in semop () from /lib/libc.so.6
 #1  0x00546656 in PGSemaphoreLock ()
 #2  0x005618ee in LockBufferForCleanup ()
 #3  0x0045c8e2 in btbulkdelete ()

OK, so it's trying to get an exclusive page-level lock on some page of
the index.  The only thing that could block that for any long period is
if some other process is sitting with an open indexscan.  Look around
for processes that have been idle in transaction for a long time ...

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum issues under load?

2010-01-15 Thread Ben Chobot

On Fri, 15 Jan 2010, Tom Lane wrote:


#0  0x7f59a5d78ca7 in semop () from /lib/libc.so.6
#1  0x00546656 in PGSemaphoreLock ()
#2  0x005618ee in LockBufferForCleanup ()
#3  0x0045c8e2 in btbulkdelete ()


OK, so it's trying to get an exclusive page-level lock on some page of
the index.  The only thing that could block that for any long period is
if some other process is sitting with an open indexscan.  Look around
for processes that have been idle in transaction for a long time ...


There are some, but they've all been running less than a second. There's 
no better place to look than pg_stat_activity, is there?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum issues under load?

2010-01-15 Thread Tom Lane
Ben Chobot be...@silentmedia.com writes:
 On Fri, 15 Jan 2010, Tom Lane wrote:
 OK, so it's trying to get an exclusive page-level lock on some page of
 the index.  The only thing that could block that for any long period is
 if some other process is sitting with an open indexscan.  Look around
 for processes that have been idle in transaction for a long time ...

 There are some, but they've all been running less than a second. There's 
 no better place to look than pg_stat_activity, is there?

Hmph.  Given that, what we seem to be looking at is a leaked buffer pin
count.  There isn't any easy way to fix that except a database restart
--- perhaps you can manage one over the weekend?

I don't recall having heard of any such bugs lately, but 8.1.x is kind
of an old branch.  I'm still wondering if slony might've been involved
somehow.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] vacuum issues under load?

2010-01-15 Thread Scott Marlowe
On Fri, Jan 15, 2010 at 7:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ben Chobot be...@silentmedia.com writes:
 We have recently discovered a problem with our slony-1 cluster of 8.1.19
 installs. Specifically, we are unable to vacuum a table on the master
 node; vacuum always hangs on the same index of the same table. If we do a
 slony switchover and make the other node the master, then *it* will become
 unable to vacuum that index. Vacuum on the slave always works quickly and
 without issue. Vacuum does not hang anywhere else.

 When we tried to strace the vacuuming backend, it appeared as if it was
 trying to acquire a lock, but pg_lock showed nothing unexpected for that
 index.

 Try attaching to the process with gdb and getting a stack trace.

 Also ask on the slony lists if anyone's seen anything like this.  I
 don't know a reason for slony to have any effect like that, but there's
 got to be *something* weird going on.

I've seen a situation like there during ddl ops (using the slony
execute command to run them on the cluster) where slony halts waiting
on vacuum and everything else halts waiting on slony.  That was with
slony 1.2

With slony 2.0.3 or so, I had occasional complete lockups of my
database that I didn't have time to troubleshoot as it was a live
cluster and I had to restart slony and the databases to get them back
up and running.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] vacuum issues under load?

2010-01-14 Thread Ben Chobot
We have recently discovered a problem with our slony-1 cluster of 8.1.19 
installs. Specifically, we are unable to vacuum a table on the master 
node; vacuum always hangs on the same index of the same table. If we do a 
slony switchover and make the other node the master, then *it* will become 
unable to vacuum that index. Vacuum on the slave always works quickly and 
without issue. Vacuum does not hang anywhere else.


When we tried to strace the vacuuming backend, it appeared as if it was 
trying to acquire a lock, but pg_lock showed nothing unexpected for that 
index. We can also manually acquire an access exclusive lock on the 
offending table if we desire. FWIW, we have about 1,000 sessions, and 
while most of them are idle, we still average a couple hundred queries/s. 
The index in question is a simple unique btree over a column of type 
character(40).


Has anybody else experienced anything like this? We are hoping this 
problem magically goes away when we upgrade to 8.4 next month, but it 
would be great if we could solve it before then.


Thanks for any suggestions

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general