Re: [GENERAL] vacuum issues under load?
On Mon, Jan 18, 2010 at 12:28 AM, Martijn van Oosterhout klep...@svana.org wrote: On Fri, Jan 15, 2010 at 08:41:38AM -0700, Scott Marlowe wrote: With slony 2.0.3 or so, I had occasional complete lockups of my database that I didn't have time to troubleshoot as it was a live cluster and I had to restart slony and the databases to get them back up and running. With slony 2.0.2 I had similar problems. A CREATE INDEX on the slave caused slony on the slave to block while inserting into the table which eventually blocked the server during the log switch (TRUNCATE) which eventually blocked everything else. You could have used create index interactively. It occurs to me that the slony daemon should be able to get the TRUNCATE command to abort if it takes too long. No, I don't want it stopping my truncates. The problem I ran into was on a db with no creating indexes, no truncates nothing like that going on. It runs fine for a week or two, then hangs hard. No updates, no selects, nothing works. There are no locks in the way that I can see, just hung database. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuum issues under load?
On Mon, Jan 18, 2010 at 6:35 PM, Scott Marlowe scott.marl...@gmail.com wrote: You could have used create index interactively. s/interactively/concurrently/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuum issues under load?
On Mon, Jan 18, 2010 at 6:35 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Jan 18, 2010 at 12:28 AM, Martijn van Oosterhout klep...@svana.org wrote: On Fri, Jan 15, 2010 at 08:41:38AM -0700, Scott Marlowe wrote: With slony 2.0.3 or so, I had occasional complete lockups of my database that I didn't have time to troubleshoot as it was a live cluster and I had to restart slony and the databases to get them back up and running. With slony 2.0.2 I had similar problems. A CREATE INDEX on the slave caused slony on the slave to block while inserting into the table which eventually blocked the server during the log switch (TRUNCATE) which eventually blocked everything else. You could have used create index interactively. It occurs to me that the slony daemon should be able to get the TRUNCATE command to abort if it takes too long. No, I don't want it stopping my truncates. The problem I ran into was on a db with no creating indexes, no truncates nothing like that going on. It runs fine for a week or two, then hangs hard. No updates, no selects, nothing works. There are no locks in the way that I can see, just hung database. Wait a minute, I'm describing the scenario we had with 2.0.3, where it would just hang. The vacuum issue is different. Vacuum starts, slony goes to lock the table for ddl, waits on lock from vacuum, then everything backs up behind slony. With large numbers of tables, the problem gets much worse much faster. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuum issues under load?
On Fri, Jan 15, 2010 at 08:41:38AM -0700, Scott Marlowe wrote: With slony 2.0.3 or so, I had occasional complete lockups of my database that I didn't have time to troubleshoot as it was a live cluster and I had to restart slony and the databases to get them back up and running. With slony 2.0.2 I had similar problems. A CREATE INDEX on the slave caused slony on the slave to block while inserting into the table which eventually blocked the server during the log switch (TRUNCATE) which eventually blocked everything else. It occurs to me that the slony daemon should be able to get the TRUNCATE command to abort if it takes too long. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [GENERAL] vacuum issues under load?
Ben Chobot be...@silentmedia.com writes: We have recently discovered a problem with our slony-1 cluster of 8.1.19 installs. Specifically, we are unable to vacuum a table on the master node; vacuum always hangs on the same index of the same table. If we do a slony switchover and make the other node the master, then *it* will become unable to vacuum that index. Vacuum on the slave always works quickly and without issue. Vacuum does not hang anywhere else. When we tried to strace the vacuuming backend, it appeared as if it was trying to acquire a lock, but pg_lock showed nothing unexpected for that index. Try attaching to the process with gdb and getting a stack trace. Also ask on the slony lists if anyone's seen anything like this. I don't know a reason for slony to have any effect like that, but there's got to be *something* weird going on. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuum issues under load?
On Fri, 15 Jan 2010, Tom Lane wrote: Ben Chobot be...@silentmedia.com writes: We have recently discovered a problem with our slony-1 cluster of 8.1.19 installs. Specifically, we are unable to vacuum a table on the master node; vacuum always hangs on the same index of the same table. If we do a slony switchover and make the other node the master, then *it* will become unable to vacuum that index. Vacuum on the slave always works quickly and without issue. Vacuum does not hang anywhere else. When we tried to strace the vacuuming backend, it appeared as if it was trying to acquire a lock, but pg_lock showed nothing unexpected for that index. Try attaching to the process with gdb and getting a stack trace. #0 0x7f59a5d78ca7 in semop () from /lib/libc.so.6 #1 0x00546656 in PGSemaphoreLock () #2 0x005618ee in LockBufferForCleanup () #3 0x0045c8e2 in btbulkdelete () #4 0x005eb179 in FunctionCall3 () #5 0x004ed81a in ?? () #6 0x004ee132 in lazy_vacuum_rel () #7 0x004ec0e4 in ?? () #8 0x004ecdeb in vacuum () #9 0x00577837 in ?? () #10 0x00578da1 in PortalRun () #11 0x00574b2b in ?? () #12 0x0057611e in PostgresMain () #13 0x0054e66b in ?? () #14 0x0054f5d1 in PostmasterMain () #15 0x005128de in main () -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuum issues under load?
Ben Chobot be...@silentmedia.com writes: On Fri, 15 Jan 2010, Tom Lane wrote: Try attaching to the process with gdb and getting a stack trace. #0 0x7f59a5d78ca7 in semop () from /lib/libc.so.6 #1 0x00546656 in PGSemaphoreLock () #2 0x005618ee in LockBufferForCleanup () #3 0x0045c8e2 in btbulkdelete () OK, so it's trying to get an exclusive page-level lock on some page of the index. The only thing that could block that for any long period is if some other process is sitting with an open indexscan. Look around for processes that have been idle in transaction for a long time ... regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuum issues under load?
On Fri, 15 Jan 2010, Tom Lane wrote: #0 0x7f59a5d78ca7 in semop () from /lib/libc.so.6 #1 0x00546656 in PGSemaphoreLock () #2 0x005618ee in LockBufferForCleanup () #3 0x0045c8e2 in btbulkdelete () OK, so it's trying to get an exclusive page-level lock on some page of the index. The only thing that could block that for any long period is if some other process is sitting with an open indexscan. Look around for processes that have been idle in transaction for a long time ... There are some, but they've all been running less than a second. There's no better place to look than pg_stat_activity, is there? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuum issues under load?
Ben Chobot be...@silentmedia.com writes: On Fri, 15 Jan 2010, Tom Lane wrote: OK, so it's trying to get an exclusive page-level lock on some page of the index. The only thing that could block that for any long period is if some other process is sitting with an open indexscan. Look around for processes that have been idle in transaction for a long time ... There are some, but they've all been running less than a second. There's no better place to look than pg_stat_activity, is there? Hmph. Given that, what we seem to be looking at is a leaked buffer pin count. There isn't any easy way to fix that except a database restart --- perhaps you can manage one over the weekend? I don't recall having heard of any such bugs lately, but 8.1.x is kind of an old branch. I'm still wondering if slony might've been involved somehow. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuum issues under load?
On Fri, Jan 15, 2010 at 7:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: Ben Chobot be...@silentmedia.com writes: We have recently discovered a problem with our slony-1 cluster of 8.1.19 installs. Specifically, we are unable to vacuum a table on the master node; vacuum always hangs on the same index of the same table. If we do a slony switchover and make the other node the master, then *it* will become unable to vacuum that index. Vacuum on the slave always works quickly and without issue. Vacuum does not hang anywhere else. When we tried to strace the vacuuming backend, it appeared as if it was trying to acquire a lock, but pg_lock showed nothing unexpected for that index. Try attaching to the process with gdb and getting a stack trace. Also ask on the slony lists if anyone's seen anything like this. I don't know a reason for slony to have any effect like that, but there's got to be *something* weird going on. I've seen a situation like there during ddl ops (using the slony execute command to run them on the cluster) where slony halts waiting on vacuum and everything else halts waiting on slony. That was with slony 1.2 With slony 2.0.3 or so, I had occasional complete lockups of my database that I didn't have time to troubleshoot as it was a live cluster and I had to restart slony and the databases to get them back up and running. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] vacuum issues under load?
We have recently discovered a problem with our slony-1 cluster of 8.1.19 installs. Specifically, we are unable to vacuum a table on the master node; vacuum always hangs on the same index of the same table. If we do a slony switchover and make the other node the master, then *it* will become unable to vacuum that index. Vacuum on the slave always works quickly and without issue. Vacuum does not hang anywhere else. When we tried to strace the vacuuming backend, it appeared as if it was trying to acquire a lock, but pg_lock showed nothing unexpected for that index. We can also manually acquire an access exclusive lock on the offending table if we desire. FWIW, we have about 1,000 sessions, and while most of them are idle, we still average a couple hundred queries/s. The index in question is a simple unique btree over a column of type character(40). Has anybody else experienced anything like this? We are hoping this problem magically goes away when we upgrade to 8.4 next month, but it would be great if we could solve it before then. Thanks for any suggestions -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general