Re: [HACKERS] 7.4Beta1 hang?
Robert Creager [EMAIL PROTECTED] writes:
>> What is the index temp_obs_i_obs_id_index on, exactly?  Is it a serial
>> column or some such?

> Yup:

Okay, that explains it then.  In a serial column's index, essentially all
splits will be on the rightmost page of the tree.  This means that when
_bt_split tries to get a new free page, it will almost always be holding a
lock on the most recently acquired free page (since that was the righthand
side of the previous split).  That's the factor that makes the coincidence
likely.

A vacuum running concurrently with a page split may mistakenly place a
just-used page back into FSM (if the page gets used between the time vacuum
examines it and the time vacuum finishes and loads its results into FSM).
So if that happens, and said page is the first to be returned by FSM for
the next split, you lose.

I've committed the attached fix.

			regards, tom lane

*** src/backend/access/nbtree/nbtpage.c.orig	Fri Aug  8 17:47:01 2003
--- src/backend/access/nbtree/nbtpage.c	Sun Aug 10 15:32:16 2003
***************
*** 409,414 ****
--- 409,430 ----
  	 * that the page is still free.  (For example, an already-free
  	 * page could have been re-used between the time the last VACUUM
  	 * scanned it and the time the VACUUM made its FSM updates.)
+ 	 *
+ 	 * In fact, it's worse than that: we can't even assume that it's
+ 	 * safe to take a lock on the reported page.  If somebody else
+ 	 * has a lock on it, or even worse our own caller does, we could
+ 	 * deadlock.  (The own-caller scenario is actually not improbable.
+ 	 * Consider an index on a serial or timestamp column.  Nearly all
+ 	 * splits will be at the rightmost page, so it's entirely likely
+ 	 * that _bt_split will call us while holding a lock on the page most
+ 	 * recently acquired from FSM.  A VACUUM running concurrently with
+ 	 * the previous split could well have placed that page back in FSM.)
+ 	 *
+ 	 * To get around that, we ask for only a conditional lock on the
+ 	 * reported page.  If we fail, then someone else is using the page,
+ 	 * and we may reasonably assume it's not free.  (If we happen to be
+ 	 * wrong, the worst consequence is the page will be lost to use till
+ 	 * the next VACUUM, which is no big problem.)
  	 */
  	for (;;)
  	{
***************
*** 416,431 ****
  		if (blkno == InvalidBlockNumber)
  			break;
  		buf = ReadBuffer(rel, blkno);
! 		LockBuffer(buf, access);
! 		page = BufferGetPage(buf);
! 		if (_bt_page_recyclable(page))
  		{
! 			/* Okay to use page.  Re-initialize and return it */
! 			_bt_pageinit(page, BufferGetPageSize(buf));
! 			return buf;
  		}
- 		elog(DEBUG2, "FSM returned nonrecyclable page");
- 		_bt_relbuf(rel, buf);
  	}

  	/*
--- 432,455 ----
  		if (blkno == InvalidBlockNumber)
  			break;
  		buf = ReadBuffer(rel, blkno);
! 		if (ConditionalLockBuffer(buf))
  		{
! 			page = BufferGetPage(buf);
! 			if (_bt_page_recyclable(page))
! 			{
! 				/* Okay to use page.  Re-initialize and return it */
! 				_bt_pageinit(page, BufferGetPageSize(buf));
! 				return buf;
! 			}
! 			elog(DEBUG2, "FSM returned nonrecyclable page");
! 			_bt_relbuf(rel, buf);
! 		}
! 		else
! 		{
! 			elog(DEBUG2, "FSM returned nonlockable page");
! 			/* couldn't get lock, so just drop pin */
! 			ReleaseBuffer(buf);
  		}
  	}

  	/*

*** src/backend/storage/buffer/bufmgr.c.orig	Sun Aug  3 23:00:55 2003
--- src/backend/storage/buffer/bufmgr.c	Sun Aug 10 15:17:28 2003
***************
*** 1937,1942 ****
--- 1937,1973 ----
  }

  /*
+  * Acquire the cntx_lock for the buffer, but only if we don't have to wait.
+  *
+  * This assumes the caller wants BUFFER_LOCK_EXCLUSIVE mode.
+  */
+ bool
+ ConditionalLockBuffer(Buffer buffer)
+ {
+ 	BufferDesc *buf;
+
+ 	Assert(BufferIsValid(buffer));
+ 	if (BufferIsLocal(buffer))
+ 		return true;
[HACKERS] 7.4Beta1 hang?
I appear to have a hang on my system (40 minutes so far, it's now 17:40).
The code is from CVS updated 8/6 if I remember correctly.  The machine is
idle, with a vacuum waiting and an INSERT doing nothing.  The vacuum is
being generated via pg_autovacuum.

The output from the perl script leading up to the hang is:

2755/kir_27551925.fits.apm... 1775 stars imported 3421 per second
2755/kvr_27551925.fits.apm... 1219 stars imported 4639 per second
.Kept 925 (75.9%) pairs of stars at 249.9 per second ( 3.7)
2755/kir_27551989.fits.apm... 1727 stars imported 3564 per second
2755/kvr_27551989.fits.apm... 918 stars imported 3518 per second
.Kept 694 (75.6%) pairs of stars at 635.3 per second ( 1.1)
2755/kir_27552015.fits.apm... 1817 stars imported 4549 per second
2755/kvr_27552015.fits.apm... 958 stars imported 4197 per second

As this shows, the insert that is hung is part of a series of operations
which usually completes in under 4 seconds.  The insert is:

INSERT INTO obs_i SELECT * FROM temp_obs_i
  WHERE obs_id IN (SELECT i_obs_id FROM obs_keep)

The PostgreSQL processes:

postgres 32242     1  0 Aug07 ?  00:00:01 /usr/local/pgsql/bin/postmaster -D /var/lib/pgsql/data
postgres 32246 32242  0 Aug07 ?  00:02:38 postgres: stats buffer process
postgres 32247 32246  0 Aug07 ?  00:02:30 postgres: stats collector process
postgres  6426 32242 32 16:36 ?  00:09:21 postgres: robert tassiv 192.168.0.250 INSERT
postgres  6427 32242  1 16:36 ?  00:00:21 postgres: robert tassiv 192.168.0.250 idle
postgres  6615 32242  3 16:48 ?  00:00:34 postgres: robert tassiv localhost VACUUM waiting
postgres  6824 32242  0 17:01 ?  00:00:00 postgres: checkpoint subprocess

Anything to look at before I kick it?  It's not built with debug, but I
can still get a backtrace.
Tables: obs_keep is a temp table generated via perl script

SELECT i.obs_id AS i_obs_id, v.obs_id AS v_obs_id
  INTO TEMP obs_keep
  FROM (SELECT obs_id, file_id, loc FROM temp_obs_v NATURAL JOIN files
         WHERE group_id = $group_id AND pair_id = $pair_id) AS v,
       (SELECT obs_id, file_id, loc FROM temp_obs_i NATURAL JOIN files
         WHERE group_id = $group_id AND pair_id = $pair_id) AS i
 WHERE i.loc @ scircle( v.loc, $MATCH_RADIUS )

tassiv=# \d temp_obs_i
              Table "public.temp_obs_i"
 Column  |  Type   |                  Modifiers
---------+---------+----------------------------------------------
 x       | real    | not null
 y       | real    | not null
 imag    | real    | not null
 smag    | real    | not null
 loc     | spoint  | not null
 obs_id  | integer | not null default nextval('obs_id_seq'::text)
 file_id | integer | not null
 use     | boolean | default false
 solve   | boolean | default false
 star_id | integer |
 mag     | real    |
Indexes: temp_obs_i_file_id_index btree (file_id),
         temp_obs_i_index gist (loc),
         temp_obs_i_loc_index gist (loc),
         temp_obs_i_obs_id_index btree (obs_id)
Foreign-key constraints:
    temp_obs_i_files_constraint FOREIGN KEY (file_id)
        REFERENCES files(file_id) ON DELETE CASCADE
Inherits: obs_root

tassiv=# \d obs_i
                 Table "public.obs_i"
 Column  |  Type   |                  Modifiers
---------+---------+----------------------------------------------
 x       | real    | not null
 y       | real    | not null
 imag    | real    | not null
 smag    | real    | not null
 loc     | spoint  | not null
 obs_id  | integer | not null default nextval('obs_id_seq'::text)
 file_id | integer | not null
 use     | boolean | default false
 solve   | boolean | default false
 star_id | integer |
 mag     | real    |
Indexes: obs_i_file_id_index btree (file_id),
         obs_i_loc_index gist (loc),
         obs_i_obs_id_index btree (obs_id),
         obs_i_star_id_index btree (star_id),
         obs_i_use_index btree (use)
Foreign-key constraints:
    obs_i_files_constraint FOREIGN KEY (file_id)
        REFERENCES files(file_id) ON DELETE CASCADE
    obs_i_star_id_constraint FOREIGN KEY (star_id)
        REFERENCES catalog(star_id) ON DELETE SET NULL
Triggers: obs_i_trig BEFORE INSERT OR DELETE OR UPDATE ON obs_i
    FOR EACH ROW EXECUTE PROCEDURE observations_trigger()
Inherits: obs_root

--
17:05:52 up 8 days, 9:51, 4 users, load
average: 0.03, 0.51, 0.92
Re: [HACKERS] 7.4Beta1 hang?
Robert Creager [EMAIL PROTECTED] writes:
>> Could you supply the relation names corresponding to the relation OIDs
>> appearing in pg_locks, so we can be sure who's processing what?

> Sure, if you tell me how ;-)  I looked at the view definition and that
> didn't help me much...

	select relname from pg_class where oid = 'whatever';

or, if you still have the pg_locks data,

	select relname, pg_locks.* from pg_locks, pg_class
	 where pg_class.oid = relation;

			regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
    (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] 7.4Beta1 hang?
Robert Creager [EMAIL PROTECTED] writes:
> [much data]

Well, I'm baffled.  The vacuum process is evidently waiting for the
insert, but I don't think it could be holding any lock that the other two
want.  The insert is trying to grab a per-buffer lock, which is
reasonable.  I do not see that the VACUUM could have that lock, but it's
possible the checkpoint process does.  However, as best I can tell the
checkpoint process is trying to get the BufMgrLock, and neither of the
other two can be holding that.  So what the heck is blocking the
checkpoint?

A point of interest is that the VACUUM is trying to get the lock on a
GIST index (it must be so, as lazy_vacuum_index would not need to get any
lock at all on btree indexes).  OTOH, the insert is busy trying to insert
into one of the btree indexes on obs_i (can't tell which); and
furthermore it's trying to do a page split in that index, which is
relatively uncommon.  It's possible that the combination of these events
would reproducibly yield the hang, though I sure don't see how.

Could you supply the relation names corresponding to the relation OIDs
appearing in pg_locks, so we can be sure who's processing what?

After that, we've probably got all the info we can get from this
occurrence.  Please rebuild with debugging enabled, and hope that it
happens again so we can get more data.

			regards, tom lane
Re: [HACKERS] 7.4Beta1 hang?
On Sat, 09 Aug 2003 21:17:05 -0400, Tom Lane [EMAIL PROTECTED] said
something like:

> Robert Creager [EMAIL PROTECTED] writes:
>> [much data]

> Could you supply the relation names corresponding to the relation OIDs
> appearing in pg_locks, so we can be sure who's processing what?

Well, there appears to be a mail problem.  Not sure if it's on my end, or
somewhere else.  I dug around a little and I might have what you asked
for.  The relname looks correct for what's currently being processed, but
I don't know why some of the relations wouldn't have a name.  Of course,
I made this up, so it may be very wrong...

tassiv=# select relation, mode, relname from pg_locks left join pg_class
         on ( pg_locks.relation = pg_class.relfilenode );
 relation |           mode           |      relname
----------+--------------------------+-------------------
     1259 | AccessShareLock          | pg_class
    16759 | AccessShareLock          | pg_locks
    17835 | AccessShareLock          | groups
    17835 | RowShareLock             | groups
    17844 | AccessShareLock          | pairs_pair_id_seq
    17846 | AccessShareLock          | pairs
    17846 | RowShareLock             | pairs
    17846 | RowExclusiveLock         | pairs
    17855 | AccessShareLock          | files_file_id_seq
    17857 | AccessShareLock          | files
    17857 | RowShareLock             | files
    17857 | RowExclusiveLock         | files
    17879 | AccessShareLock          | fits
    17879 | RowExclusiveLock         | fits
    17912 | AccessShareLock          | obs_id_seq
    17971 | AccessShareLock          | obs_v
    17971 | RowExclusiveLock         | obs_v
    17984 | AccessShareLock          | temp_obs_v
    17984 | RowExclusiveLock         | temp_obs_v
    17984 | ShareLock                | temp_obs_v
    17984 | AccessExclusiveLock      | temp_obs_v
    18015 | ShareUpdateExclusiveLock | obs_i
    18015 | AccessShareLock          | obs_i
    18015 | RowExclusiveLock         | obs_i
    18015 | ShareUpdateExclusiveLock | obs_i
    18028 | AccessShareLock          | temp_obs_i
    18028 | RowExclusiveLock         | temp_obs_i
    18028 | ShareLock                | temp_obs_i
    18028 | AccessExclusiveLock      | temp_obs_i
    18045 | AccessShareLock          | imported
    18045 | RowExclusiveLock         | imported
    18096 | AccessExclusiveLock      | obs_i_loc_index
    18096 | AccessExclusiveLock      | obs_i_loc_index
    18099 | ShareLock                |
    18099 | AccessExclusiveLock      |
    18101 | ShareLock                |
    18101 | AccessExclusiveLock      |
 34526324 | AccessShareLock          |
 34526324 | ShareLock                |
 34526324 | AccessExclusiveLock      |
 34527019 | ShareLock                |
 34527019 | AccessExclusiveLock      |
 34527020 | ShareLock                |
 34527020 | AccessExclusiveLock      |
          | ExclusiveLock            |
          | ExclusiveLock            |
          | ExclusiveLock            |

--
09:02:24 up 9 days, 1:47, 4 users, load average: 2.01, 2.03, 2.00
Re: [HACKERS] 7.4Beta1 hang?
On Sat, 09 Aug 2003 20:12:36 -0400, Tom Lane [EMAIL PROTECTED] said
something like:

> Robert Creager [EMAIL PROTECTED] writes:
>> Anything to look at before I kick it?

> pg_locks and pg_stat_activity, if you can select from them in a
> non-stuck backend.

tassiv=# select * from pg_locks;
 relation | database | transaction | pid  |           mode           | granted
----------+----------+-------------+------+--------------------------+---------
    17846 |    17140 |             | 6426 | AccessShareLock          | t
    17846 |    17140 |             | 6426 | RowShareLock             | t
    17846 |    17140 |             | 6426 | RowExclusiveLock         | t
    17835 |    17140 |             | 6426 | AccessShareLock          | t
    17835 |    17140 |             | 6426 | RowShareLock             | t
    18096 |    17140 |             | 6426 | AccessExclusiveLock      | t
    18045 |    17140 |             | 6426 | AccessShareLock          | t
    18045 |    17140 |             | 6426 | RowExclusiveLock         | t
 34527020 |    17140 |             | 6426 | ShareLock                | t
 34527020 |    17140 |             | 6426 | AccessExclusiveLock      | t
    18015 |    17140 |             | 6615 | ShareUpdateExclusiveLock | t
    17971 |    17140 |             | 6426 | AccessShareLock          | t
    17971 |    17140 |             | 6426 | RowExclusiveLock         | t
    16759 |    17140 |             | 8069 | AccessShareLock          | t
 34526324 |    17140 |             | 6426 | AccessShareLock          | t
 34526324 |    17140 |             | 6426 | ShareLock                | t
 34526324 |    17140 |             | 6426 | AccessExclusiveLock      | t
    18101 |    17140 |             | 6426 | ShareLock                | t
    18101 |    17140 |             | 6426 | AccessExclusiveLock      | t
    18096 |    17140 |             | 6615 | AccessExclusiveLock      | f
    18028 |    17140 |             | 6426 | AccessShareLock          | t
    18028 |    17140 |             | 6426 | RowExclusiveLock         | t
    18028 |    17140 |             | 6426 | ShareLock                | t
    18028 |    17140 |             | 6426 | AccessExclusiveLock      | t
    17879 |    17140 |             | 6426 | AccessShareLock          | t
    17879 |    17140 |             | 6426 | RowExclusiveLock         | t
          |          |       79764 | 8069 | ExclusiveLock            | t
    18015 |    17140 |             | 6426 | AccessShareLock          | t
    18015 |    17140 |             | 6426 | RowExclusiveLock         | t
          |          |       79690 | 6615 | ExclusiveLock            | t
    17855 |    17140 |             | 6426 | AccessShareLock          | t
    17857 |    17140 |             | 6426 | AccessShareLock          | t
    17857 |    17140 |             | 6426 | RowShareLock             | t
    17857 |    17140 |             | 6426 | RowExclusiveLock         | t
    18099 |    17140 |             | 6426 | ShareLock                | t
    18099 |    17140 |             | 6426 | AccessExclusiveLock      | t
    17912 |    17140 |             | 6426 | AccessShareLock          | t
          |          |       79712 | 6426 | ExclusiveLock            | t
    17984 |    17140 |             | 6426 | AccessShareLock          | t
    17984 |    17140 |             | 6426 | RowExclusiveLock         | t
    17984 |    17140 |             | 6426 | ShareLock                | t
    17984 |    17140 |             | 6426 | AccessExclusiveLock      | t
    17844 |    17140 |             | 6426 | AccessShareLock          | t
 34527019 |    17140 |             | 6426 | ShareLock                | t
 34527019 |    17140 |             | 6426 | AccessExclusiveLock      | t
    18015 |    17140 |             | 6615 | ShareUpdateExclusiveLock | t

tassiv=# select * from pg_stat_activity;
 datid | datname | procpid | usesysid | usename | current_query | query_start
-------+---------+---------+----------+---------+---------------+-------------
 17140 | tassiv  |    8069 |      100 | robert  |               |
 17140 | tassiv  |    6426 |      100 | robert  |               |
 17140 | tassiv  |    6427 |      100 | robert  |               |
 17140 | tassiv  |    6615 |      100 | robert  |               |

>> It's not built with debug, but I can still get a backtrace.

> Might be useful.
>
> 			regards, tom lane

00:00:01 /usr/local/pgsql/bin/postmaster -D /var/lib/pgsql/data
#0  0x4031fec2 in select () from /lib/i686/libc.so.6
#1  0xb368 in ?? ()
#2  0x081340bd in PostmasterMain ()
#3  0x08109edc in main ()
#4  0x4025e7f7 in __libc_start_main () from /lib/i686/libc.so.6

postgres: robert tassiv 192.168.0.250 INSERT
(gdb) backtrace
#0  0x403279d7 in semop () from /lib/i686/libc.so.6
#1  0x08133151 in PGSemaphoreLock ()
#2  0x081517cc in LWLockAcquire ()
#3  0x081482fd in LockBuffer ()
#4  0x0808439f in _bt_getbuf ()
#5  0x080827e4 in _bt_split ()
#6  0x08082202 in _bt_insertonpg ()
#7
Re: [HACKERS] 7.4Beta1 hang?
On Sat, 09 Aug 2003 21:17:05 -0400, Tom Lane [EMAIL PROTECTED] said
something like:

> Could you supply the relation names corresponding to the relation OIDs
> appearing in pg_locks, so we can be sure who's processing what?

Sure, if you tell me how ;-)  I looked at the view definition and that
didn't help me much...

--
19:32:24 up 8 days, 12:18, 4 users, load average: 2.07, 2.05, 2.00
Re: [HACKERS] 7.4Beta1 hang?
Robert Creager [EMAIL PROTECTED] writes:
> Anything to look at before I kick it?

pg_locks and pg_stat_activity, if you can select from them in a non-stuck
backend.

> It's not built with debug, but I can still get a backtrace.

Might be useful.

			regards, tom lane