Re: [HACKERS] heap vacuum & cleanup locks
On Mon, Jun 30, 2014 at 7:55 PM, Greg Stark wrote:
> On Wed, Nov 2, 2011 at 3:19 PM, Robert Haas wrote:
>> On Tue, Jun 7, 2011 at 3:24 PM, Greg Stark wrote:
>>> Well, these are super-exclusive-vacuum-lock avoidance techniques.
>>> Wouldn't it make more sense to try to reduce the frequency and impact
>>> of the single-purpose outlier in a non-critical path instead of
>>> burdening every other data reader with extra overhead?
>>>
>>> I think Robert's plan is exactly right, though I would phrase it
>>> differently. We should get the exclusive lock, freeze/kill any xids
>>> and line pointers, then if the pin count is 1 do the compaction.
>>
>> I wrote a really neat patch to do this today... and then, as I
>> thought about it some more, I started to think that it's probably
>> unsafe. Here's the trouble: with this approach, we assume that it's
>> OK to change the contents of the line pointer while holding only an
>> exclusive lock on the buffer. But there is a good deal of code out
>> there that thinks it's OK to examine a line pointer with only a pin
>> on the buffer (no lock). You need a content lock to scan the item
>> pointer array, but once you've identified an item of interest, you're
>> entitled to assume that it won't be modified while you hold a buffer
>> pin. Now, if you've identified a particular tuple as being visible to
>> your scan, then you might think that VACUUM shouldn't be removing it
>> anyway. But I think that's only true for MVCC scans - for example,
>> what happens under SnapshotNow semantics?
>
> Well now that you've done all that amazing work eliminating
> SnapshotNow ...

Thank you. :-)

> ... maybe this patch deserves another look? I see anti-wraparound
> vacuums more and more often as database transaction velocity
> increases, and vacuum takes longer and longer as database sizes
> increase. Having a faster vacuum that can skip vacuuming pages but is
> still guaranteed to freeze every page is pretty tempting.
>
> Of course the freeze epoch in the header obviating the need for
> freezing is an even more attractive goal, but AIUI we're not even sure
> that's going to work, and this is a nice easy win.

Well, there's still SnapshotSelf, SnapshotAny, SnapshotDirty, ...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
Zombie thread, arise! I was searching for old threads on a specific
problem and came across this patch, which was dropped due to concerns
about SnapshotNow:

On Wed, Nov 2, 2011 at 3:19 PM, Robert Haas wrote:
> On Tue, Jun 7, 2011 at 3:24 PM, Greg Stark wrote:
>> Well, these are super-exclusive-vacuum-lock avoidance techniques.
>> Wouldn't it make more sense to try to reduce the frequency and impact
>> of the single-purpose outlier in a non-critical path instead of
>> burdening every other data reader with extra overhead?
>>
>> I think Robert's plan is exactly right, though I would phrase it
>> differently. We should get the exclusive lock, freeze/kill any xids
>> and line pointers, then if the pin count is 1 do the compaction.
>
> I wrote a really neat patch to do this today... and then, as I
> thought about it some more, I started to think that it's probably
> unsafe. Here's the trouble: with this approach, we assume that it's
> OK to change the contents of the line pointer while holding only an
> exclusive lock on the buffer. But there is a good deal of code out
> there that thinks it's OK to examine a line pointer with only a pin
> on the buffer (no lock). You need a content lock to scan the item
> pointer array, but once you've identified an item of interest, you're
> entitled to assume that it won't be modified while you hold a buffer
> pin. Now, if you've identified a particular tuple as being visible to
> your scan, then you might think that VACUUM shouldn't be removing it
> anyway. But I think that's only true for MVCC scans - for example,
> what happens under SnapshotNow semantics?

Well, now that you've done all that amazing work eliminating
SnapshotNow, maybe this patch deserves another look? I see
anti-wraparound vacuums more and more often as database transaction
velocity increases, and vacuum takes longer and longer as database
sizes increase. Having a faster vacuum that can skip vacuuming pages
but is still guaranteed to freeze every page is pretty tempting.

Of course the freeze epoch in the header obviating the need for
freezing is an even more attractive goal, but AIUI we're not even sure
that's going to work, and this is a nice easy win.

> But then, on third thought, if you've also got an MVCC snapshot taken
> before the start of the SnapshotNow scan, you are probably OK, because
> your advertised xmin should prevent anyone from removing anything
> anyway, so how do you actually provoke a failure?
>
> Anyway, I'm attaching the patch, in case anyone has any ideas on where
> to go with this.
>
>> I'm really wishing we had more bits in the vm. It looks like we could
>> use:
>> - contains not-all-visible tuples
>> - contains not-frozen xids
>> - in need of compaction

Another idea would be to store the upper 2-4 bits of the oldest xid in
the page. That would let you determine whether it's in need of an
anti-wraparound vacuum.

--
greg
Re: [HACKERS] heap vacuum & cleanup locks
On Wed, Nov 9, 2011 at 6:10 PM, Simon Riggs wrote:
> On Wed, Nov 9, 2011 at 10:20 PM, Tom Lane wrote:
>> Simon Riggs writes:
>>> heapgetpage() gets a page and a pin, but holds the pin until it
>>> reads the next page. Wow!
>>>
>>> That is both annoying and very dumb. It should hold the pin long
>>> enough to copy the data and then release the pin.
>>
>> I don't find that anywhere near as obvious as you seem to. I think
>> you are trying to optimize for the wrong set of conditions.
>
> ISTM we should optimise to access the cachelines in the buffer once.
> Holding a pin and re-accessing the buffer via main memory seems like a
> pretty bad plan to me. Which conditions are being optimised by doing
> that?

I believe it reduces memory copying. But we can certainly test some
alternative you may have in mind and see how it shakes out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
On Wed, Nov 9, 2011 at 5:18 PM, Tom Lane wrote:
> Robert Haas writes:
>> As for what to do about all this, I think Tom's idea would work for
>> good tuples, but the current freezing code can't handle dead tuples;
>> it counts on those having been already removed.
>
> I have not gone back to look at the code, but are you worried about
> the fact that it doesn't consider replacing xmax with
> FrozenTransactionId? Surely we could do that if we wanted. It just
> never seemed necessary before --- but if vacuum is to be allowed to
> punt repeatedly on the same page, maybe we do need to cover the case.

No, I'm worried about the fact that it does this:

    xid = HeapTupleHeaderGetXmin(tuple);
    if (TransactionIdIsNormal(xid) &&
        TransactionIdPrecedes(xid, cutoff_xid))
    {
        if (buf != InvalidBuffer)
        {
            /* trade in share lock for exclusive lock */
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
            LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
            buf = InvalidBuffer;
        }

        HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);

Note that the ONLY thing we're checking with respect to the tuple xmin
is that it's a normal XID that precedes the cutoff. We are not
checking whether it's committed. So there had better not be any tuples
left on the page that are from inserting transactions that aborted, or
this is going to go boom.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
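For concreteness, the extra test a freeze pass would need if it had to
cope with unpruned dead tuples is roughly this (an illustrative sketch,
not actual or proposed PostgreSQL code; TransactionIdDidCommit is the
existing transam.h test):

    xid = HeapTupleHeaderGetXmin(tuple);
    if (TransactionIdIsNormal(xid) &&
        TransactionIdPrecedes(xid, cutoff_xid) &&
        TransactionIdDidCommit(xid))    /* absent from the code above */
        HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);
    /* an aborted xmin would instead need to be marked invalid, not
     * frozen, or readers would treat the dead tuple as live */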
Re: [HACKERS] heap vacuum & cleanup locks
On Wed, Nov 9, 2011 at 10:20 PM, Tom Lane wrote:
> Simon Riggs writes:
>> heapgetpage() gets a page and a pin, but holds the pin until it reads
>> the next page. Wow!
>>
>> That is both annoying and very dumb. It should hold the pin long
>> enough to copy the data and then release the pin.
>
> I don't find that anywhere near as obvious as you seem to. I think you
> are trying to optimize for the wrong set of conditions.

ISTM we should optimise to access the cachelines in the buffer once.
Holding a pin and re-accessing the buffer via main memory seems like a
pretty bad plan to me. Which conditions are being optimised by doing
that?

> I will also note that the behavior of holding pin for as long as we
> are stopped on a particular tuple is not specific to seqscans.

Agreed. Bad things may happen in more than one place.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
Simon Riggs writes:
> heapgetpage() gets a page and a pin, but holds the pin until it reads
> the next page. Wow!
>
> That is both annoying and very dumb. It should hold the pin long
> enough to copy the data and then release the pin.

I don't find that anywhere near as obvious as you seem to. I think you
are trying to optimize for the wrong set of conditions.

I will also note that the behavior of holding pin for as long as we are
stopped on a particular tuple is not specific to seqscans.

            regards, tom lane
Re: [HACKERS] heap vacuum & cleanup locks
Robert Haas writes:
> As for what to do about all this, I think Tom's idea would work for
> good tuples, but the current freezing code can't handle dead tuples;
> it counts on those having been already removed.

I have not gone back to look at the code, but are you worried about the
fact that it doesn't consider replacing xmax with FrozenTransactionId?
Surely we could do that if we wanted. It just never seemed necessary
before --- but if vacuum is to be allowed to punt repeatedly on the
same page, maybe we do need to cover the case.

            regards, tom lane
Re: [HACKERS] heap vacuum & cleanup locks
On Wed, Nov 9, 2011 at 9:48 PM, Simon Riggs wrote:
> On Wed, Nov 9, 2011 at 9:12 PM, Robert Haas wrote:
>> Well, I'm not sure of the details of how page-at-a-time mode works
>> for seq scans, but I am absolutely 100% sure that you can reproduce
>> this problem using a cursor over a sequential scan. Just do this:
>>
>> create table test (a text);
>> insert into test values ('aaa'), ('bbb');
>> delete from test where a = 'aaa';
>> begin;
>> declare x cursor for select * from test;
>> fetch next from x;
>
> That's a bug.

No, I'm wrong. It's not a bug at all.

heapgetpage() gets a page and a pin, but holds the pin until it reads
the next page. Wow!

That is both annoying and very dumb. It should hold the pin long enough
to copy the data and then release the pin. It looks inefficient from a
memory access viewpoint and from a pin longevity viewpoint. If we
copied out the relevant data it would be more cache efficient without
affecting visibility.

Looking at a patch.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
On Wed, Nov 9, 2011 at 9:12 PM, Robert Haas wrote:
> Well, I'm not sure of the details of how page-at-a-time mode works for
> seq scans, but I am absolutely 100% sure that you can reproduce this
> problem using a cursor over a sequential scan. Just do this:
>
> create table test (a text);
> insert into test values ('aaa'), ('bbb');
> delete from test where a = 'aaa';
> begin;
> declare x cursor for select * from test;
> fetch next from x;

That's a bug.

heapam.c line 1202 says

    /*
     * we can use page-at-a-time mode if it's an MVCC-safe snapshot
     */
    scan->rs_pageatatime = IsMVCCSnapshot(snapshot);

So either the comment or the code is wrong. Can't see where, as yet.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Nov 8, 2011 at 3:26 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Nov 8, 2011 at 2:26 AM, Simon Riggs wrote:
>>> I think we need to avoid long pin hold times generally.
>>
>> In the case of a suspended sequential scan, which is the case where
>> this has most recently bitten me on a production system, it actually
>> seems rather unnecessary to hold the pin for a long period of time.
>> If we release the buffer pin, then someone could vacuum the buffer.
>
> This seems unlikely to be a productive line of thought. The only way
> you could release buffer pin is if you first copied all the tuples you
> need out of the page, and that seems like an unacceptable performance
> hit. We should not be penalizing foreground query operations for the
> benefit of background maintenance like VACUUM. (The fact that we do
> an equivalent thing in btree index scans isn't an argument for doing
> it here, because the tradeoffs are very different. In the index case,
> the amount of data to be copied is a great deal less; the length of
> time the lock would have to be held is often a great deal more; and
> releasing the lock quickly gives a performance benefit for other
> foreground operations, not only background maintenance.)
>
> It strikes me that the only case where vacuum now has to wait is where
> it needs to freeze an old XID. Couldn't it do that without insisting
> on exclusive access? We only need exclusive access if we're going to
> move data around, but we could have a code path in vacuum that just
> replaces old XIDs with FrozenXID without moving/deleting anything.

Holding buffer pins for a long time is a problem in Hot Standby also,
not just vacuum.

AFAIK seq scans already work page at a time for normal tables. So the
issue is when we *aren't* using a seq scan, e.g. nested loop joins.

Is there a way to solve that?

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
On Wed, Nov 9, 2011 at 3:46 PM, Simon Riggs wrote:
> Holding buffer pins for a long time is a problem in Hot Standby also,
> not just vacuum.

Agreed.

> AFAIK seq scans already work page at a time for normal tables. So the
> issue is when we *aren't* using a seq scan, e.g. nested loop joins.
>
> Is there a way to solve that?

Well, I'm not sure of the details of how page-at-a-time mode works for
seq scans, but I am absolutely 100% sure that you can reproduce this
problem using a cursor over a sequential scan. Just do this:

create table test (a text);
insert into test values ('aaa'), ('bbb');
delete from test where a = 'aaa';
begin;
declare x cursor for select * from test;
fetch next from x;

Then switch to another session and run "VACUUM test". Prior to commit
bbb6e559c4ea0fb4c346beda76736451dc24eb4e, this would hang. Now, it
doesn't. But "VACUUM FREEZE test" still does.

As for what to do about all this, I think Tom's idea would work for
good tuples, but the current freezing code can't handle dead tuples; it
counts on those having been already removed. I wonder if we could just
set xmin = InvalidTransactionId and set HEAP_XMIN_INVALID, or something
like that. I'm worried that there might be code out there that thinks
InvalidTransactionId can never appear in a real tuple.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
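A minimal illustration of that idea (a sketch of the suggestion above,
not committed or proposed code; HeapTupleHeaderSetXmin and
HEAP_XMIN_INVALID are the existing htup macros/flags):

    /* mark the dead tuple's inserter as aborted, so a freeze pass can
     * leave the tuple in place for a later vacuum to remove */
    HeapTupleHeaderSetXmin(tuple, InvalidTransactionId);
    tuple->t_infomask |= HEAP_XMIN_INVALID;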
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Nov 8, 2011 at 10:54 AM, Tom Lane wrote:
> Robert Haas writes:
>> Interesting idea. I think in general we insist that you must have a
>> buffer content lock to inspect the tuple visibility info, in which
>> case that would be safe. But I'm not sure we do that absolutely
>> everywhere. For instance, just last night I noticed this:
>>
>> /*
>>  * If xmin isn't what we're expecting, the slot must have been
>>  * recycled and reused for an unrelated tuple. This implies that
>>  * the latest version of the row was deleted, so we need do
>>  * nothing. (Should be safe to examine xmin without getting
>>  * buffer's content lock, since xmin never changes in an existing
>>  * tuple.)
>>  */
>> if
>
> Hmm ... I think that code is OK but the comment needs work. Here we
> are necessarily looking for a pretty recent value of xmin (it has to
> be later than GlobalXmin), so there's no need to worry that it might
> get changed to FrozenXID.

OK. Here's another possible concern: what happens if the page we're
freezing contains a dead tuple? It looks to me like
heap_freeze_tuple() is written so as not to require a cleanup lock -
indeed, the comments claim it's called when holding only a share lock
on the buffer, which doesn't appear to match what lazy_scan_heap() is
actually doing. But it does seem to assume that any tuples that still
exist are all-visible, which only works if vacuum has already pruned
the page.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
Robert Haas writes:
> Interesting idea. I think in general we insist that you must have a
> buffer content lock to inspect the tuple visibility info, in which
> case that would be safe. But I'm not sure we do that absolutely
> everywhere. For instance, just last night I noticed this:
>
> /*
>  * If xmin isn't what we're expecting, the slot must have been
>  * recycled and reused for an unrelated tuple. This implies that
>  * the latest version of the row was deleted, so we need do
>  * nothing. (Should be safe to examine xmin without getting
>  * buffer's content lock, since xmin never changes in an existing
>  * tuple.)
>  */
> if

Hmm ... I think that code is OK but the comment needs work. Here we are
necessarily looking for a pretty recent value of xmin (it has to be
later than GlobalXmin), so there's no need to worry that it might get
changed to FrozenXID.

            regards, tom lane
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Nov 8, 2011 at 10:26 AM, Tom Lane wrote:
> It strikes me that the only case where vacuum now has to wait is where
> it needs to freeze an old XID. Couldn't it do that without insisting
> on exclusive access? We only need exclusive access if we're going to
> move data around, but we could have a code path in vacuum that just
> replaces old XIDs with FrozenXID without moving/deleting anything.

Interesting idea. I think in general we insist that you must have a
buffer content lock to inspect the tuple visibility info, in which case
that would be safe. But I'm not sure we do that absolutely everywhere.
For instance, just last night I noticed this:

    /*
     * If xmin isn't what we're expecting, the slot must have been
     * recycled and reused for an unrelated tuple.  This implies that
     * the latest version of the row was deleted, so we need do
     * nothing.  (Should be safe to examine xmin without getting
     * buffer's content lock, since xmin never changes in an existing
     * tuple.)
     */
    if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
                             priorXmax))
    {
        ReleaseBuffer(buffer);
        return NULL;
    }

Maybe we can convince ourselves that that case is OK, or fixable; I'm
not sure whether there are any others.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
Robert Haas writes:
> On Tue, Nov 8, 2011 at 2:26 AM, Simon Riggs wrote:
>> I think we need to avoid long pin hold times generally.
>
> In the case of a suspended sequential scan, which is the case where
> this has most recently bitten me on a production system, it actually
> seems rather unnecessary to hold the pin for a long period of time.
> If we release the buffer pin, then someone could vacuum the buffer.

This seems unlikely to be a productive line of thought. The only way
you could release buffer pin is if you first copied all the tuples you
need out of the page, and that seems like an unacceptable performance
hit. We should not be penalizing foreground query operations for the
benefit of background maintenance like VACUUM. (The fact that we do an
equivalent thing in btree index scans isn't an argument for doing it
here, because the tradeoffs are very different. In the index case, the
amount of data to be copied is a great deal less; the length of time
the lock would have to be held is often a great deal more; and
releasing the lock quickly gives a performance benefit for other
foreground operations, not only background maintenance.)

It strikes me that the only case where vacuum now has to wait is where
it needs to freeze an old XID. Couldn't it do that without insisting on
exclusive access? We only need exclusive access if we're going to move
data around, but we could have a code path in vacuum that just replaces
old XIDs with FrozenXID without moving/deleting anything.

            regards, tom lane
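A rough sketch of what such a code path might look like (illustrative
only -- it uses real bufmgr/page/transam primitives but is not taken
from any patch, and as noted elsewhere in the thread it would also need
a committed-xmin check to be safe against aborted inserters):

    /* freeze in place under an ordinary exclusive content lock; no
     * data movement, so concurrent pin holders still see stable line
     * pointers and tuple offsets */
    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    maxoff = PageGetMaxOffsetNumber(page);
    for (off = FirstOffsetNumber; off <= maxoff;
         off = OffsetNumberNext(off))
    {
        ItemId          itemid = PageGetItemId(page, off);
        HeapTupleHeader htup;
        TransactionId   xid;

        if (!ItemIdIsNormal(itemid))
            continue;
        htup = (HeapTupleHeader) PageGetItem(page, itemid);
        xid = HeapTupleHeaderGetXmin(htup);
        if (TransactionIdIsNormal(xid) &&
            TransactionIdPrecedes(xid, cutoff_xid))
            HeapTupleHeaderSetXmin(htup, FrozenTransactionId);
    }
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);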
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Nov 8, 2011 at 10:08 AM, Simon Riggs wrote:
> On Tue, Nov 8, 2011 at 1:50 PM, Robert Haas wrote:
>> But there's an efficiency argument against doing it that way. First,
>> if we release the pin then we'll have to reacquire the buffer, which
>> means taking and releasing a BufMappingLock, the buffer header
>> spinlock, and the buffer content lock. Second, instead of returning
>> a pointer to the data in the page, we'll have to copy the data out of
>> the buffer before releasing the pin.
>
> The only way I can see this working is to optimise this in the
> planner, so that when we have a nested loop within a loop, we avoid
> having the row on the outer loop pinned while we perform the inner
> loop.

Hmm. I've actually never run into a problem that involved that
particular situation. In any case, I think the issues are basically
the same: keeping the pin improves performance; dropping it helps
VACUUM. Both are important.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Nov 8, 2011 at 1:50 PM, Robert Haas wrote:
> But there's an efficiency argument against doing it that way. First,
> if we release the pin then we'll have to reacquire the buffer, which
> means taking and releasing a BufMappingLock, the buffer header
> spinlock, and the buffer content lock. Second, instead of returning a
> pointer to the data in the page, we'll have to copy the data out of
> the buffer before releasing the pin.

The only way I can see this working is to optimise this in the planner,
so that when we have a nested loop within a loop, we avoid having the
row on the outer loop pinned while we perform the inner loop.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Nov 8, 2011 at 2:26 AM, Simon Riggs wrote:
> On Tue, Nov 8, 2011 at 2:54 AM, Robert Haas wrote:
>> It would still be nice to fix the case where we need to freeze a
>> tuple that is on a page someone else has pinned, but I don't have
>> any good ideas for how to do that.
>
> I think we need to avoid long pin hold times generally.

In the case of a suspended sequential scan, which is the case where
this has most recently bitten me on a production system, it actually
seems rather unnecessary to hold the pin for a long period of time. If
we release the buffer pin, then someone could vacuum the buffer. I
haven't looked in detail at the issues, but in theory that doesn't seem
like a huge problem: just remember which TIDs you've already looked at
and, when you re-acquire the buffer, pick up where you left off. Any
tuples that have been vacuumed away meanwhile weren't going to be
visible to your scan anyway.

But there's an efficiency argument against doing it that way. First,
if we release the pin then we'll have to reacquire the buffer, which
means taking and releasing a BufMappingLock, the buffer header
spinlock, and the buffer content lock. Second, instead of returning a
pointer to the data in the page, we'll have to copy the data out of the
buffer before releasing the pin.

The situation is similar (perhaps even simpler) for index-only scans.
We could easily release the heap buffer pin after returning a tuple,
but it will make things much slower if the next heap fetch hits the
same page.

I wonder if we could arrange for a vacuum that's waiting for a cleanup
lock to signal the backends that could possibly be holding a
conflicting pin. Sort of like what the startup process does during Hot
Standby, except that instead of killing the people holding the pins,
we'd send them a signal that says "if at all possible, could you please
release those buffer pins right away?", and then the backends would try
to comply. Actually making that work seems a bit tricky, though, and
getting it wrong would mean very rare, nearly unreproducible bugs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
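For concreteness, the release-and-reacquire dance described above might
look roughly like this (a hypothetical sketch, not from any posted
patch; tuple, buf, and scan_rel stand in for the scan's local state,
while the itemptr and bufmgr calls are the existing ones):

    /* remember our position, then give up the pin so VACUUM can get
     * its cleanup lock */
    ItemPointerData resume_tid;

    ItemPointerCopy(&tuple->t_self, &resume_tid);
    ReleaseBuffer(buf);

    /* ... scan suspended; the page may be pruned meanwhile ... */

    /* reacquire: this is the BufMappingLock / header spinlock /
     * content lock cost mentioned above */
    buf = ReadBuffer(scan_rel, ItemPointerGetBlockNumber(&resume_tid));
    LockBuffer(buf, BUFFER_LOCK_SHARE);
    /* continue from the offset after resume_tid; TIDs vacuumed away
     * in the meantime were not visible to this scan anyway */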
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Nov 8, 2011 at 2:54 AM, Robert Haas wrote:
> It would still be nice to fix the case where we need to freeze a
> tuple that is on a page someone else has pinned, but I don't have any
> good ideas for how to do that.

I think we need to avoid long pin hold times generally.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
On Fri, Nov 4, 2011 at 3:39 PM, Robert Haas wrote:
> Doing that actually makes the patch simpler, so I went ahead and did
> that in the attached version, along with the renaming mentioned above.

Hearing no objections, I went ahead and committed this version.

Thanks for coding this up; I think this is a very useful improvement.
It would still be nice to fix the case where we need to freeze a tuple
that is on a page someone else has pinned, but I don't have any good
ideas for how to do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
On Fri, Nov 4, 2011 at 3:12 PM, Simon Riggs wrote:
> On Fri, Nov 4, 2011 at 6:18 PM, Robert Haas wrote:
>> Here's a new version. I fixed the second pass as discussed (which
>> turned out to be trivial). To address the concern about relpages, I
>> moved this pre-existing line to after we get the buffer lock:
>>
>> + vacrelstats->scanned_pages++;
>>
>> That appears to do the right thing.
>
> I think we need to count skipped pages also. I don't like the idea
> that vacuum would just report fewer pages than there are in the
> table. We'll just get requests to explain that.

It's been skipping pages due to the visibility map since 8.4. This
seems like small potatoes by comparison, but we could add some counters
if you like.

>> I found it kind of confusing that lazy_scan_page_for_wraparound()
>> returns with the pin either held or not held depending on the return
>> value, so I rearranged things very slightly so that it doesn't need
>> to do that. I'm wondering whether we should rename that function to
>> something like lazy_check_needs_freeze().
>
> OK
>
>> I tested this out and discovered that "VACUUM FREEZE" doesn't set the
>> for_wraparound flag. On further review, I think that we should
>> probably conditionalize the behavior on the scan_all flag that
>> lazy_vacuum_rel sets, rather than for_wraparound. Otherwise, there's
>> no way for the user to manually force relfrozenxid to advance, which
>> doesn't seem good. I haven't made that change in this version,
>> though.
>
> Agreed, separate patch.

Doing that actually makes the patch simpler, so I went ahead and did
that in the attached version, along with the renaming mentioned above.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: vacuum_skip_busy_pages.v3.patch (binary)
Re: [HACKERS] heap vacuum & cleanup locks
On Fri, Nov 4, 2011 at 6:18 PM, Robert Haas wrote:
> Here's a new version. I fixed the second pass as discussed (which
> turned out to be trivial). To address the concern about relpages, I
> moved this pre-existing line to after we get the buffer lock:
>
> + vacrelstats->scanned_pages++;
>
> That appears to do the right thing.

I think we need to count skipped pages also. I don't like the idea
that vacuum would just report fewer pages than there are in the table.
We'll just get requests to explain that.

> I found it kind of confusing that lazy_scan_page_for_wraparound()
> returns with the pin either held or not held depending on the return
> value, so I rearranged things very slightly so that it doesn't need
> to do that. I'm wondering whether we should rename that function to
> something like lazy_check_needs_freeze().

OK

> I tested this out and discovered that "VACUUM FREEZE" doesn't set the
> for_wraparound flag. On further review, I think that we should
> probably conditionalize the behavior on the scan_all flag that
> lazy_vacuum_rel sets, rather than for_wraparound. Otherwise, there's
> no way for the user to manually force relfrozenxid to advance, which
> doesn't seem good. I haven't made that change in this version,
> though.

Agreed, separate patch.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
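The accounting itself would be a one-line addition in the skip path,
along these lines (a sketch only; pinskipped_pages is a hypothetical
counter field, not something in the posted patch, and the try-lock
primitive is the same ConditionalLockBufferForCleanup-style call
assumed elsewhere in this thread):

    if (!ConditionalLockBufferForCleanup(buf))
    {
        vacrelstats->pinskipped_pages++;  /* count pages we punted on */
        ReleaseBuffer(buf);
        continue;
    }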
Re: [HACKERS] heap vacuum & cleanup locks
On Thu, Nov 3, 2011 at 12:57 PM, Simon Riggs wrote:
> On Thu, Nov 3, 2011 at 2:22 PM, Robert Haas wrote:
>> On Thu, Nov 3, 2011 at 9:52 AM, Simon Riggs wrote:
>>> On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas wrote:
>>>> I think that should be OK, but:
>>>>
>>>> - It looks to me like you haven't done anything about the second
>>>> heap pass. That should probably get a similar fix.
>>>
>>> I was assuming this worked with Pavan's patch to remove second pass.
>>
>> It's not entirely certain that will make it into 9.2, so I would
>> rather get this done first. If you want I can pick up what you've
>> done and send you back a version that addresses this.
>
> OK, that seems efficient. Thanks.

Here's a new version. I fixed the second pass as discussed (which
turned out to be trivial). To address the concern about relpages, I
moved this pre-existing line to after we get the buffer lock:

+ vacrelstats->scanned_pages++;

That appears to do the right thing.

I found it kind of confusing that lazy_scan_page_for_wraparound()
returns with the pin either held or not held depending on the return
value, so I rearranged things very slightly so that it doesn't need to
do that. I'm wondering whether we should rename that function to
something like lazy_check_needs_freeze().

I tested this out and discovered that "VACUUM FREEZE" doesn't set the
for_wraparound flag. On further review, I think that we should
probably conditionalize the behavior on the scan_all flag that
lazy_vacuum_rel sets, rather than for_wraparound. Otherwise, there's
no way for the user to manually force relfrozenxid to advance, which
doesn't seem good. I haven't made that change in this version, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: vacuum_skip_busy_pages.v2.patch (binary)
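For reference, a sketch of what such a lazy_check_needs_freeze() might
look like (a reconstruction from the description above using real
page/transam macros -- not the text of the attached patch, which for
one thing would also have to consider xmax):

    /* return true if the page holds any xmin old enough that this
     * vacuum cannot skip it; caller holds at least a share lock */
    static bool
    lazy_check_needs_freeze(Page page, TransactionId cutoff_xid)
    {
        OffsetNumber off,
                     maxoff = PageGetMaxOffsetNumber(page);

        for (off = FirstOffsetNumber; off <= maxoff;
             off = OffsetNumberNext(off))
        {
            ItemId          itemid = PageGetItemId(page, off);
            HeapTupleHeader htup;
            TransactionId   xid;

            if (!ItemIdIsNormal(itemid))
                continue;       /* unused, dead, redirect: no xid */
            htup = (HeapTupleHeader) PageGetItem(page, itemid);
            xid = HeapTupleHeaderGetXmin(htup);
            if (TransactionIdIsNormal(xid) &&
                TransactionIdPrecedes(xid, cutoff_xid))
                return true;
        }
        return false;
    }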
Re: [HACKERS] heap vacuum & cleanup locks
On Thu, Nov 3, 2011 at 2:22 PM, Robert Haas wrote:
> On Thu, Nov 3, 2011 at 9:52 AM, Simon Riggs wrote:
>> On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas wrote:
>>> I think that should be OK, but:
>>>
>>> - It looks to me like you haven't done anything about the second
>>> heap pass. That should probably get a similar fix.
>>
>> I was assuming this worked with Pavan's patch to remove second pass.
>
> It's not entirely certain that will make it into 9.2, so I would
> rather get this done first. If you want I can pick up what you've
> done and send you back a version that addresses this.

OK, that seems efficient. Thanks.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
On Thu, Nov 3, 2011 at 9:52 AM, Simon Riggs wrote:
> On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas wrote:
>> I think that should be OK, but:
>>
>> - It looks to me like you haven't done anything about the second heap
>> pass. That should probably get a similar fix.
>
> I was assuming this worked with Pavan's patch to remove second pass.

It's not entirely certain that will make it into 9.2, so I would rather
get this done first. If you want I can pick up what you've done and
send you back a version that addresses this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
On Thu, Nov 3, 2011 at 1:26 PM, Robert Haas wrote:
> I think that should be OK, but:
>
> - It looks to me like you haven't done anything about the second heap
> pass. That should probably get a similar fix.

I was assuming this worked with Pavan's patch to remove second pass.
Not in any rush to commit this, so will wait till that is through.

> - I think that this is going to screw up the reltuples calculation
> unless we fudge it somehow. The number of scanned pages has already
> been incremented by the time your new code is reached.

Yeah, I'll have a look at that in more detail. Thanks for the review.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
On Thu, Nov 3, 2011 at 7:15 AM, Simon Riggs wrote:
>> A while back, someone (Greg Stark? me?) floated the idea of not
>> waiting for the cleanup lock. If we can't get it immediately, or
>> within some short period of time, then we just skip the page and
>> continue on.
>
> Separately, that sounds like a great idea and it's simple to implement
> - patch attached.

Oh, that's kind of clever. I was thinking that you'd have to disable
this entirely for anti-wraparound vacuum, but the way you've done it
avoids that. You'll still have to wait if there's a competing pin on a
buffer that contains tuples actually in need of freezing, but that
should be relatively rare.

> Enhancements to that are that I don't see any particular reason why
> Also, ISTM that LockBufferForCleanup() waits for just a single buffer,
> but it could equally well wait for multiple buffers at the same time.
> By this, we would be able to simply register our interest in multiple
> buffers and get woken as soon as one of them were free. That way we
> could read the blocks sequentially, but lock and clean them out of
> sequence if necessary. Do this in chunks, so it plays nicely with
> buffer strategy. (The patch doesn't do that yet.)

I doubt this would help much. The real issue is with open cursors, and
those can easily be left open for long enough that those optimizations
won't help. I think the patch as it stands probably gets just about
all of the benefit that can be had from this approach while still being
reasonably simple.

> (Not sure if the patch handles the vacuum map correctly if we skip the
> page, but it's a reasonable prototype for discussion.)

Yeah. I think that should be OK, but:

- It looks to me like you haven't done anything about the second heap
pass. That should probably get a similar fix.

- I think that this is going to screw up the reltuples calculation
unless we fudge it somehow. The number of scanned pages has already
been incremented by the time your new code is reached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
On Sun, Jun 5, 2011 at 4:03 AM, Robert Haas wrote:
> We've occasionally seen problems with VACUUM getting stuck for failure
> to acquire a cleanup lock due to, for example, a cursor holding a pin
> on the buffer page. In the worst case, this can cause an undetected
> deadlock, if the backend holding the buffer pin blocks trying to
> acquire a heavyweight lock that is in turn blocked by VACUUM.

Those deadlocks can be detected in exactly the same way as is used for
Hot Standby. The cleanup waiter registers interest in the pin, and
anyone with a lock request that must wait checks to see if they hold a
pin that would cause deadlock. I'll look at doing a patch for that.
Shouldn't take long.

> A while back, someone (Greg Stark? me?) floated the idea of not
> waiting for the cleanup lock. If we can't get it immediately, or
> within some short period of time, then we just skip the page and
> continue on.

Separately, that sounds like a great idea and it's simple to implement
- patch attached.

Enhancements to that are that I don't see any particular reason why the
heap pages need to be vacuumed in exactly sequential order. If they
are on disk, reading sequentially is useful, in which case nobody has a
pin and so we will continue. But if the blocks are already in
shared_buffers, then the sequential order doesn't matter at all. So we
could skip pages and then return to them later on.

Also, ISTM that LockBufferForCleanup() waits for just a single buffer,
but it could equally well wait for multiple buffers at the same time.
By this, we would be able to simply register our interest in multiple
buffers and get woken as soon as one of them were free. That way we
could read the blocks sequentially, but lock and clean them out of
sequence if necessary. Do this in chunks, so it plays nicely with
buffer strategy. (The patch doesn't do that yet.)

(Not sure if the patch handles the vacuum map correctly if we skip the
page, but it's a reasonable prototype for discussion.)

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment: vacuum_skip_busy_pages.v1.patch (binary)
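In outline, the skip-if-pinned behaviour amounts to something like this
in the main scan loop (a simplified sketch of the idea rather than the
attached patch; it assumes a try-lock primitive along the lines of
bufmgr's ConditionalLockBufferForCleanup(), and onerel/vac_strategy are
the usual vacuumlazy-style names):

    buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                             RBM_NORMAL, vac_strategy);
    if (!ConditionalLockBufferForCleanup(buf))
    {
        /* someone else holds a pin: skip the page rather than stall */
        ReleaseBuffer(buf);
        continue;
    }
    /* we hold the cleanup lock: prune and vacuum the page as usual */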
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Jun 7, 2011 at 3:24 PM, Greg Stark wrote:
> Well, these are super-exclusive-vacuum-lock avoidance techniques.
> Wouldn't it make more sense to try to reduce the frequency and impact
> of the single-purpose outlier in a non-critical path instead of
> burdening every other data reader with extra overhead?
>
> I think Robert's plan is exactly right, though I would phrase it
> differently. We should get the exclusive lock, freeze/kill any xids
> and line pointers, then if the pin count is 1 do the compaction.

I wrote a really neat patch to do this today... and then, as I thought
about it some more, I started to think that it's probably unsafe.
Here's the trouble: with this approach, we assume that it's OK to
change the contents of the line pointer while holding only an exclusive
lock on the buffer. But there is a good deal of code out there that
thinks it's OK to examine a line pointer with only a pin on the buffer
(no lock). You need a content lock to scan the item pointer array, but
once you've identified an item of interest, you're entitled to assume
that it won't be modified while you hold a buffer pin.

Now, if you've identified a particular tuple as being visible to your
scan, then you might think that VACUUM shouldn't be removing it anyway.
But I think that's only true for MVCC scans - for example, what happens
under SnapshotNow semantics? But then, on third thought, if you've
also got an MVCC snapshot taken before the start of the SnapshotNow
scan, you are probably OK, because your advertised xmin should prevent
anyone from removing anything anyway, so how do you actually provoke a
failure?

Anyway, I'm attaching the patch, in case anyone has any ideas on where
to go with this.

> I'm really wishing we had more bits in the vm. It looks like we could
> use:
> - contains not-all-visible tuples
> - contains not-frozen xids
> - in need of compaction
>
> I'm sure we could find a use for one more page-level vm bit too.

We've got plenty of room for more page-level bits, if we need them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: vacuum-unstick-v1.patch (binary)
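Reconstructed from the description, the shape of the approach would be
roughly the following (an illustrative sketch, not the attached
vacuum-unstick-v1.patch; it leans on ConditionalLockBufferForCleanup(),
which takes the exclusive content lock and reports whether ours is the
only pin, and freeze_and_kill() is a hypothetical placeholder for the
xid/line-pointer work):

    bool    can_compact = ConditionalLockBufferForCleanup(buf);

    if (!can_compact)
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

    /* safe under a plain exclusive lock only if pin holders never
     * examine line pointers lock-free -- the assumption questioned
     * above */
    freeze_and_kill(page);              /* hypothetical */

    if (can_compact)
        PageRepairFragmentation(page);  /* pin count was 1: compact */

    LockBuffer(buf, BUFFER_LOCK_UNLOCK);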
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Jun 7, 2011 at 3:43 PM, Simon Riggs wrote:
> On Tue, Jun 7, 2011 at 8:24 PM, Greg Stark wrote:
>> On Mon, Jun 6, 2011 at 11:30 PM, Simon Riggs wrote:
>>> But I think you've hit the important point here. The problem is not
>>> whether VACUUM waits for the pin, it's that the pins can be held for
>>> extended periods.
>>
>> Yes
>>
>>> It makes more sense to try to limit pin hold times than it does to
>>> come up with pin avoidance techniques.
>>
>> Well, these are super-exclusive-vacuum-lock avoidance techniques.
>> Wouldn't it make more sense to try to reduce the frequency and impact
>> of the single-purpose outlier in a non-critical path instead of
>> burdening every other data reader with extra overhead?
>>
>> I think Robert's plan is exactly right, though I would phrase it
>> differently. We should get the exclusive lock, freeze/kill any xids
>> and line pointers, then if the pin count is 1 do the compaction.
>
> Would that also be possible during recovery?
>
> A similar problem exists with Hot Standby, so I'm worried fixing just
> VACUUMs would be a kluge.

We have to do the same operation on both the master and standby, so if
the master decides to skip the compaction then the standby will skip it
as well (and need not worry about waiting for pin count 1). But if the
master does the compaction then the standby will have to get a matching
cleanup lock, just as now.

Your idea of somehow adjusting things so that we don't hold the buffer
pin for a long period of time would be better in that regard, but I'm
not sure how to do it. Presumably we could rejigger things to copy the
tuples instead of holding a pin, but that would carry a performance
penalty for the (very common) case where there is no conflict with
VACUUM.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
On Tue, Jun 7, 2011 at 8:24 PM, Greg Stark wrote:
> On Mon, Jun 6, 2011 at 11:30 PM, Simon Riggs wrote:
>> But I think you've hit the important point here. The problem is not
>> whether VACUUM waits for the pin, it's that the pins can be held for
>> extended periods.
>
> Yes
>
>> It makes more sense to try to limit pin hold times than it does to
>> come up with pin avoidance techniques.
>
> Well, these are super-exclusive-vacuum-lock avoidance techniques.
> Wouldn't it make more sense to try to reduce the frequency and impact
> of the single-purpose outlier in a non-critical path instead of
> burdening every other data reader with extra overhead?
>
> I think Robert's plan is exactly right, though I would phrase it
> differently. We should get the exclusive lock, freeze/kill any xids
> and line pointers, then if the pin count is 1 do the compaction.

Would that also be possible during recovery?

A similar problem exists with Hot Standby, so I'm worried fixing just
VACUUMs would be a kluge.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
On Mon, Jun 6, 2011 at 11:30 PM, Simon Riggs wrote:
> But I think you've hit the important point here. The problem is not
> whether VACUUM waits for the pin, it's that the pins can be held for
> extended periods.

Yes

> It makes more sense to try to limit pin hold times than it does to
> come up with pin avoidance techniques.

Well, these are super-exclusive-vacuum-lock avoidance techniques.
Wouldn't it make more sense to try to reduce the frequency and impact
of the single-purpose outlier in a non-critical path instead of
burdening every other data reader with extra overhead?

I think Robert's plan is exactly right, though I would phrase it
differently. We should get the exclusive lock, freeze/kill any xids
and line pointers, then if the pin count is 1 do the compaction.

I'm really wishing we had more bits in the vm. It looks like we could
use:
- contains not-all-visible tuples
- contains not-frozen xids
- in need of compaction

I'm sure we could find a use for one more page-level vm bit too.

--
greg
Re: [HACKERS] heap vacuum & cleanup locks
On Mon, Jun 6, 2011 at 8:02 AM, Heikki Linnakangas wrote:
> On 06.06.2011 09:35, Jim Nasby wrote:
>> I've had a related idea that I haven't looked into... if you're
>> scanning a relation (ie: index scan, seq scan) I've wondered if it
>> would be more efficient to deal with the entire page at once,
>> possibly by making a copy of it. This would reduce the number of
>> times you pin the page (often quite dramatically). I realize that
>> means copying the entire page, but I suspect that would occur
>> entirely in the L1 cache, which would be fast.
>
> We already do that. When an index scan moves to an index page, the
> heap tid pointers of all the matching index tuples are copied to
> backend-private memory in one go, and the lock is released. And for a
> seqscan, the visibility of all the tuples on the page is checked in
> one go while holding the lock, then the lock is released but the pin
> is kept. The pin is only released after all the tuples have been
> read. There's no repeated pin-unpin for each tuple.

But I think you've hit the important point here. The problem is not
whether VACUUM waits for the pin, it's that the pins can be held for
extended periods.

It makes more sense to try to limit pin hold times than it does to come
up with pin avoidance techniques.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] heap vacuum & cleanup locks
On Mon, Jun 6, 2011 at 12:49 PM, Alvaro Herrera wrote:
> Excerpts from Robert Haas's message of Mon Jun 06 08:06:10 -0400 2011:
>> But the problem of vacuum stalling out because it can't get the
>> cleanup lock is a very real one. I've seen at least one customer hit
>> this in production, and it was pretty painful. Now, granted, you
>> need some bad application design, too: you have to leave a cursor
>> lying around instead of running it to completion and then stopping.
>> But supposing you do make that mistake, you might hope that it
>> wouldn't cause VACUUM starvation, which is what happens today. IOW,
>> I'm less worried about whether the cleanup lock is slowing vacuum
>> down than I am about eliminating the pathological cases where an
>> autovacuum worker gets pinned down, stuck waiting for a cleanup lock
>> that never arrives. Now the table doesn't get vacuumed (bad) and the
>> system as a whole is one AV worker short of what it's supposed to
>> have (also bad).
>
> One of the good things about your proposal is that (AFAICS) you can
> freeze tuples without the cleanup lock, so the antiwraparound cleanup
> would still work.

Yeah, I think that's a major selling point. VACUUM getting stuck is
Bad. Anti-wraparound VACUUM getting stuck is Really Bad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
Excerpts from Robert Haas's message of Mon Jun 06 08:06:10 -0400 2011:
> But the problem of vacuum stalling out because it can't get the
> cleanup lock is a very real one. I've seen at least one customer hit
> this in production, and it was pretty painful. Now, granted, you need
> some bad application design, too: you have to leave a cursor lying
> around instead of running it to completion and then stopping. But
> supposing you do make that mistake, you might hope that it wouldn't
> cause VACUUM starvation, which is what happens today. IOW, I'm less
> worried about whether the cleanup lock is slowing vacuum down than I
> am about eliminating the pathological cases where an autovacuum worker
> gets pinned down, stuck waiting for a cleanup lock that never arrives.
> Now the table doesn't get vacuumed (bad) and the system as a whole is
> one AV worker short of what it's supposed to have (also bad).

One of the good things about your proposal is that (AFAICS) you can
freeze tuples without the cleanup lock, so the antiwraparound cleanup
would still work.

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] heap vacuum & cleanup locks
On Mon, Jun 6, 2011 at 2:36 AM, Pavan Deolasee wrote:
> Do we know if this is really a problem though? The deadlock, for
> example, can happen only when a backend tries to get a table-level
> conflicting lock while holding the buffer pin, and I am not sure if
> we do that.

The deadlock isn't terribly common, because, as you say, you need the
process holding the buffer pin to try to take a lock on the relation
being vacuumed that is strong enough to conflict with
ShareUpdateExclusiveLock. That's a slightly unusual thing to do.

But the problem of vacuum stalling out because it can't get the cleanup
lock is a very real one. I've seen at least one customer hit this in
production, and it was pretty painful. Now, granted, you need some bad
application design, too: you have to leave a cursor lying around
instead of running it to completion and then stopping. But supposing
you do make that mistake, you might hope that it wouldn't cause VACUUM
starvation, which is what happens today. IOW, I'm less worried about
whether the cleanup lock is slowing vacuum down than I am about
eliminating the pathological cases where an autovacuum worker gets
pinned down, stuck waiting for a cleanup lock that never arrives. Now
the table doesn't get vacuumed (bad) and the system as a whole is one
AV worker short of what it's supposed to have (also bad).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] heap vacuum & cleanup locks
On Sun, Jun 5, 2011 at 4:03 AM, Robert Haas wrote: > We've occasionally seen problems with VACUUM getting stuck for failure > to acquire a cleanup lock due to, for example, a cursor holding a pin > on the buffer page. In the worst case, this can cause an undetected > deadlock, if the backend holding the buffer pin blocks trying to > acquire a heavyweight lock that is in turn blocked by VACUUM. A while > back, someone (Greg Stark? me?) floated the idea of not waiting for > the cleanup lock. If we can't get it immediately, or within some > short period of time, then we just skip the page and continue on. > > Today I had what might be a better idea: don't try to acquire a > cleanup lock at all. Instead, acquire an exclusive lock. After > having done so, observe the pin count. If there are no other buffer > pins, that means our exclusive lock is actually a cleanup lock, and we > proceed as now. If other buffer pins do exist, then we can't > defragment the page, but that doesn't mean no useful work can be done: > we can still mark used line pointers dead, or dead line pointers > unused. We cannot defragment, but that can be done either by the next > VACUUM or by a HOT cleanup. We can even arrange - using existing > mechanism - to leave behind a hint that the page is a good candidate > for a HOT cleanup, by setting pd_prune_xid to, say, FrozenXID. > > Like the idea of skipping pages on which we can't acquire a cleanup > lock altogether, this should prevent VACUUM from getting stuck trying > to lock a heap page. While buffer pins can be held for extended > periods of time, I don't think there is any operation that holds a > buffer content lock more than very briefly. Furthermore, unlike the > idea of skipping the page altogether, we could use this approach even > during an anti-wraparound vacuum. > > Thoughts? Not waiting seems like a good idea. Not returning to the block while it is in RAM or not cleaning the block at all would cause a different performance issues, which I would wish to avoid. Hot Standby has specific code to avoid this situation. Perhaps you could copy that, not sure. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap vacuum & cleanup locks
On 06.06.2011 09:35, Jim Nasby wrote: > I've had a related idea that I haven't looked into... if you're > scanning a relation (ie: index scan, seq scan) I've wondered if it > would be more efficient to deal with the entire page at once, possibly > by making a copy of it. This would reduce the number of times you pin > the page (often quite dramatically). I realize that means copying the > entire page, but I suspect that would occur entirely in the L1 cache, > which would be fast. We already do that. When an index scan moves to an index page, the heap tid pointers of all the matching index tuples are copied to backend-private memory in one go, and the lock is released. And for a seqscan, the visibility of all the tuples on the page is checked in one go while holding the lock, then the lock is released but the pin is kept. The pin is only released after all the tuples have been read. There's no repeated pin-unpin for each tuple. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
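To make the batching concrete, here is a schematic C sketch of the pattern Heikki describes; it is not actual nbtree code, and the PageMatches struct and function name are invented for illustration (the genuine logic lives in _bt_readpage and the scan's opaque state). Only the copy-TIDs-under-lock shape is the point.

    #include "postgres.h"
    #include "access/itup.h"
    #include "storage/bufmgr.h"
    #include "storage/itemptr.h"

    /* Backend-private batch of matches from one index page
     * (hypothetical; the real one lives in the scan's opaque state). */
    typedef struct PageMatches
    {
        int             nitems;
        ItemPointerData heaptids[MaxIndexTuplesPerPage];
    } PageMatches;

    /* Copy out the heap TIDs of every matching index tuple in one
     * visit while the content lock is held; afterwards the caller
     * fetches heap tuples from the private array with no lock held. */
    static void
    read_index_page_sketch(Buffer buf, PageMatches *out)
    {
        LockBuffer(buf, BUFFER_LOCK_SHARE);
        out->nitems = 0;
        /* for each index tuple on the page matching the scan keys:
         *     out->heaptids[out->nitems++] = itup->t_tid;
         * (key-matching logic elided) */
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }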
Re: [HACKERS] heap vacuum & cleanup locks
On Sun, Jun 5, 2011 at 8:33 AM, Robert Haas wrote: > We've occasionally seen problems with VACUUM getting stuck for failure > to acquire a cleanup lock due to, for example, a cursor holding a pin > on the buffer page. In the worst case, this can cause an undetected > deadlock, if the backend holding the buffer pin blocks trying to > acquire a heavyweight lock that is in turn blocked by VACUUM. A while > back, someone (Greg Stark? me?) floated the idea of not waiting for > the cleanup lock. If we can't get it immediately, or within some > short period of time, then we just skip the page and continue on. > Do we know if this is really a problem though? The deadlock, for example, can happen only when a backend tries to get a conflicting table-level lock while holding the buffer pin, and I am not sure if we do that. The contention issue probably matters mostly for small tables, because for large to very large tables, the probability that a backend and vacuum would process the same page would be quite low. With the current default for vac_threshold, small tables can get vacuumed very frequently, and if they are also heavily accessed, the cleanup lock can become a bottleneck. Another issue that might be worth paying attention to is the single-pass vacuum that I am currently working on. The design that we agreed upon assumes that the index vacuum must clear index pointers to all the dead line pointers. If we skip any page, we must at least collect the existing dead line pointers and remove those index pointers. If we create dead line pointers and we want to vacuum them later, we store the LSN in the page and that may require defrag. Of course, we can work around that, but I think it would be useful to do some tests to show that the cleanup lock is indeed a major bottleneck. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
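Incidentally, the "collect the existing dead line pointers" step Pavan mentions needs only a content lock, not a cleanup lock, because it reads the line pointer array without moving any tuple data. A rough sketch using real page-inspection macros; record_dead_tid() is a hypothetical stand-in for appending to vacuum's dead-TID array.

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"
    #include "storage/itemid.h"
    #include "storage/itemptr.h"
    #include "storage/off.h"

    /* Gather LP_DEAD line pointers so the index-vacuum pass can still
     * remove their index entries even if this heap page was otherwise
     * skipped. */
    static void
    collect_dead_items_sketch(Buffer buf, BlockNumber blkno)
    {
        Page         page;
        OffsetNumber offnum, maxoff;

        LockBuffer(buf, BUFFER_LOCK_SHARE);
        page = BufferGetPage(buf);
        maxoff = PageGetMaxOffsetNumber(page);

        for (offnum = FirstOffsetNumber; offnum <= maxoff;
             offnum = OffsetNumberNext(offnum))
        {
            ItemId itemid = PageGetItemId(page, offnum);

            if (ItemIdIsDead(itemid))
            {
                ItemPointerData tid;

                ItemPointerSet(&tid, blkno, offnum);
                /* record_dead_tid(&tid);  -- hypothetical accumulator */
            }
        }
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }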
Re: [HACKERS] heap vacuum & cleanup locks
On Jun 6, 2011, at 1:00 AM, Robert Haas wrote: > On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro > wrote: >> On Sun, Jun 5, 2011 at 12:03, Robert Haas wrote: >>> If other buffer pins do exist, then we can't >>> defragment the page, but that doesn't mean no useful work can be done: >>> we can still mark used line pointers dead, or dead line pointers >>> unused. We cannot defragment, but that can be done either by the next >>> VACUUM or by a HOT cleanup. >> >> This is just an idea -- Is it possible to have copy-on-write techniques? >> VACUUM allocates a duplicated page for the pinned page, and copy valid >> tuples into the new page. Following buffer readers after the VACUUM will >> see the cloned page instead of the old pinned one. > > Heikki suggested the same thing, and it's not a bad idea, but I think > it would be more work to implement than what I proposed. The caller > would need to be aware that, if it tries to re-acquire a content lock > on the same page, the offset of the tuple within the page might > change. I'm not sure how much work would be required to cope with > that possibility. I've had a related idea that I haven't looked into... if you're scanning a relation (ie: index scan, seq scan) I've wondered if it would be more efficient to deal with the entire page at once, possibly by making a copy of it. This would reduce the number of times you pin the page (often quite dramatically). I realize that means copying the entire page, but I suspect that would occur entirely in the L1 cache, which would be fast. So perhaps instead of copy on write we should try for copy on read on all appropriate plan nodes. On a related note, I've also wondered if it would be useful to allow nodes to deal with more than one tuple at a time; the idea being that it's better to execute a smaller chunk of code over a bigger chunk of data instead of dribbling tuples through an entire execution tree one at a time. Perhaps that will only be useful if nodes are executing in parallel... -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
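A minimal sketch of the copy-on-read idea: one pin/lock cycle per page, after which the backend works on a private snapshot. Whether the copy really stays in L1 is Jim's conjecture, and the function here is illustrative, not proposed code; note that visibility checking still has to happen somewhere, which is why the real seqscan does it under the lock before releasing.

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    /* Snapshot a heap page into backend-local memory so that the pin
     * and content lock are taken exactly once per page. */
    static void
    copy_on_read_sketch(Buffer buf, char *localpage /* BLCKSZ bytes */)
    {
        LockBuffer(buf, BUFFER_LOCK_SHARE);
        memcpy(localpage, BufferGetPage(buf), BLCKSZ);
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
        ReleaseBuffer(buf);

        /* ... scan tuples in localpage with no pin or lock held;
         * visibility would have to be resolved before the copy or
         * re-checked by some other means ... */
    }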
Re: [HACKERS] heap vacuum & cleanup locks
On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro wrote: > On Sun, Jun 5, 2011 at 12:03, Robert Haas wrote: >> If other buffer pins do exist, then we can't >> defragment the page, but that doesn't mean no useful work can be done: >> we can still mark used line pointers dead, or dead line pointers >> unused. We cannot defragment, but that can be done either by the next >> VACUUM or by a HOT cleanup. > > This is just an idea -- Is it possible to have copy-on-write techniques? > VACUUM allocates a duplicated page for the pinned page, and copy valid > tuples into the new page. Following buffer readers after the VACUUM will > see the cloned page instead of the old pinned one. Heikki suggested the same thing, and it's not a bad idea, but I think it would be more work to implement than what I proposed. The caller would need to be aware that, if it tries to re-acquire a content lock on the same page, the offset of the tuple within the page might change. I'm not sure how much work would be required to cope with that possibility. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap vacuum & cleanup locks
On Sun, Jun 5, 2011 at 12:03, Robert Haas wrote: > If other buffer pins do exist, then we can't > defragment the page, but that doesn't mean no useful work can be done: > we can still mark used line pointers dead, or dead line pointers > unused. We cannot defragment, but that can be done either by the next > VACUUM or by a HOT cleanup. This is just an idea -- Is it possible to have copy-on-write techniques? VACUUM allocates a duplicated page for the pinned page, and copy valid tuples into the new page. Following buffer readers after the VACUUM will see the cloned page instead of the old pinned one. Of course, copy-on-write is more complex than skipping pinned pages, but I worry that with the skipping method we might never be able to vacuum some pages at all in edge cases. -- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
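Sketching the copy-on-write idea only as far as building the cleaned copy - PageGetTempPageCopy() and PageRepairFragmentation() are real bufpage routines - with the genuinely hard part, installing the new page while old readers still hold pins (and possibly noted tuple offsets) on the old one, left as a comment. That unsolved installation step is exactly what the replies above object to.

    #include "postgres.h"
    #include "storage/bufpage.h"

    /* Build a defragmented private copy of a pinned page. */
    static Page
    cow_vacuum_sketch(Page old_page)
    {
        /* Real routine: palloc a temp page holding a byte-for-byte copy. */
        Page new_page = PageGetTempPageCopy(old_page);

        /* Real routine: compact tuple storage on the copy without
         * touching the original. */
        PageRepairFragmentation(new_page);

        /* Unsolved: atomically substituting new_page for old_page in
         * the buffer pool while other backends hold pins, without
         * invalidating tuple offsets they may already have noted. */
        return new_page;
    }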
[HACKERS] heap vacuum & cleanup locks
We've occasionally seen problems with VACUUM getting stuck for failure to acquire a cleanup lock due to, for example, a cursor holding a pin on the buffer page. In the worst case, this can cause an undetected deadlock, if the backend holding the buffer pin blocks trying to acquire a heavyweight lock that is in turn blocked by VACUUM. A while back, someone (Greg Stark? me?) floated the idea of not waiting for the cleanup lock. If we can't get it immediately, or within some short period of time, then we just skip the page and continue on. Today I had what might be a better idea: don't try to acquire a cleanup lock at all. Instead, acquire an exclusive lock. After having done so, observe the pin count. If there are no other buffer pins, that means our exclusive lock is actually a cleanup lock, and we proceed as now. If other buffer pins do exist, then we can't defragment the page, but that doesn't mean no useful work can be done: we can still mark used line pointers dead, or dead line pointers unused. We cannot defragment, but that can be done either by the next VACUUM or by a HOT cleanup. We can even arrange - using existing mechanisms - to leave behind a hint that the page is a good candidate for a HOT cleanup, by setting pd_prune_xid to, say, FrozenXID. Like the idea of skipping pages on which we can't acquire a cleanup lock altogether, this should prevent VACUUM from getting stuck trying to lock a heap page. While buffer pins can be held for extended periods of time, I don't think there is any operation that holds a buffer content lock more than very briefly. Furthermore, unlike the idea of skipping the page altogether, we could use this approach even during an anti-wraparound vacuum. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
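The proposal maps quite directly onto today's buffer-manager primitives. A minimal sketch, using the real bufmgr functions ConditionalLockBufferForCleanup() (takes the exclusive content lock and succeeds only when the caller's pin is the only one) and LockBuffer(); the wrapper function, the elided pruning steps, and the direct pd_prune_xid assignment are illustrative assumptions, not code from any actual patch.

    #include "postgres.h"
    #include "access/transam.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    /* Hypothetical per-page step for lazy vacuum, following the
     * proposal above. */
    static void
    vacuum_page_sketch(Buffer buf)
    {
        Page page = BufferGetPage(buf);

        if (ConditionalLockBufferForCleanup(buf))
        {
            /* Pin count was 1, so the exclusive lock we now hold is
             * effectively a cleanup lock: prune, defragment, and
             * vacuum the page exactly as today. */
        }
        else
        {
            /* Someone else holds a pin.  A plain exclusive content
             * lock is still obtainable quickly, since content locks
             * are never held for long. */
            LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

            /* Safe without a cleanup lock: flip line pointer states
             * (used -> dead, dead -> unused) and freeze xids, but do
             * not move tuple data, since a concurrent pin holder may
             * be pointing into the page. */

            /* Hint that the page deserves a HOT prune later, per the
             * "set pd_prune_xid to FrozenXID" idea above. */
            ((PageHeader) page)->pd_prune_xid = FrozenTransactionId;
            MarkBufferDirty(buf);
        }

        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }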