Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
On Fri, Nov 3, 2017 at 1:34 PM, Tom Lane wrote:
> Jeff Janes writes:
> > With this v3 patch (assuming this is the one you just committed
> > as ec42a1dcb30de235b252f6d4) I am now getting make check failures.
>
> There's a followup commit already :-(
>
> regards, tom lane

Yeah, that fixed it.  Thanks.

Jeff
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Jeff Janes writes:
> With this v3 patch (assuming this is the one you just committed
> as ec42a1dcb30de235b252f6d4) I am now getting make check failures.

There's a followup commit already :-(

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Hi Alvaro,

With this v3 patch (assuming this is the one you just committed as
ec42a1dcb30de235b252f6d4) I am now getting make check failures.
brin_summarize_range is returning unexpected values.

CentOS 6, PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc
(GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit

Cheers,

Jeff

[Attachment: regression.diffs]
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera wrote:
> I think your argument is sensible for some uses (where people run manual
> VACUUM after loading data) but not others (where people just use manual
> VACUUM in place of autovacuuming -- because they don't trust autovac, or
> the schedule isn't convenient, or whatever other reason).  I've seen
> both things being done in production.

BTW I also noticed that creating an index does summarize the first range
(rather than leaving it empty, as this rationale would suggest).  I think
we changed this at the last minute before commit in 9.5 and never
revisited it; I'm inclined to change it for pg11 (but not now, of course).

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> Alvaro Herrera writes:
> > Tom Lane wrote:
> >> Do we still need the complication in brinsummarize to discriminate
> >> against the last partial range?  Now that the lock consideration
> >> is gone, I think that might be a wart.
>
> > In the case of VACUUM, it's not desirable to create a summarization for
> > the last partial range, because if the table is still being filled, that
> > would slow down the insertion process.
>
> Hm.  Okay, but you should change the comment then, because "we do not want
> to spend one RelationGetNumberOfBlocks call" is a pretty weak reason.

Changed.

> Also, I think I would accept that argument for autovacuum, but maybe
> not so much for a manual vacuum.  Maybe you should drive it off
> IsAutovacuumWorker rather than which operation is being done.

I think your argument is sensible for some uses (where people run manual
VACUUM after loading data) but not others (where people just use manual
VACUUM in place of autovacuuming -- because they don't trust autovac, or
the schedule isn't convenient, or whatever other reason).  I've seen both
things being done in production.

If we do as you suggest, there is no way to run a manual vacuum without
also summarizing the partial range; the approach of handling the partial
range only in brin_summarize_new_values lets the user choose what to do.

Once upon a time I thought about adding a reloption to let the user
choose what to do, but I never got around to writing a patch.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> Tom Lane wrote:
>> Do we still need the complication in brinsummarize to discriminate
>> against the last partial range?  Now that the lock consideration
>> is gone, I think that might be a wart.

> In the case of VACUUM, it's not desirable to create a summarization for
> the last partial range, because if the table is still being filled, that
> would slow down the insertion process.

Hm.  Okay, but you should change the comment then, because "we do not want
to spend one RelationGetNumberOfBlocks call" is a pretty weak reason.

Also, I think I would accept that argument for autovacuum, but maybe
not so much for a manual vacuum.  Maybe you should drive it off
IsAutovacuumWorker rather than which operation is being done.

regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> Alvaro Herrera writes:
> > Yeah, I think this approach results in better code.  The attached patch
> > implements that, and it passes the test for me (incl. calling
> > brin_summarize_new_values concurrently with vacuum, when running the
> > insert; the former does include the final page range whereas the latter
> > does not.)
>
> Hm, so IIUC the point is that once the placeholder tuple is in, we can
> rely on concurrent inserters to update it for insertions into pages that
> are added after we determine our scan stop point.  But if the scan stop
> point is chosen before inserting the placeholder, then we have a race
> condition.

Exactly.  We don't need to scan those pages once the placeholder tuple
is in.

> The given code seems a brick or so short of a load, though: if the table
> has been extended sufficiently, it could compute scanNumBlks as larger
> than bs_pagesPerRange, no?  You need to clamp the computation result.

Oops, right.

> Also, shouldn't the passed-in heapBlk always be a multiple of
> pagesPerRange already?

Yeah, I guess I can turn that into an assert.

> Do we still need the complication in brinsummarize to discriminate
> against the last partial range?  Now that the lock consideration
> is gone, I think that might be a wart.

You mean this code?

	/*
	 * Unless requested to summarize even a partial range, go away now if
	 * we think the next range is partial.
	 *
	 * Maybe the table already grew to cover this range completely, but we
	 * don't want to spend a whole RelationGetNumberOfBlocks to find out,
	 * so just leave it for next time.
	 */
	if (!include_partial && (startBlk + pagesPerRange > heapNumBlocks))
		break;

In the case of VACUUM, it's not desirable to create a summarization for
the last partial range, because if the table is still being filled, that
would slow down the insertion process.  So we pass include_partial=false
there.

In brin_summarize_new_values, the theory is that the user called that
function because they're done loading (at least temporarily), so it's
better to process the partial range.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 6d31088c39d455637179853a3c72a7d9e20fe053 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Thu, 2 Nov 2017 18:35:10 +0100
Subject: [PATCH v3] Fix summarization concurrent with relation extension

---
 src/backend/access/brin/brin.c | 91 --
 1 file changed, 62 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e6909d7aea..72ac8929bd 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -67,7 +67,7 @@ static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
 			   BrinRevmap *revmap, BlockNumber pagesPerRange);
 static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
-			  double *numSummarized, double *numExisting);
+			  bool include_partial, double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b);
@@ -791,7 +791,7 @@ brinvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 	brin_vacuum_scan(info->index, info->strategy);
-	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES,
+	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES, false,
 				  &stats->num_index_tuples, &stats->num_index_tuples);
 	heap_close(heapRel, AccessShareLock);
@@ -909,7 +909,7 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 						RelationGetRelationName(indexRel;
 	/* OK, do it */
-	brinsummarize(indexRel, heapRel, heapBlk, &numSummarized, NULL);
+	brinsummarize(indexRel, heapRel, heapBlk, true, &numSummarized, NULL);
 	relation_close(indexRel, ShareUpdateExclusiveLock);
 	relation_close(heapRel, ShareUpdateExclusiveLock);
@@ -1129,7 +1129,8 @@ terminate_brin_buildstate(BrinBuildState *state)
 }
 /*
- * Summarize the given page range of the given index.
+ * On the given BRIN index, summarize the heap page range that corresponds
+ * to the heap block number given.
  *
  * This routine can run in parallel with insertions into the heap.  To avoid
  * missing those values from the summary tuple, we first insert a placeholder
@@ -1139,6 +1140,12 @@ terminate_brin_buildstate(BrinBuildState *state)
  * update of the index value happens in a loop,
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> Alvaro Herrera wrote:
>> Maybe a solution is to call RelationGetNumberOfBlocks() after the
>> placeholder tuple has been inserted, for the case where we would be
>> scanning past end of relation; passing InvalidBlockNumber as stop point
>> would indicate to do things that way.  I'll try with that approach now.

> Yeah, I think this approach results in better code.  The attached patch
> implements that, and it passes the test for me (incl. calling
> brin_summarize_new_values concurrently with vacuum, when running the
> insert; the former does include the final page range whereas the latter
> does not.)

Hm, so IIUC the point is that once the placeholder tuple is in, we can
rely on concurrent inserters to update it for insertions into pages that
are added after we determine our scan stop point.  But if the scan stop
point is chosen before inserting the placeholder, then we have a race
condition.

The given code seems a brick or so short of a load, though: if the table
has been extended sufficiently, it could compute scanNumBlks as larger
than bs_pagesPerRange, no?  You need to clamp the computation result.

Also, shouldn't the passed-in heapBlk always be a multiple of
pagesPerRange already?

Do we still need the complication in brinsummarize to discriminate
against the last partial range?  Now that the lock consideration
is gone, I think that might be a wart.

regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera wrote:
> Maybe a solution is to call RelationGetNumberOfBlocks() after the
> placeholder tuple has been inserted, for the case where we would be
> scanning past end of relation; passing InvalidBlockNumber as stop point
> would indicate to do things that way.  I'll try with that approach now.

Yeah, I think this approach results in better code.  The attached patch
implements that, and it passes the test for me (incl. calling
brin_summarize_new_values concurrently with vacuum, when running the
insert; the former does include the final page range whereas the latter
does not.)

Tomas Vondra wrote:
> FWIW this patch fixes the issue for me - I can no longer reproduce the
> bitmapscan vs. seqscan result discrepancies (even with the extra UPDATE
> phase).

Thanks for testing!  This confirms that the issue was correctly
identified.  Would you try the current patch, which is better than the
previous one?  It's different enough that I think it invalidates prior
testing.  Thanks!

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From b3568a5475526fbd7768bd4d74c6ad681ede920f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Thu, 2 Nov 2017 18:35:10 +0100
Subject: [PATCH v2] Fix summarization concurrent with relation extension

---
 src/backend/access/brin/brin.c | 78 ++
 1 file changed, 49 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e6909d7aea..ad512c2c5f 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -67,7 +67,7 @@ static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
 			   BrinRevmap *revmap, BlockNumber pagesPerRange);
 static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
-			  double *numSummarized, double *numExisting);
+			  bool include_partial, double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b);
@@ -791,7 +791,7 @@ brinvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 	brin_vacuum_scan(info->index, info->strategy);
-	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES,
+	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES, false,
 				  &stats->num_index_tuples, &stats->num_index_tuples);
 	heap_close(heapRel, AccessShareLock);
@@ -909,7 +909,7 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 						RelationGetRelationName(indexRel;
 	/* OK, do it */
-	brinsummarize(indexRel, heapRel, heapBlk, &numSummarized, NULL);
+	brinsummarize(indexRel, heapRel, heapBlk, true, &numSummarized, NULL);
 	relation_close(indexRel, ShareUpdateExclusiveLock);
 	relation_close(heapRel, ShareUpdateExclusiveLock);
@@ -1129,7 +1129,8 @@ terminate_brin_buildstate(BrinBuildState *state)
 }
 /*
- * Summarize the given page range of the given index.
+ * On the given BRIN index, summarize the heap page range that corresponds
+ * to the heap block number given.
  *
  * This routine can run in parallel with insertions into the heap.  To avoid
  * missing those values from the summary tuple, we first insert a placeholder
@@ -1139,6 +1140,12 @@ terminate_brin_buildstate(BrinBuildState *state)
  * update of the index value happens in a loop, so that if somebody updates
  * the placeholder tuple after we read it, we detect the case and try again.
  * This ensures that the concurrently inserted tuples are not lost.
+ *
+ * A further corner case is this routine being asked to summarize the partial
+ * range at the end of the table.  heapNumBlocks is the (possibly outdated)
+ * table size; if we notice that the requested range lies beyond that size,
+ * we re-compute the table size after inserting the placeholder tuple, to
+ * avoid missing pages that were appended recently.
  */
 static void
 summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
@@ -1160,6 +1167,20 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 							 heapBlk, phtup, phsz);
 	/*
+	 * Whenever we're scanning what we believe to be the final range on the
+	 * table (i.e. one that might be partial) we need to recompute our idea of
+	 * what the latest page is after inserting the placeholder tuple.  This
+	 * ensures that we don't miss pages just because they were extended by
+	 * concurrent backends that didn't have the chance to see our placeholder
+	 * tuple.  Fortunately, this shou
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> Alvaro Herrera writes:
> > Rather than remove the capability, I'd be inclined to make
> > brin_summarize_new_values summarize the final partial range, and have
> > VACUUM not do it.  Would that be too inconsistent?
>
> That doesn't really get you out of the problem that this is an abuse of
> the relation extension lock, and is likely to cause issues when people
> try to optimize that locking mechanism.

Right.

> Why is it that the regular technique doesn't work, ie create a placeholder
> tuple and let it get added to by any insertions that happen?

The problem is that we determine the relation size (and scan stop point)
before inserting the placeholder tuple, so any relation extension that
occurs after we read the size is not covered by the scan.  The reason we
do this is to avoid calling RelationGetNumberOfBlocks once for each
range, since it's known to be very expensive.

Maybe a solution is to call RelationGetNumberOfBlocks() after the
placeholder tuple has been inserted, for the case where we would be
scanning past end of relation; passing InvalidBlockNumber as stop point
would indicate to do things that way.  I'll try with that approach now.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Hi,

On 11/02/2017 06:45 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>
>> Unfortunately, I think we still have a problem ... I've been wondering
>> if we end up producing correct indexes, so I've done a simple test.
>
> Here's a proposed patch that should fix this problem (and does, in my
> testing).  Would you please give it a try?
>
> This patch changes two things:
>
> 1. in VACUUM or brin_summarize_new_values, we only process fully loaded
> ranges, and ignore the partial range at end of table.
>
> 2. when summarization is requested on the partial range at the end of a
> table, we acquire extension lock on the rel, then compute relation size
> and run summarization with the lock held.  This guarantees that we don't
> miss any pages.  This is bad for concurrency though, so it's only done
> in that specific scenario.

FWIW this patch fixes the issue for me - I can no longer reproduce the
bitmapscan vs. seqscan result discrepancies (even with the extra UPDATE
phase).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> Tom Lane wrote:
>> So what would happen if we just don't summarize partial ranges?

> Index scan would always have to read all the heap pages for that partial
> range.  Maybe not a big issue, but when you finish loading a table, it'd
> be good to have a mechanism to summarize that partial range ...

I guess if the ranges are big this might not be nice.

> Rather than remove the capability, I'd be inclined to make
> brin_summarize_new_values summarize the final partial range, and have
> VACUUM not do it.  Would that be too inconsistent?

That doesn't really get you out of the problem that this is an abuse of
the relation extension lock, and is likely to cause issues when people
try to optimize that locking mechanism.

Why is it that the regular technique doesn't work, ie create a placeholder
tuple and let it get added to by any insertions that happen?

regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> Alvaro Herrera writes:
> > Tom Lane wrote:
> >> If VACUUM and brin_summarize_new_values both ignore the partial
> >> range, then what else would request this?  Can't we just decree
> >> that we don't summarize the partial range, period?
>
> > brin_summarize_range() can do it.
>
> So what would happen if we just don't summarize partial ranges?

Index scan would always have to read all the heap pages for that partial
range.  Maybe not a big issue, but when you finish loading a table, it'd
be good to have a mechanism to summarize that partial range ...

Rather than remove the capability, I'd be inclined to make
brin_summarize_new_values summarize the final partial range, and have
VACUUM not do it.  Would that be too inconsistent?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> Tom Lane wrote:
>> If VACUUM and brin_summarize_new_values both ignore the partial
>> range, then what else would request this?  Can't we just decree
>> that we don't summarize the partial range, period?

> brin_summarize_range() can do it.

So what would happen if we just don't summarize partial ranges?

regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> Alvaro Herrera writes:
> > 2. when summarization is requested on the partial range at the end of a
> > table, we acquire extension lock on the rel, then compute relation size
> > and run summarization with the lock held.  This guarantees that we don't
> > miss any pages.  This is bad for concurrency though, so it's only done
> > in that specific scenario.
>
> Hm, I wonder how this will play with the active proposals around
> reimplementing relation extension locks.  All that work seems to be
> assuming that the extension lock is only held for a short time and
> nothing much beyond physical extension is done while holding it.
> I'm afraid that you may be introducing a risk of e.g. deadlocks
> if you do this.

Ouch ... yeah, that could be a problem.  Another idea I had was to just
insert the placeholder tuple while holding the extension lock, then
release the lock while the summarization is done.  It would be a bit of
a break of the current separation of concerns, but I'm not convinced
that the current setup is perfect, so maybe that's okay.

> If VACUUM and brin_summarize_new_values both ignore the partial
> range, then what else would request this?  Can't we just decree
> that we don't summarize the partial range, period?

brin_summarize_range() can do it.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> 1. in VACUUM or brin_summarize_new_values, we only process fully loaded
> ranges, and ignore the partial range at end of table.

OK.

> 2. when summarization is requested on the partial range at the end of a
> table, we acquire extension lock on the rel, then compute relation size
> and run summarization with the lock held.  This guarantees that we don't
> miss any pages.  This is bad for concurrency though, so it's only done
> in that specific scenario.

Hm, I wonder how this will play with the active proposals around
reimplementing relation extension locks.  All that work seems to be
assuming that the extension lock is only held for a short time and
nothing much beyond physical extension is done while holding it.
I'm afraid that you may be introducing a risk of e.g. deadlocks
if you do this.

If VACUUM and brin_summarize_new_values both ignore the partial
range, then what else would request this?  Can't we just decree
that we don't summarize the partial range, period?

regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tomas Vondra wrote:
> Unfortunately, I think we still have a problem ... I've been wondering
> if we end up producing correct indexes, so I've done a simple test.

Here's a proposed patch that should fix this problem (and does, in my
testing).  Would you please give it a try?

This patch changes two things:

1. in VACUUM or brin_summarize_new_values, we only process fully loaded
ranges, and ignore the partial range at end of table.

2. when summarization is requested on the partial range at the end of a
table, we acquire extension lock on the rel, then compute relation size
and run summarization with the lock held.  This guarantees that we don't
miss any pages.  This is bad for concurrency though, so it's only done
in that specific scenario.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From a760bfd44afeff8d1629599411ac7ce87acc7d26 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Thu, 2 Nov 2017 18:35:10 +0100
Subject: [PATCH] Fix summarization concurrent with relation extension

---
 src/backend/access/brin/brin.c | 91 --
 1 file changed, 70 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1ced..7696c0700c 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -29,6 +29,7 @@
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
+#include "storage/lmgr.h"
 #include "utils/builtins.h"
 #include "utils/index_selfuncs.h"
 #include "utils/memutils.h"
@@ -68,6 +69,10 @@ static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
 static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 			  double *numSummarized, double *numExisting);
+static void determine_summarization_range(BlockNumber pageRange,
+							  BlockNumber heapNumBlocks,
+							  BlockNumber pagesPerRange,
+							  BlockNumber *startBlk,
+							  BlockNumber *endBlk);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b);
@@ -1253,30 +1258,16 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 	BlockNumber startBlk;
 	BlockNumber endBlk;
-	/* determine range of pages to process; nothing to do for an empty table */
-	heapNumBlocks = RelationGetNumberOfBlocks(heapRel);
-	if (heapNumBlocks == 0)
-		return;
 	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
-	if (pageRange == BRIN_ALL_BLOCKRANGES)
+	/* determine range of pages to process */
+	heapNumBlocks = RelationGetNumberOfBlocks(heapRel);
+	determine_summarization_range(pageRange, heapNumBlocks, pagesPerRange,
+								  &startBlk, &endBlk);
+	if (startBlk > heapNumBlocks)
 	{
-		startBlk = 0;
-		endBlk = heapNumBlocks;
-	}
-	else
-	{
-		startBlk = (pageRange / pagesPerRange) * pagesPerRange;
-		/* Nothing to do if start point is beyond end of table */
-		if (startBlk > heapNumBlocks)
-		{
-			brinRevmapTerminate(revmap);
-			return;
-		}
-		endBlk = startBlk + pagesPerRange;
-		if (endBlk > heapNumBlocks)
-			endBlk = heapNumBlocks;
+		brinRevmapTerminate(revmap);
+		return;
 	}
 	/*
@@ -1287,9 +1278,31 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 	{
 		BrinTuple  *tup;
 		OffsetNumber off;
+		bool		ext_lock_held = false;
 		CHECK_FOR_INTERRUPTS();
+		/*
+		 * Whenever we're scanning a range that would go past what we know to
+		 * be end-of-relation, we need to ensure we scan to the true end of the
+		 * relation, or we risk missing tuples in recently added pages.  To do
+		 * this, we hold the relation extension lock from here till we're done
+		 * summarizing, and compute the relation size afresh now.  The logic in
+		 * determine_summarization_range ensures that this is not done in the
+		 * common cases of vacuum or brin_summarize_new_values(), but instead
+		 * only when we're specifically asked to summarize the last range in
+		 * the relation.
+		 */
+		if (heapBlk + pagesPerRange > hea
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> Tom Lane wrote:
>> Where are we on this --- do you want me to push the brin_doupdate
>> fix I proposed, or were you intending to merge that into a
>> larger patch?

> Please push your fixes, I'll post my proposed patch for the other bug
> afterwards; they are unrelated problems after all.

OK, I was not sure if you considered them related or not.  Will push.

regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> Alvaro Herrera writes:
> >> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
> >> But I see this misbehavior too.  Looking ...
>
> > Turns out that this is related to concurrent growth of the table while
> > the summarization process is scanning -- so new pages have appeared at
> > the end of the table after the end point has been determined.  It would
> > be a pain to determine number of blocks for each range, so I'm looking
> > for a simple way to fix it without imposing so much overhead.
>
> Where are we on this --- do you want me to push the brin_doupdate
> fix I proposed, or were you intending to merge that into a
> larger patch?

Please push your fixes, I'll post my proposed patch for the other bug
afterwards; they are unrelated problems after all.  If you prefer me to
push your fixes, I can do that -- let me know.

> If I'm to do it, is there a reason not to back-patch to all branches
> with BRIN?

No, all these fixes should go back to 9.5.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
>> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
>> But I see this misbehavior too.  Looking ...

> Turns out that this is related to concurrent growth of the table while
> the summarization process is scanning -- so new pages have appeared at
> the end of the table after the end point has been determined.  It would
> be a pain to determine number of blocks for each range, so I'm looking
> for a simple way to fix it without imposing so much overhead.

Where are we on this --- do you want me to push the brin_doupdate
fix I proposed, or were you intending to merge that into a
larger patch?

If I'm to do it, is there a reason not to back-patch to all branches
with BRIN?

regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera wrote: > Tomas Vondra wrote: > > > FWIW I can reproduce this on 9.5, and I don't even need to run the > > UPDATE part. That is, INSERT + VACUUM running concurrently is enough to > > produce broken BRIN indexes :-( > > Hmm, I'm pretty sure we stress-tested brin in pretty much the same way. > But I see this misbehavior too. Looking ... Turns out that this is related to concurrent growth of the table while the summarization process is scanning -- so new pages have appeared at the end of the table after the end point has been determined. It would be a pain to determine number of blocks for each range, so I'm looking for a simple way to fix it without imposing so much overhead. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tomas Vondra wrote:
> FWIW I can reproduce this on 9.5, and I don't even need to run the
> UPDATE part. That is, INSERT + VACUUM running concurrently is enough to
> produce broken BRIN indexes :-(

Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
But I see this misbehavior too.  Looking ...

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
On 10/31/2017 11:44 PM, Tomas Vondra wrote:
> ...
> Unfortunately, I think we still have a problem ... I've been wondering
> if we end up producing correct indexes, so I've done a simple test.
>
> 1) create the table as before
>
> 2) let the insert + vacuum run for some time, to see if there are
> crashes (result: no crashes after one hour, inserting ~92M rows)
>
> 3) do a bunch of random updates on the data (while still doing the
> concurrent vacuum in another session)
>
> 4) run a bunch of simple queries to compare the results, essentially
>
>    -- BRIN index
>    SET enable_bitmapscan = on;
>    SELECT COUNT(*) FROM brin_test WHERE a = $1;
>
>    -- seq scan
>    SET enable_bitmapscan = off;
>    SELECT COUNT(*) FROM brin_test WHERE a = $1;
>
> and unfortunately what I get is not particularly pleasant:
>
> test=# set enable_bitmapscan = on;
> SET
> test=# select count(*) from brin_test where a = 0;
>  count
> -------
>   9062
> (1 row)
>
> test=# set enable_bitmapscan = off;
> SET
> test=# select count(*) from brin_test where a = 0;
>  count
> -------
>   9175
> (1 row)
>
> Attached is a SQL script with commands I used. You'll need to copy the
> commands into multiple psql sessions, though, to simulate concurrent
> activity.

FWIW I can reproduce this on 9.5, and I don't even need to run the
UPDATE part. That is, INSERT + VACUUM running concurrently is enough to
produce broken BRIN indexes :-(

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Hi,

On 10/31/2017 08:46 PM, Tom Lane wrote:
> I wrote:
>> maybe
>> we just have some run-of-the-mill bugs to find, like the off-the-end
>> bug I spotted in brin_doupdate.  There's apparently at least one
>> more, but given the error message it must be something like not
>> checking for a page to have turned into a revmap page.  Shouldn't
>> be too hard to find...
>
> Actually, I think it might be as simple as the attached.
> brin_getinsertbuffer checks for the old page having turned into revmap,
> but the "samepage" path in brin_doupdate does not :-(
>
> With this applied, Alvaro's version of the test case has survived
> without error for quite a bit longer than its former MTBF.  There
> might still be some issues though in other code paths.

That does fix the crashes for me - I've been unable to reproduce any
even after one hour (it took a couple of minutes to crash before).

Unfortunately, I think we still have a problem ... I've been wondering
if we end up producing correct indexes, so I've done a simple test.

1) create the table as before

2) let the insert + vacuum run for some time, to see if there are
crashes (result: no crashes after one hour, inserting ~92M rows)

3) do a bunch of random updates on the data (while still doing the
concurrent vacuum in another session)

4) run a bunch of simple queries to compare the results, essentially

   -- BRIN index
   SET enable_bitmapscan = on;
   SELECT COUNT(*) FROM brin_test WHERE a = $1;

   -- seq scan
   SET enable_bitmapscan = off;
   SELECT COUNT(*) FROM brin_test WHERE a = $1;

and unfortunately what I get is not particularly pleasant:

test=# set enable_bitmapscan = on;
SET
test=# select count(*) from brin_test where a = 0;
 count
-------
  9062
(1 row)

test=# set enable_bitmapscan = off;
SET
test=# select count(*) from brin_test where a = 0;
 count
-------
  9175
(1 row)

Attached is a SQL script with commands I used. You'll need to copy the
commands into multiple psql sessions, though, to simulate concurrent
activity.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: brin-test.sql (application/sql)
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
I wrote:
> maybe
> we just have some run-of-the-mill bugs to find, like the off-the-end
> bug I spotted in brin_doupdate.  There's apparently at least one
> more, but given the error message it must be something like not
> checking for a page to have turned into a revmap page.  Shouldn't
> be too hard to find...

Actually, I think it might be as simple as the attached.
brin_getinsertbuffer checks for the old page having turned into revmap,
but the "samepage" path in brin_doupdate does not :-(

With this applied, Alvaro's version of the test case has survived
without error for quite a bit longer than its former MTBF.  There
might still be some issues though in other code paths.

			regards, tom lane

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..b0f86f3 100644
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
*** brin_doupdate(Relation idxrel, BlockNumb
*** 113,121 ****
  	/*
  	 * Check that the old tuple wasn't updated concurrently: it might have
! 	 * moved someplace else entirely ...
  	 */
! 	if (!ItemIdIsNormal(oldlp))
  	{
  		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
--- 113,127 ----
  	/*
  	 * Check that the old tuple wasn't updated concurrently: it might have
! 	 * moved someplace else entirely, and for that matter the whole page
! 	 * might've become a revmap page.  Note that in the first two cases
! 	 * checked here, the "oldlp" we just calculated is garbage; but
! 	 * PageGetItemId() is simple enough that it was safe to do that
! 	 * calculation anyway.
  	 */
! 	if (!BRIN_IS_REGULAR_PAGE(oldpage) ||
! 		oldoff > PageGetMaxOffsetNumber(oldpage) ||
! 		!ItemIdIsNormal(oldlp))
  	{
  		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
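The ordering of the three tests in the patch is the whole point: the page type and the offset bound must be verified before the line pointer they guard is trusted. A standalone sketch of that pattern, using simplified stand-in types rather than PostgreSQL's real Page/ItemId definitions (all names and layouts here are illustrative, not the actual bufpage.h structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's page structures (illustrative only). */
typedef struct { bool normal; } ItemId;          /* line pointer */
typedef struct {
    bool     is_regular;   /* false once the page has become a revmap page */
    uint16_t max_offset;   /* number of line-pointer slots currently on page */
    ItemId   items[64];    /* offsets are 1-based, as in PostgreSQL */
} Page;

/*
 * Mirror the patched check in brin_doupdate: reject the update if the page
 * is no longer a regular page, if the remembered offset now points past the
 * end of the line-pointer array, or if the slot no longer holds a normal
 * tuple.  The evaluation order is what keeps the last test from reading a
 * slot that no longer exists.
 */
static bool old_tuple_still_valid(const Page *pg, uint16_t oldoff)
{
    if (!pg->is_regular)
        return false;               /* whole page was repurposed */
    if (oldoff > pg->max_offset)
        return false;               /* off the end: the array shrank under us */
    return pg->items[oldoff - 1].normal;
}
```

With a buffer that shrank from 39 to 38 items while unlocked (as in the PANIC logs downthread), the middle test fires and the stale offset is rejected instead of indexing past the array.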
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> Tom Lane wrote:
>> I really don't understand how any of this "let's release the buffer
>> lock and then take it back later" logic is supposed to work reliably.
>
> So summarize_range first inserts the placeholder tuple, which is there
> purposefully for other processes to update concurrently; then, without
> blocking any other process, scan the page range and update the
> placeholder tuple (this could take a long time, so it'd be a bad idea
> to hold buffer lock for that long).

Yeah, agreed, and your upthread point about avoiding deadlock when we
need to take two buffer locks at the same time is also something that
it's hard to see any other way around.  Maybe we just have to find a
way to make the existing structure work.

The sticking point is that not only might the tuple we expected have
been deleted, but someone could have put an unrelated tuple in its
place.  Are you confident that you can detect that and recover from it?
If you're sure that brin_tuples_equal() is trustworthy for this purpose,
then maybe we just have some run-of-the-mill bugs to find, like the
off-the-end bug I spotted in brin_doupdate.  There's apparently at least
one more, but given the error message it must be something like not
checking for a page to have turned into a revmap page.  Shouldn't be too
hard to find...

			regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> I really don't understand how any of this "let's release the buffer
> lock and then take it back later" logic is supposed to work reliably.

So summarize_range first inserts the placeholder tuple, which is there
purposefully for other processes to update concurrently; then, without
blocking any other process, scan the page range and update the
placeholder tuple (this could take a long time, so it'd be a bad idea
to hold buffer lock for that long).

I think what we should do is rethink the locking considerations in
brin_doupdate vs. brinGetTupleForHeapBlock, and how they are used in
summarize_range and brininsert.  In summarize_range, instead of hoping
that in some cases we will not need to re-obtain the placeholder tuple,
just do that in all cases, keeping the buffer locked until the tuple is
updated.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> So in a few more runs this morning using Alvaro's simplified test case,
> I have seen the following behaviors not previously reported:
> 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index
> offnum: %u" error report as PageIndexTupleDeleteNoCompact.  I note the
> same message appears in plain PageIndexTupleDelete as well.
> I think we've been too hasty to assume all instances of this came out of
> PageIndexTupleDeleteNoCompact.

Ah, I wasn't paying close attention to the originator routine of the
message, but you're right, I see this one too.

> 2. Crashes in the data-insertion process, not only the process running
> summarize_range:

Yeah, I saw these.  I was expecting it, since the two routines
(brininsert and summarize_range) pretty much share the insertion
protocol.

> I really don't understand how any of this "let's release the buffer
> lock and then take it back later" logic is supposed to work reliably.

Yeah, evidently that was way too optimistic and I'll need to figure out
a better mechanism to handle this.  The intention was to avoid deadlocks
while locking the target page for the insertion: by having both pages
start unlocked we can simply lock them in block number order.  If we
keep the page containing the tuple locked, I don't see how to reliably
avoid a deadlock while acquiring a buffer to insert the new tuple.

> BTW, while I'm bitching, it seems fairly insane from a concurrency
> standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace
> while holding at least one and possibly two buffer locks.  Shouldn't
> that be done someplace else?

Hmm.  I spent a lot of effort (commit ccc4c074994d) to avoid leaving
pages uninitialized / unrecorded in FSM.  I left this on purpose on the
rationale that trying to fix it would make the callsites more convoluted
(the retry logic doesn't help).  But as I recall this was supposed to be
done only in the rare case where the buffer could not be returned to
caller ... but that's not what the current code does, so there is
something wrong there.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
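The deadlock-avoidance idea described here (start with both pages unlocked, then acquire the two locks in block-number order) is the classic resource-ordering discipline: if every backend takes the pair in the same global order, two backends that each need both pages cannot wait on each other in a cycle. A minimal sketch with pthread mutexes standing in for buffer content locks (the `Buf`, `lock_pair`, and `unlock_pair` names are illustrative, not PostgreSQL APIs):

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for a shared buffer: a block number plus its content lock. */
typedef struct {
    unsigned        blkno;
    pthread_mutex_t lock;
} Buf;

/*
 * Acquire two buffer locks in block-number order.  This is deadlock-free
 * precisely because both locks start out unheld -- which is why, per the
 * discussion above, keeping the old page locked across the acquisition of
 * the insertion target would reintroduce the deadlock hazard.
 */
static void lock_pair(Buf *a, Buf *b)
{
    Buf *first = (a->blkno <= b->blkno) ? a : b;
    Buf *second = (first == a) ? b : a;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
}

static void unlock_pair(Buf *a, Buf *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}
```

Regardless of the order the caller names the two buffers, the lower-numbered block is always locked first, so concurrent callers converge on one acquisition order.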
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
So in a few more runs this morning using Alvaro's simplified test case,
I have seen the following behaviors not previously reported:

1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index
offnum: %u" error report as PageIndexTupleDeleteNoCompact.  I note the
same message appears in plain PageIndexTupleDelete as well.  I think
we've been too hasty to assume all instances of this came out of
PageIndexTupleDeleteNoCompact.

2. Crashes in the data-insertion process, not only the process running
summarize_range:

#1  0x003b78a33c75 in abort () at abort.c:92
#2  0x0086e527 in errfinish (dummy=) at elog.c:557
#3  0x0086f334 in elog_finish (elevel=, fmt=) at elog.c:1378
#4  0x0075d48f in PageIndexTupleDeleteNoCompact (rel=, buf=2772,
    page=0x7f0188438080 "\002", offnum=39) at bufpage.c:995
#5  0x00476fd2 in brin_doupdate (idxrel=0x7f0186633e90, pagesPerRange=1,
    revmap=0xba, heapBlk=2542, oldbuf=2772, oldoff=39,
    origtup=0x2784010, origsz=8, newtup=0x2784098, newsz=400,
    samepage=0 '\000') at brin_pageops.c:270
#6  0x00475430 in brininsert (idxRel=0x7f0186633e90, values=0x7ffce7b827f0,
    nulls=0x7ffce7b828f0 "", heaptid=0x279fb3c, heapRel=,
    checkUnique=, indexInfo=0x279e370) at brin.c:292
#7  0x0061da18 in ExecInsertIndexTuples (slot=0x279e638, tupleid=0x279fb3c,
    estate=0x279dd10, noDupErr=0 '\000', specConflict=0x0,
    arbiterIndexes=0x0) at execIndexing.c:387

The postmaster log for this, with even more debug logging thrown in:

2017-10-31 11:58:03.466 EDT [23506] LOG:  summarize_range(brin_test_a_idx,2542,2543) created placeholder at 68,39
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG:  brin_doupdate(brin_test_a_idx) at 68,39: getinsertbuffer returned page 70
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG:  deleting tuple 39 (of 39) in rel brin_test_a_idx page 68
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG:  brin_doupdate(brin_test_a_idx) at 68,39: placed at 70,1
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23509] LOG:  brin_doupdate(brin_test_a_idx) at 68,39: getinsertbuffer returned page 70
2017-10-31 11:58:03.466 EDT [23509] STATEMENT:  insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23509] LOG:  deleting tuple 39 (of 38) in rel brin_test_a_idx page 68
2017-10-31 11:58:03.466 EDT [23509] STATEMENT:  insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23509] PANIC:  invalid index offnum: 39
2017-10-31 11:58:03.466 EDT [23509] STATEMENT:  insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23506] LOG:  summarize_range(brin_test_a_idx,2542,2543): update at 68,39 succeeded
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:04.517 EDT [23493] LOG:  server process (PID 23509) was terminated by signal 6: Aborted

So what evidently happened here is that brininsert decided it needed to
insert into the placeholder tuple that summarize_range had just created,
but by the time it got lock on the buffer, summarize_range had deleted
the placeholder and instead created a new tuple on another page.

HAH: I see how the connection to PageIndexTupleDeleteNoCompact's change
applies.  After re-locking the buffer, brin_doupdate tries to check
whether the tuple's been deleted, but it needs to do it more like this,
else it might be accessing off the end of the lp array:

 	/*
 	 * Check that the old tuple wasn't updated concurrently: it might have
 	 * moved someplace else entirely ...
 	 */
-	if (!ItemIdIsNormal(oldlp))
+	if (oldoff > PageGetMaxOffsetNumber(oldpage) ||
+		!ItemIdIsNormal(oldlp))
 	{
 		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);

This is a pre-existing bug, but before 24992c6d, it would only have
manifested in the (very?) unusual corner case that
PageIndexDeleteNoCompact was able to reset the page to empty.

However, when I fix that and then run Alvaro's test case, I get

ERROR:  corrupted BRIN index: inconsistent range map

so there's more creepy-crawlies around here somewhere.  I really don't
understand how any of this "let's release the buffer lock and then take
it back later" logic is supposed to work reliably.

BTW, while I'm bitching, it seems fairly insane from a concurrency
standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace
while holding at least one and possibly two buffer locks.  Shouldn't
that be done someplace else?
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Michael Paquier writes:
> On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane wrote:
>> Yeah, we're still missing an understanding of why we didn't see it
>> before; the inadequate locking was surely there before.
>
> Because 24992c6d has added a check on the offset number by using
> PageIndexTupleDeleteNoCompact() in brin_doupdate() making checks
> tighter, no?

No, I don't see how it's tighter.  The old code matched supplied
offnum(s) against the indexes of not-unused items, and then after that
loop it complained if they weren't all matched.  So it should also have
failed, albeit with a different error message, if it were passed an
offnum corresponding to a no-longer-live tuple.

			regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Tom Lane wrote:
>>> So: I put the blame on the fact that summarize_range() thinks that
>>> the tuple offset it has for the placeholder tuple is guaranteed to
>>> hold good, even across possibly-long intervals where it's holding
>>> no lock on the containing buffer.
>
>> Yeah, I think this is a pretty reasonable explanation for the problem.
>> I don't understand why it doesn't fail in 9.6.
>
> Yeah, we're still missing an understanding of why we didn't see it
> before; the inadequate locking was surely there before.  I'm guessing
> that somehow the previous behavior of PageIndexDeleteNoCompact managed
> to mask the problem (perhaps only by not throwing an error, which doesn't
> imply that the index state was good afterwards).  But I don't see quite
> how it did that.

Because 24992c6d has added a check on the offset number by using
PageIndexTupleDeleteNoCompact() in brin_doupdate(), making checks
tighter, no?  I have not tested, and I lack knowledge about the brin
code, but it seems to me that if we had a similar check then things
could likely blow up.
--
Michael
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes:
> Tom Lane wrote:
>> So: I put the blame on the fact that summarize_range() thinks that
>> the tuple offset it has for the placeholder tuple is guaranteed to
>> hold good, even across possibly-long intervals where it's holding
>> no lock on the containing buffer.
>
> Yeah, I think this is a pretty reasonable explanation for the problem.
> I don't understand why it doesn't fail in 9.6.

Yeah, we're still missing an understanding of why we didn't see it
before; the inadequate locking was surely there before.  I'm guessing
that somehow the previous behavior of PageIndexDeleteNoCompact managed
to mask the problem (perhaps only by not throwing an error, which
doesn't imply that the index state was good afterwards).  But I don't
see quite how it did that.

One thing I think we do know now is that the bug is triggered by two
concurrent executions of summarize_range.  So I'd look for edge cases
like whether the placeholder tuple can be deleted and then reinserted
into the same lp index.

			regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote:
> So: I put the blame on the fact that summarize_range() thinks that
> the tuple offset it has for the placeholder tuple is guaranteed to
> hold good, even across possibly-long intervals where it's holding
> no lock on the containing buffer.

Yeah, I think this is a pretty reasonable explanation for the problem.
I don't understand why it doesn't fail in 9.6.

> Fixing this without creating new problems is beyond my familiarity
> with the BRIN code.  It looks like it might be nontrivial :-(

Looking.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Thanks everyone for the analysis downthread.  Here's a test case that
provokes the crash faster.  Initialize with

create table brin_test (a text);
create index on brin_test using brin (a) with (pages_per_range = 1);

Then in one psql, run this:

select brin_summarize_new_values('brin_test_a_idx') \watch 0.1

and a pgbench running a script with this line (one client is enough):

insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));

It crashes in 10-20 seconds for me.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
I wrote:
> Hmm.  The index offnum being complained of is one past the end of the
> lp array.  I think I see what about that commit changed the behavior:
> the old code for PageIndexDeleteNoCompact never changed the length
> of the lp array, except in the corner case where the page is becoming
> completely empty.  The new code will shorten the lp array (decrease
> phdr->pd_lower) if it's told to remove the last item.  So if you make
> two successive calls specifying the same offnum, and it's the last one
> on the page, the second one will fail with the symptoms we see here.
> However, so far as I can see, a sequence like that would have failed
> before too, just with a different error message, because once the
> first call had marked the item unused, the second call would not see
> it as a candidate to match.  So I'm not quite sure how that's related
> ... but it seems like it must be.

I'm still confused about why it didn't fail before, but after adding
some additional code to log each call of PageIndexTupleDeleteNoCompact,
I think I've got a smoking gun:

2017-10-30 18:18:44.321 EDT [10932] LOG:  deleting tuple 292 (of 292) in rel brin_test_c_idx page 2
2017-10-30 18:18:44.321 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:18:44.393 EDT [10932] LOG:  deleting tuple 292 (of 292) in rel brin_test_d_idx page 2
2017-10-30 18:18:44.393 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:18:53.428 EDT [10932] LOG:  deleting tuple 186 (of 186) in rel brin_test_e_idx page 3
2017-10-30 18:18:53.428 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:19:13.794 EDT [10938] LOG:  deleting tuple 186 (of 186) in rel brin_test_e_idx page 4
2017-10-30 18:19:13.794 EDT [10938] STATEMENT:  insert into brin_test select mod(i,10)/25, mod(i,10)/25, mod(i,10)/25, mod(i,10)/25, md5((mod(i,10)/25)::text)::uuid from generate_series(1,10) s(i)
2017-10-30 18:19:13.795 EDT [10932] LOG:  deleting tuple 186 (of 185) in rel brin_test_e_idx page 4
2017-10-30 18:19:13.795 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:19:13.795 EDT [10932] PANIC:  invalid index offnum: 186
2017-10-30 18:19:13.795 EDT [10932] STATEMENT:  vacuum brin_test

So what happened here is that the inserting process decided to summarize
concurrently with the VACUUM process, and the inserting process deleted
(or maybe just updated/moved?) the placeholder tuple that VACUUM was
expecting to delete, and then we get the PANIC because the tuple we're
expecting to delete is already gone.

So: I put the blame on the fact that summarize_range() thinks that the
tuple offset it has for the placeholder tuple is guaranteed to hold
good, even across possibly-long intervals where it's holding no lock on
the containing buffer.

Fixing this without creating new problems is beyond my familiarity with
the BRIN code.  It looks like it might be nontrivial :-(

			regards, tom lane

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..04ad804 100644
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
*** brin_doupdate(Relation idxrel, BlockNumb
*** 243,249 ****
  	if (extended)
  		brin_page_init(BufferGetPage(newbuf), BRIN_PAGETYPE_REGULAR);

! 	PageIndexTupleDeleteNoCompact(oldpage, oldoff);
  	newoff = PageAddItem(newpage, (Item) newtup, newsz,
  						 InvalidOffsetNumber, false, false);
  	if (newoff == InvalidOffsetNumber)
--- 243,249 ----
  	if (extended)
  		brin_page_init(BufferGetPage(newbuf), BRIN_PAGETYPE_REGULAR);

! 	PageIndexTupleDeleteNoCompact(idxrel, oldbuf, oldpage, oldoff);
  	newoff = PageAddItem(newpage, (Item) newtup, newsz,
  						 InvalidOffsetNumber, false, false);
  	if (newoff == InvalidOffsetNumber)
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076..4d5dad3 100644
*** a/src/backend/access/brin/brin_revmap.c
--- b/src/backend/access/brin/brin_revmap.c
*** brinRevmapDesummarizeRange(Relation idxr
*** 409,415 ****
  	ItemPointerSetInvalid(&invalidIptr);
  	brinSetHeapBlockItemptr(revmapBuf, revmap->rm_pagesPerRange, heapBlk,
  							invalidIptr);
! 	PageIndexTupleDeleteNoCompact(regPg, regOffset);
  	/* XXX record free space in FSM? */
  	MarkBufferDirty(regBuf);
--- 409,415 ----
  	ItemPointerSetInvalid(&invalidIptr);
  	brinSetHeapBlockItemptr(revmapBuf, revmap->rm_pagesPerRange, heapBlk,
  							invalidIptr);
! 	PageIndexTupleDeleteNoCompact(idxrel, regBuf, regPg, regOffset);
  	/* XXX record free space in FSM? */
  	MarkBufferDirty(regBuf);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 60daa54..c8cc9f8 100644
*** a/src/backend/access/brin/brin_xlog.c
--- b/src/backend/access/brin/brin_xlog.c
*** brin_xlog_update(XLogReaderState *record
*** 150,
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tomas Vondra writes:
> On 10/30/2017 09:03 PM, Tom Lane wrote:
>> [ scratches head ... ]  Not sure how that could've introduced any
>> problems?  Will look.
>
> Not sure, but I can confirm Michael's findings - I've been unable to
> reproduce the issue on 1a4be103a5 even after 20 minutes, and on
> 24992c6db9 it failed after only 2.

Hmm.  The index offnum being complained of is one past the end of the
lp array.  I think I see what about that commit changed the behavior:
the old code for PageIndexDeleteNoCompact never changed the length of
the lp array, except in the corner case where the page is becoming
completely empty.  The new code will shorten the lp array (decrease
phdr->pd_lower) if it's told to remove the last item.  So if you make
two successive calls specifying the same offnum, and it's the last one
on the page, the second one will fail with the symptoms we see here.

However, so far as I can see, a sequence like that would have failed
before too, just with a different error message, because once the first
call had marked the item unused, the second call would not see it as a
candidate to match.  So I'm not quite sure how that's related ... but it
seems like it must be.

Anyway, my opinion ATM is that PageIndexTupleDeleteNoCompact is fine,
and what we're looking at is some kind of bug in summarize_range or
brin_doupdate, causing them to pass an offset beyond the end of the page
in some corner case.

			regards, tom lane
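Tom's point about the behavioral change can be modeled concretely: after 24992c6d, deleting the last item shortens the line-pointer array (lowers pd_lower), so a stale offset that previously would have landed on a slot merely marked unused now falls past the end of the array and trips the "invalid index offnum" check instead. A toy model of the two behaviors, with a plain count standing in for pd_lower (all names and the reduced checks are illustrative, not the real bufpage.c logic):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy line-pointer array: 'nitems' plays the role of pd_lower. */
typedef struct {
    uint16_t nitems;     /* number of line-pointer slots, 1-based offsets */
    bool     used[64];   /* used[off] true => slot holds a live tuple */
} ToyPage;

/* Old behavior: mark the slot unused, never shrink the array. */
static bool delete_old(ToyPage *pg, uint16_t off)
{
    if (off < 1 || off > pg->nitems || !pg->used[off])
        return false;            /* "offnum not found" error path */
    pg->used[off] = false;
    return true;
}

/* New behavior: additionally shrink the array when the last item goes. */
static bool delete_new(ToyPage *pg, uint16_t off)
{
    if (off < 1 || off > pg->nitems)
        return false;            /* "invalid index offnum" error path */
    pg->used[off] = false;
    if (off == pg->nitems)
        pg->nitems--;            /* pd_lower decreases */
    return true;
}
```

In both models a second delete of the same last offset fails, as Tom observes; what changed is only which error path fires, because the new code's bounds check now sees the offset as past the end of the shortened array.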
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
On 10/30/2017 09:03 PM, Tom Lane wrote:
> Michael Paquier writes:
>> b1328d78f is close enough, but per what I see that's actually not
>> true. I have been able to reproduce the problem, which shows up within
>> a window of 2-4 minutes. Still sometimes it can take way longer, and I
>> spotted one test where it took 15 minutes to show up... So, bisecting
>> with a test that looks for core files for 20 minutes, I am seeing that
>> the following commit is actually at fault:
>
>> commit 24992c6db9fd40f342db1f22747ec9e56483796d
>> Author: Tom Lane
>> Date: Fri Sep 9 19:00:59 2016 -0400
>
>> Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.
>
> [ scratches head ... ]  Not sure how that could've introduced any
> problems?  Will look.

Not sure, but I can confirm Michael's findings - I've been unable to
reproduce the issue on 1a4be103a5 even after 20 minutes, and on
24992c6db9 it failed after only 2.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Michael Paquier writes:
> b1328d78f is close enough, but per what I see that's actually not
> true. I have been able to reproduce the problem, which shows up within
> a window of 2-4 minutes. Still sometimes it can take way longer, and I
> spotted one test where it took 15 minutes to show up... So, bisecting
> with a test that looks for core files for 20 minutes, I am seeing that
> the following commit is actually at fault:
>
> commit 24992c6db9fd40f342db1f22747ec9e56483796d
> Author: Tom Lane
> Date: Fri Sep 9 19:00:59 2016 -0400
> Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

[ scratches head ... ]  Not sure how that could've introduced any
problems?  Will look.

			regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
On Mon, Oct 30, 2017 at 9:42 AM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>> FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So
>> it seems to be due to something that changed in the last release.
>
> I've been trying to reproduce it for half an hour with no success (I
> think my laptop is just too slow).  I bet this is related to the
> addition of PageIndexTupleOverwrite (commit b1328d78f) though I find it
> more likely that this was not *caused* by that commit but rather just
> made it easier to hit.

b1328d78f is close enough, but per what I see that's actually not true.
I have been able to reproduce the problem, which shows up within a
window of 2-4 minutes.  Still sometimes it can take way longer, and I
spotted one test where it took 15 minutes to show up... So, bisecting
with a test that looks for core files for 20 minutes, I am seeing that
the following commit is actually at fault:

commit 24992c6db9fd40f342db1f22747ec9e56483796d
Author: Tom Lane
Date:   Fri Sep 9 19:00:59 2016 -0400

    Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

    The full generality of deleting an arbitrary number of tuples is no
    longer needed, so let's save some code and cycles by replacing the
    original coding with an implementation based on PageIndexTupleDelete.

--
Michael