Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
> On 21 Mar 2024, at 18:54, Peter Geoghegan wrote: > Do the posting lists that are corrupt > (reported on elsewhere) also have duplicate TIDs? I do not have access now, but AFAIR yes. Thanks for pointers! BTW there were also some errors in logs like ERROR: index "tablename" contains unexpected zero page at block 1265985 HINT: and even MultiXactId 34043703 has not been created yet -- apparent wraparound "right sibling's left-link doesn't match: right sibling 4 of target 2 with leafblkno 2 and scanblkno 2 spuriously links to non-target 1 on level 0 of index "indexname" Multixact problem was fixed by vacuum freeze, other indexes were repacked. > On 21 Mar 2024, at 20:21, Matthias van de Meent > wrote: > > Would you happen to have a PostgreSQL version number (or commit hash) > to help debugging? Has it always had that PG version, or has the > version been upgraded? Vanilla 14.11 (14.10 when created). > Considering the age of this thread, and thus potential for v14 pre-.4: > Did this cluster use REINDEX (concurrently) for the relevant indexes? As now I see, chances are my case is not related to the original thread. Thanks! Best regards, Andrey Borodin.
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Thu, 21 Mar 2024 at 07:17, Andrey M. Borodin wrote: > > > On 29 Jun 2022, at 17:43, Robins Tharakan wrote: > > Sorry to bump ancient thread, I have some observations that might or might > not be relevant. > Recently we noticed a corruption on one of clusters. The corruption at hand > is not in system catalog, but in user indexes. > The cluster was correctly configured: checksums, fsync, FPI etc. > The cluster never was restored from a backup. It’s a single-node cluster, so > it was not ever promoted, pg_rewind-ed etc. VM had never been rebooted. > > But, the cluster had been experiencing 10 OOMs a day. There were no torn > pages, no checsum erros at log at all. Yet, B-tree indexes became corrupted. Would you happen to have a PostgreSQL version number (or commit hash) to help debugging? Has it always had that PG version, or has the version been upgraded? Considering the age of this thread, and thus potential for v14 pre-.4: Did this cluster use REINDEX (concurrently) for the relevant indexes? Kind regards, Matthias van de Meent
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Thu, Mar 21, 2024 at 2:16 AM Andrey M. Borodin wrote: > Most of these messages look similar, except last one: “cross page item order > invariant violated for index”. Indeed, index scans were hanging in a cycle. > I could not locate problem in WAL yet, because a lot of other stuff is going > on. But I have no other ideas, but suspect that posting list redo is > corrupting index in case of a crash. Some of these errors seem unrelated to posting lists. For example, this one: 2024-03-01 11:54:08,162 ERROR : Corrupted index: 96066 webhooks_webhookresponse_webhook_id_db49ebcd XX002 ERROR: item order invariant violated for index "webhooks_webhookresponse_webhook_id_db49ebcd" DETAIL: Lower index tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25) (points to heap tid=(73981,1)) page lsn=31B/E522B640. Notice that there are duplicate heap TIDs here, but no posting list. This is almost certainly a symptom of heap related corruption -- often a problem with recovery. Do the posting lists that are corrupt (reported on elsewhere) also have duplicate TIDs? Such problems tend to first get noticed when inserts fail with "posting list split failed" errors -- but that's just a symptom. It just so happens that the hardening added to places like _bt_swap_posting() and _bt_binsrch_insert() is much more likely to visibly break than anything else, at least in practice. -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
> On 29 Jun 2022, at 17:43, Robins Tharakan wrote: Sorry to bump ancient thread, I have some observations that might or might not be relevant. Recently we noticed a corruption on one of clusters. The corruption at hand is not in system catalog, but in user indexes. The cluster was correctly configured: checksums, fsync, FPI etc. The cluster never was restored from a backup. It’s a single-node cluster, so it was not ever promoted, pg_rewind-ed etc. VM had never been rebooted. But, the cluster had been experiencing 10 OOMs a day. There were no torn pages, no checsum erros at log at all. Yet, B-tree indexes became corrupted. Sorry for this wall of text, I’m posing everything as-is in case if there is some useful information. $ /etc/cron.yandex/pg_corruption_check.py --index 2024-03-01 11:54:05,075 ERROR : Corrupted index: 96009 table1_table1message_table1_team_identity_06a95642 XX002 ERROR: posting list contains misplaced TID in index "table1_table1message_table1_team_identity_06a95642" DETAIL: Index tid=(267,34) posting list offset=137 page lsn=31B/62159608. 2024-03-01 11:54:05,100 ERROR : Corrupted index: 96008 table1_table1message_organization_id_66c18ed2 XX002 ERROR: posting list contains misplaced TID in index "table1_table1message_organization_id_66c18ed2" DETAIL: Index tid=(267,34) posting list offset=137 page lsn=31B/62158BC8. 2024-03-01 11:54:05,355 ERROR : Corrupted index: 95804 table2_aler_channel_81aeec_idx XX002 ERROR: posting list contains misplaced TID in index "table2_aler_channel_81aeec_idx" DETAIL: Index tid=(336,7) posting list offset=182 page lsn=314/9B794248. 2024-03-01 11:54:05,716 ERROR : Corrupted index: 95816 table2_table3_channel_id_91a1912f XX002 ERROR: posting list contains misplaced TID in index "table2_table3_channel_id_91a1912f" DETAIL: Index tid=(384,2) posting list offset=72 page lsn=317/3F14F390. 2024-03-01 11:54:06,068 ERROR : Corrupted index: 95815 table2_table3_channel_filter_id_6706c8b6 XX002 ERROR: posting list contains misplaced TID in index "table2_table3_channel_filter_id_6706c8b6" DETAIL: Index tid=(380,2) posting list offset=72 page lsn=317/3F0D8E30. 2024-03-01 11:54:06,302 ERROR : Corrupted index: 95824 table2_table3_root_alert_group_id_f327f122 XX002 ERROR: item order invariant violated for index "table2_table3_root_alert_group_id_f327f122" DETAIL: Lower index tid=(368,204) (points to heap tid=(48901,2)) higher index tid=(368,205) (points to heap tid=(48901,2)) page lsn=319/3C234588. 2024-03-01 11:54:06,538 ERROR : Corrupted index: 95810 table2_table3_acknowledged_by_user_id_dd6723dc XX002 ERROR: posting list contains misplaced TID in index "table2_table3_acknowledged_by_user_id_dd6723dc" DETAIL: Index tid=(380,69) posting list offset=35 page lsn=317/C14E2D50. 2024-03-01 11:54:06,775 ERROR : Corrupted index: 95825 table2_table3_silenced_by_user_id_40a833a1 XX002 ERROR: posting list contains misplaced TID in index "table2_table3_silenced_by_user_id_40a833a1" DETAIL: Index tid=(371,11) posting list offset=144 page lsn=318/61171918. 2024-03-01 11:54:07,009 ERROR : Corrupted index: 95829 table2_table3_wiped_by_id_4326ff61 XX002 ERROR: item order invariant violated for index "table2_table3_wiped_by_id_4326ff61" DETAIL: Lower index tid=(373,97) (points to heap tid=(48901,2)) higher index tid=(373,98) (points to heap tid=(48901,2)) page lsn=318/61172788. 2024-03-01 11:54:07,245 ERROR : Corrupted index: 95823 table2_table3_resolved_by_user_id_463cdf3d XX002 ERROR: posting list contains misplaced TID in index "table2_table3_resolved_by_user_id_463cdf3d" DETAIL: Index tid=(375,89) posting list offset=144 page lsn=319/3C1DCFC8. 2024-03-01 11:54:07,479 ERROR : Corrupted index: 95819 table2_table3_maintenance_uuid_9a7b8529_like XX002 ERROR: item order invariant violated for index "table2_table3_maintenance_uuid_9a7b8529_like" DETAIL: Lower index tid=(372,4) (points to heap tid=(48901,2)) higher index tid=(372,5) (points to heap tid=(48901,2)) page lsn=317/C1A210A8. 2024-03-01 11:54:07,717 ERROR : Corrupted index: 95827 table2_table3_table1_message_id_58a31784_like XX002 ERROR: posting list contains misplaced TID in index "table2_table3_table1_message_id_58a31784_like" DETAIL: Index tid=(373,89) posting list offset=144 page lsn=319/3C3EE660. 2024-03-01 11:54:08,162 ERROR : Corrupted index: 96066 webhooks_webhookresponse_webhook_id_db49ebcd XX002 ERROR: item order invariant violated for index "webhooks_webhookresponse_webhook_id_db49ebcd" DETAIL: Lower index tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25) (points to heap tid=(73981,1)) page lsn=31B/E522B640. 2024-03-01 11:54:08,646 ERROR : Corrupted index: 95822 table2_table3_resolved_by_alert_id_bbdf0a83 XX002 ERROR: posting list contains misplaced TID in index "table2_table3_resolved_by_alert_id_bbdf0a83" DETAIL: Index tid=(618,2) posting list offset=150 page lsn=317/C1DE74B8. 2024-03-01 11:54:08,873 ERROR : Corrupted index: 95427
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
> Justin Pryzby writes: > > I have nothing new to add, but wanted to point out this is still an issue. > > This is on the v13 Opened Items list - for lack of anywhere else to put > > them, I > > also added two other, unresolved issues. Two minor things to add: 1. This issue is still reproducible on 15Beta2 (c1d033fcb5) - Backtrace here [2] 2. There was a mention that amcheck could throw up errors, but despite quickly stopping the workload, I didn't find anything interesting [1]. (gdb) frame 3 #3 0x55bf9fe496c8 in comparetup_index_btree (a=0x7f0a1a50ba80, b=0x7f0a1a50b9d8, state=0x55bfa19ce960) at tuplesort.c:4454 4454Assert(false); (gdb) info locals sortKey = 0x55bfa19cef10 tuple1 = 0x7f0a1a563ee0 tuple2 = 0x7f0a1a5642a0 keysz = 2 tupDes = 0x7f0a1a9e66a8 equal_hasnull = false nkey = 3 compare = 0 datum1 = 2085305 datum2 = 2085305 isnull1 = false isnull2 = false __func__ = "comparetup_index_btree" (gdb) p *tuple1 $5 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 205}, ip_posid = 3}, t_info = 16} (gdb) p *tuple2 $9 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 205}, ip_posid = 3}, t_info = 16} (gdb) p *sortKey $7 = {ssup_cxt = 0x40, ssup_collation = 0, ssup_reverse = false, ssup_nulls_first = false, ssup_attno = 0, ssup_extra = 0x0, comparator = 0x7f7f7f7f7f7f7f7f, abbreviate = 127, abbrev_converter = 0x7f7f7f7f7f7f7f7f, abbrev_abort = 0x7f7f7f7f7f7f7f7f, abbrev_full_comparator = 0x7f7f7f7f7f7f7f7f} (gdb) p *tupDes $8 = {natts = 2, tdtypeid = 2249, tdtypmod = -1, tdrefcount = 1, constr = 0x0, attrs = 0x7f0a1a9e66c0} Reference: 1) postgres=# select bt_index_parent_check('pg_class_tblspc_relfilenode_index', true, true); bt_index_parent_check --- (1 row) postgres=# select bt_index_check('pg_class_tblspc_relfilenode_index', true); bt_index_check (1 row) 2) Backtrace - (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x7f0a25692859 in __GI_abort () at abort.c:79 #2 0x55bf9fdf1718 in ExceptionalCondition (conditionName=0x55bfa0036a0b "false", errorType=0x55bfa0035d5e "FailedAssertion", fileName=0x55bfa0035dbd "tuplesort.c", lineNumber=4454) at assert.c:69 #3 0x55bf9fe496c8 in comparetup_index_btree (a=0x7f0a1a50ba80, b=0x7f0a1a50b9d8, state=0x55bfa19ce960) at tuplesort.c:4454 #4 0x55bf9fe40901 in qsort_tuple (data=0x7f0a1a50b8e8, n=13, compare=0x55bf9fe484ab , arg=0x55bfa19ce960) at ../../../../src/include/lib/sort_template.h:341 #5 0x55bf9fe40b2f in qsort_tuple (data=0x7f0a1a50b3a8, n=61, compare=0x55bf9fe484ab , arg=0x55bfa19ce960) at ../../../../src/include/lib/sort_template.h:378 #6 0x55bf9fe40b9f in qsort_tuple (data=0x7f0a1a509e78, n=343, compare=0x55bf9fe484ab , arg=0x55bfa19ce960) at ../../../../src/include/lib/sort_template.h:392 #7 0x55bf9fe40b2f in qsort_tuple (data=0x7f0a1a509e78, n=833, compare=0x55bf9fe484ab , arg=0x55bfa19ce960) at ../../../../src/include/lib/sort_template.h:378 #8 0x55bf9fe40b9f in qsort_tuple (data=0x7f0a1a4d9050, n=2118, compare=0x55bf9fe484ab , arg=0x55bfa19ce960) at ../../../../src/include/lib/sort_template.h:392 #9 0x55bf9fe46df8 in tuplesort_sort_memtuples (state=0x55bfa19ce960) at tuplesort.c:3698 #10 0x55bf9fe44043 in tuplesort_performsort (state=0x55bfa19ce960) at tuplesort.c:2217 #11 0x55bf9f783a85 in _bt_leafbuild (btspool=0x55bfa1913318, btspool2=0x0) at nbtsort.c:559 #12 0x55bf9f7835a6 in btbuild (heap=0x7f0a1a9df940, index=0x7f0a1a9e2898, indexInfo=0x55bfa19bc740) at nbtsort.c:336 #13 0x55bf9f81c8cc in index_build (heapRelation=0x7f0a1a9df940, indexRelation=0x7f0a1a9e2898, indexInfo=0x55bfa19bc740, isreindex=true, parallel=true) at index.c:3018 #14 0x55bf9f81dbe6 in reindex_index (indexId=3455, skip_constraint_checks=false, persistence=112 'p', params=0x7ffcfa60a524) at index.c:3718 #15 0x55bf9f925148 in ReindexIndex (indexRelation=0x55bfa18f09a0, params=0x7ffcfa60a598, isTopLevel=true) at indexcmds.c:2727 #16 0x55bf9f924f67 in ExecReindex (pstate=0x55bfa1913070, stmt=0x55bfa18f09f8, isTopLevel=true) at indexcmds.c:2651 #17 0x55bf9fc3397f in standard_ProcessUtility (pstmt=0x55bfa18f0d48, queryString=0x55bfa18eff30 "REINDEX INDEX pg_class_tblspc_relfilenode_index;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55bfa18f0e38, qc=0x7ffcfa60ad20) at utility.c:960 #18 0x7f0a251d6887 in pgss_ProcessUtility (pstmt=0x55bfa18f0d48, queryString=0x55bfa18eff30 "REINDEX INDEX pg_class_tblspc_relfilenode_index;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55bfa18f0e38, qc=0x7ffcfa60ad20) at pg_stat_statements.c:1143 #19 0x55bf9fc32d34 in ProcessUtility (pstmt=0x55bfa18f0d48, queryString=0x55bfa18eff30 "REINDEX INDEX pg_class_tblspc_relfilenode_index;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55bfa18f0e38, qc=0x7ffcfa60ad20) at utility.c:526 #20
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
Justin Pryzby writes: > I have nothing new to add, but wanted to point out this is still an issue. > This is on the v13 Opened Items list - for lack of anywhere else to put them, > I > also added two other, unresolved issues. It's probably time to make a v14 open items page, and move anything you want to treat as a live issue to there. regards, tom lane
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
I have nothing new to add, but wanted to point out this is still an issue. This is on the v13 Opened Items list - for lack of anywhere else to put them, I also added two other, unresolved issues. https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items=revision=35624=35352 On Tue, Aug 04, 2020 at 06:00:34PM -0400, Tom Lane wrote: > Peter Geoghegan writes: > > On Tue, Aug 4, 2020 at 1:31 PM Tom Lane wrote: > >> while the INSERT_IN_PROGRESS case has none. Simple symmetry > >> would suggest that the INSERT_IN_PROGRESS case should be > >> > >>if (checking_uniqueness || > >>HeapTupleIsHeapOnly(heapTuple)) > >>// wait > > > I had exactly the same intuition. > > >> but I'm not 100% convinced that that's right. > > > Why doubt that explanation? > > First, it's not clear that this is an exact inverse, because > HeapTupleIsHotUpdated does more than check the HOT_UPDATED flag. > Second, I think there remains some doubt as to whether the > DELETE_IN_PROGRESS case is right either. If we were forcing > a wait for *every* in-doubt HOT-chain element, not only non-last > ones (or non-first ones for the INSERT case, if we use the above > fix) then it'd be quite clear that we're safe. But if we want > to keep the optimization then I think maybe closer analysis is > warranted.
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
Peter Geoghegan writes: > On Tue, Aug 4, 2020 at 1:31 PM Tom Lane wrote: >> while the INSERT_IN_PROGRESS case has none. Simple symmetry >> would suggest that the INSERT_IN_PROGRESS case should be >> >> if (checking_uniqueness || >> HeapTupleIsHeapOnly(heapTuple)) >> // wait > I had exactly the same intuition. >> but I'm not 100% convinced that that's right. > Why doubt that explanation? First, it's not clear that this is an exact inverse, because HeapTupleIsHotUpdated does more than check the HOT_UPDATED flag. Second, I think there remains some doubt as to whether the DELETE_IN_PROGRESS case is right either. If we were forcing a wait for *every* in-doubt HOT-chain element, not only non-last ones (or non-first ones for the INSERT case, if we use the above fix) then it'd be quite clear that we're safe. But if we want to keep the optimization then I think maybe closer analysis is warranted. regards, tom lane
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Aug 4, 2020 at 1:31 PM Tom Lane wrote: > The core of the issue seems to be that in the presence of concurrent > updates, we might not have a stable opinion of which entry of the HOT > chain is live, leading to trying to index multiple ones of them (using > the same root TID), which leads to the assertion failure. I agree with that assessment. FWIW, I believe that contrib/amcheck will detect this issue on Postgres 12+. If it happened all that often then we probably would have heard about it by now. BTW, I backpatched the assertion that fails. All branches have it now. It might not help, but it certainly can't hurt. > I'm not sure if I was considering the HOT-chain case when I wrote that, > but "no harm" seems clearly wrong in that situation: indexing more than > one in-doubt chain member leads to having multiple index entries pointing > at the same HOT chain. That could be really bad if they have distinct > index values (though we do not expect such a case to arise in a system > catalog, since broken HOT chains should never occur there). I think that it might accidentally be okay for those reasons, though I have a hard time imagining that that's what you meant back then. I doubt that the exact consequences of the problem will affect what the fix looks like now, so this may be somewhat of an academic question. > In the light of this, it bothers me that the DELETE_IN_PROGRESS case > has an exception for HOT chains: > > if (checking_uniqueness || > HeapTupleIsHotUpdated(heapTuple)) > // wait > > while the INSERT_IN_PROGRESS case has none. Simple symmetry > would suggest that the INSERT_IN_PROGRESS case should be > > if (checking_uniqueness || > HeapTupleIsHeapOnly(heapTuple)) > // wait I had exactly the same intuition. > but I'm not 100% convinced that that's right. Why doubt that explanation? As I've said, it's clear that the original HOT commit imagined that this wait business was all about avoiding confusion about which heap tuple to index for the HOT chain -- nothing more or less than that. The simplest explanation seems to be that 1ddc2703a93 missed that subtlety. When some (though not all) of the problems came to light a few years later, 520bcd9c9bb didn't go far enough. We know that 1ddc2703a93 got the DELETE_IN_PROGRESS stuff wrong -- why doubt that it also got the INSERT_IN_PROGRESS stuff wrong? -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
Peter Geoghegan writes: > Any further thoughts on this, Tom? Sorry for slow response ... my ISP had an equipment failure that took out my email service for most of a day. > This is clearly a live bug that we should fix before too long. I could > write the patch myself, but I would like to get your response to my > analysis before starting down that road. Yeah. Looking at the code now, I note these relevant comments in heapam_index_build_range_scan: * Also, although our opinions about tuple liveness could change while * we scan the page (due to concurrent transaction commits/aborts), * the chain root locations won't, so this info doesn't need to be * rebuilt after waiting for another transaction. * * Note the implied assumption that there is no more than one live * tuple per HOT-chain --- else we could create more than one index * entry pointing to the same root tuple. The core of the issue seems to be that in the presence of concurrent updates, we might not have a stable opinion of which entry of the HOT chain is live, leading to trying to index multiple ones of them (using the same root TID), which leads to the assertion failure. Also relevant is 1ddc2703a93's commit-log comment that First, teach IndexBuildHeapScan to not wait for INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples to commit unless the index build is checking uniqueness/exclusion constraints. If it isn't, there's no harm in just indexing the in-doubt tuple. I'm not sure if I was considering the HOT-chain case when I wrote that, but "no harm" seems clearly wrong in that situation: indexing more than one in-doubt chain member leads to having multiple index entries pointing at the same HOT chain. That could be really bad if they have distinct index values (though we do not expect such a case to arise in a system catalog, since broken HOT chains should never occur there). >> Perhaps 2011's commit 520bcd9c9bb missed similar >> HEAPTUPLE_INSERT_IN_PROGRESS issues that manifest themselves within >> Justin's test case now? In the light of this, it bothers me that the DELETE_IN_PROGRESS case has an exception for HOT chains: if (checking_uniqueness || HeapTupleIsHotUpdated(heapTuple)) // wait while the INSERT_IN_PROGRESS case has none. Simple symmetry would suggest that the INSERT_IN_PROGRESS case should be if (checking_uniqueness || HeapTupleIsHeapOnly(heapTuple)) // wait but I'm not 100% convinced that that's right. regards, tom lane
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 3:09 PM Peter Geoghegan wrote: > Perhaps 2011's commit 520bcd9c9bb missed similar > HEAPTUPLE_INSERT_IN_PROGRESS issues that manifest themselves within > Justin's test case now? Any further thoughts on this, Tom? This is clearly a live bug that we should fix before too long. I could write the patch myself, but I would like to get your response to my analysis before starting down that road. -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 2:46 PM Peter Geoghegan wrote: > The fact remains that this function (originally known as > IndexBuildHeapScan(), now heapam_index_build_range_scan()) did not > care about whether or not the index is unique for about 3 years > (excluding the tupleIsAlive stuff, which was always there, even before > HOT). The original HOT commit (commit 282d2a03dd3) said nothing about > unique indexes in the relevant path (the HEAPTUPLE_INSERT_IN_PROGRESS > + !TransactionIdIsCurrentTransactionId() "concurrent system catalog > insert" path). The need to wait here really did seem to be all about > not getting duplicate TIDs (i.e. respecting the basic HOT invariant) > back in 2007. I mentioned that the unique index aspect was added by commit 1ddc2703 in 2010 (the new-style VACUUM FULL deadlock commit that added the "if (checking_uniqueness)" condition). Turns out that that had bugs that were fixed in 2011's commit 520bcd9c9bb (at least I think so based on a reading of the latter commit's commit message) -- though those were in the DELETE_IN_PROGRESS case. Perhaps 2011's commit 520bcd9c9bb missed similar HEAPTUPLE_INSERT_IN_PROGRESS issues that manifest themselves within Justin's test case now? -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 1:26 PM Peter Geoghegan wrote: > On Tue, Jul 28, 2020 at 1:04 PM Tom Lane wrote: > > No, I don't think so. It was designed for the case of unique key X > > being inserted immediately after a deletion of the same key. The > > deleted tuple is presumably not yet vacuumed-away, so the new tuple > > should have a different TID. In no case should we have multiple index > > tuples pointing at the same TID; that would imply that somebody failed > > to vacuuum away an old index entry before freeing up the heap TID. > > It looks like one HOT chain. The fact remains that this function (originally known as IndexBuildHeapScan(), now heapam_index_build_range_scan()) did not care about whether or not the index is unique for about 3 years (excluding the tupleIsAlive stuff, which was always there, even before HOT). The original HOT commit (commit 282d2a03dd3) said nothing about unique indexes in the relevant path (the HEAPTUPLE_INSERT_IN_PROGRESS + !TransactionIdIsCurrentTransactionId() "concurrent system catalog insert" path). The need to wait here really did seem to be all about not getting duplicate TIDs (i.e. respecting the basic HOT invariant) back in 2007. -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 1:04 PM Tom Lane wrote: > No, I don't think so. It was designed for the case of unique key X > being inserted immediately after a deletion of the same key. The > deleted tuple is presumably not yet vacuumed-away, so the new tuple > should have a different TID. In no case should we have multiple index > tuples pointing at the same TID; that would imply that somebody failed > to vacuuum away an old index entry before freeing up the heap TID. It looks like one HOT chain. I think cases where the visibility/HeapTupleSatisfiesVacuum() stuff somehow gets confused could result in the same heap TID (which is actually the HOT chain's root TID) getting indexed twice. > Or, perhaps, REINDEX is somehow scanning the same TID twice, and > generating indeed-duplicate index entries? It's 100% clear that that's what happens from my rr recording (kind of). A conditional breakpoint in _bt_build_callback() clearly shows that it gets called twice for the same TID value (twice in immediate succession). The first time it gets called in the !HeapTupleIsHeapOnlyTuple() path, the second time in the HeapTupleIsHeapOnlyTuple() path (i.e. the path that uses the root_offsets array). I notice that the root tuple of the hot chain is marked HEAP_COMBOCID (and xmin == xmax for the HOT chain tuple). The xmin for the successor (which matches xmin and xmax for root tuple) exactly matches the REINDEX/crashing session's OldestXmin. -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
Peter Geoghegan writes: > I also find it suspicious that I can no longer produce the assertion > failure once I force all non-unique system catalog indexes (such as > Justin's repro index, pg_class_tblspc_relfilenode_index) to go through > the HEAPTUPLE_INSERT_IN_PROGRESS + > !TransactionIdIsCurrentTransactionId() handling for unique indexes > shown here: Hmm... > Actually...was the code *originally* designed to avoid this issue? No, I don't think so. It was designed for the case of unique key X being inserted immediately after a deletion of the same key. The deleted tuple is presumably not yet vacuumed-away, so the new tuple should have a different TID. In no case should we have multiple index tuples pointing at the same TID; that would imply that somebody failed to vacuuum away an old index entry before freeing up the heap TID. Or, perhaps, REINDEX is somehow scanning the same TID twice, and generating indeed-duplicate index entries? regards, tom lane
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 12:00 PM Peter Geoghegan wrote: > I still don't know what's going on here, but I find it suspicious that > some relevant tuples go through the HEAPTUPLE_INSERT_IN_PROGRESS + > !TransactionIdIsCurrentTransactionId() path of > heapam_index_build_range_scan(). After all, if this wasn't a system > catalog index then we'd expect to see "concurrent insert in progress > within table \"%s\"" WARNING output. I also find it suspicious that I can no longer produce the assertion failure once I force all non-unique system catalog indexes (such as Justin's repro index, pg_class_tblspc_relfilenode_index) to go through the HEAPTUPLE_INSERT_IN_PROGRESS + !TransactionIdIsCurrentTransactionId() handling for unique indexes shown here: /* * If we are performing uniqueness checks, indexing * such a tuple could lead to a bogus uniqueness * failure. In that case we wait for the inserting * transaction to finish and check again. */ if (checking_uniqueness) { /* * Must drop the lock on the buffer before we wait */ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); XactLockTableWait(xwait, heapRelation, >t_self, XLTW_InsertIndexUnique); CHECK_FOR_INTERRUPTS(); goto recheck; } Commenting out "if (checking_uniqueness)" here at least *masks* the bug. Seemingly by averting problems that the checking_uniqueness code wasn't actually designed to solve. I imagine that this factor makes the bug less serious in practice -- most system catalogs are unique indexes. Actually...was the code *originally* designed to avoid this issue? Might that fact have been lost since HOT was first introduced? Commit 1ddc2703a93 changed some of the code in question to avoid deadlocks on system catalogs with new-style VACUUM FULL. I wonder if it was a good idea to not wait when we weren't checking_uniqueness following that 2010 commit, though -- we used to wait like this regardless of our checking_uniqueness status. (I understand that the real problem here may be the way that we can release locks early for system catalogs, but let's start with immediate causes.) -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 10:37 AM Peter Geoghegan wrote: > It's starting to look more like that. I can reproduce the bug by > running the REINDEX in a tight loop while "make installcheck" runs. It > looks as if the two tuples passed to comparetup_index_btree() are > separate tuples that each point to the same heap TID. Evidently this is an old bug. The assertion that fails was added to Postgres 12, but that isn't significant. The invariant we see violated here has nothing to do with any of my B-Tree work -- it would have been reasonable to add the same assertion to Postgres 9.5. If I add the same assertion to 9.5 now, I find that the same steps reproduce the problem -- "REINDEX INDEX pg_class_tblspc_relfilenode_index" run in a tight loop connected to the regression database, concurrent with a "make installcheck". I still don't know what's going on here, but I find it suspicious that some relevant tuples go through the HEAPTUPLE_INSERT_IN_PROGRESS + !TransactionIdIsCurrentTransactionId() path of heapam_index_build_range_scan(). After all, if this wasn't a system catalog index then we'd expect to see "concurrent insert in progress within table \"%s\"" WARNING output. -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 10:06 AM Peter Geoghegan wrote: > I wrote the assertion that fails here with the bug that I fixed in > commit 4974d7f87e62a58e80c6524e49677cb25cc10e12 in mind specifically. > That was a bug that involved a scan that returned duplicate tuples due > to a problem in heapam_index_build_range_scan() or all of the > infrastructure that it depends on (directly and indirectly). I wonder > if it's something like that -- this is also a system catalog index. It's starting to look more like that. I can reproduce the bug by running the REINDEX in a tight loop while "make installcheck" runs. It looks as if the two tuples passed to comparetup_index_btree() are separate tuples that each point to the same heap TID. I have an rr recording of this. It shouldn't take too long to figure out what's going on... -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 8:47 AM Peter Geoghegan wrote: > This is very likely to be related to incremental sort because it's a > use-after-free issue, which is the kind of thing that could only > really happen inside tuplesort.c. This is clear because some of the > variables have the tell-tale 0x7f7f7f pattern that we written by > CLOBBER_FREED_MEMORY builds when memory is freed. Actually, I changed my mind. The pointer variable sortKey within comparetup_index_btree() has just been incremented in a way that makes it point past the end of allocated memory, without being dereferenced. That part is fine. I still don't think that it's deduplication, though, because at the point of the crash we haven't even reached _bt_load() yet. That is, we haven't reached nbtsort.c code that is specific to Postgres 13 yet (and besides, catalog indexes don't use deduplication in practice). I wrote the assertion that fails here with the bug that I fixed in commit 4974d7f87e62a58e80c6524e49677cb25cc10e12 in mind specifically. That was a bug that involved a scan that returned duplicate tuples due to a problem in heapam_index_build_range_scan() or all of the infrastructure that it depends on (directly and indirectly). I wonder if it's something like that -- this is also a system catalog index. -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 8:40 AM James Coleman wrote: > Is that assumption largely based on the incremental sort patch > refactoring tuplesort.c a bit? I haven't looked at it much at all, but > I'm wondering if the issue could also be related to the btree > duplicates changes in 13 given that we're looking at > comparetup_index_btree and the datums are equal. It couldn't possibly be the deduplication patch. That didn't change anything in tuplesort.c. This is very likely to be related to incremental sort because it's a use-after-free issue, which is the kind of thing that could only really happen inside tuplesort.c. This is clear because some of the variables have the tell-tale 0x7f7f7f pattern that we written by CLOBBER_FREED_MEMORY builds when memory is freed. -- Peter Geoghegan
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 11:40:14AM -0400, James Coleman wrote: > > This appears to be an issue with d2d8a229b (Incremental Sort), so I will add > > at: https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items > > Is that assumption largely based on the incremental sort patch > refactoring tuplesort.c a bit? I haven't looked at it much at all, but > I'm wondering if the issue could also be related to the btree > duplicates changes in 13 given that we're looking at > comparetup_index_btree and the datums are equal. Good point. I'd looked at something like this to come to my tentative conclusion. git log --stat origin/REL_12_STABLE..origin/REL_13_STABLE -- src/backend/utils/sort/*tuple*c -- Justin
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
On Tue, Jul 28, 2020 at 11:10 AM Justin Pryzby wrote: > > An internal server aborted last night while running a maintenance script. I > reproduced this easily running the crashing command in a loop, and verified > this is a live issue on REL_13_STABLE (dc6f2fb43). > REINDEX INDEX pg_class_tblspc_relfilenode_index > > It looks like this crashed once before, and I didn't notice until now: > Fri Jun 26 22:30:29 CDT 2020: pg_shdescription: > pg_toast.pg_toast_2396_index(reindex toast)... > psql: error: could not connect to server: server closed the connection > unexpectedly > > #3 0x00afde98 in comparetup_index_btree (a=0x201fa58, b=0x201fa10, > state=0x20147b0) at tuplesort.c:4251 > 4251Assert(false); > (gdb) l > 4246if (pos1 != pos2) > 4247return (pos1 < pos2) ? -1 : 1; > 4248} > 4249 > 4250/* ItemPointer values should never be equal */ > 4251Assert(false); > 4252 > 4253return 0; > 4254} > 4255 > > #3 0x00afde98 in comparetup_index_btree (a=0x201fa58, b=0x201fa10, > state=0x20147b0) at tuplesort.c:4251 > sortKey = 0x2014d60 > tuple1 = 0x20189d8 > tuple2 = 0x2018898 > keysz = 2 > tupDes = 0x7f48996b3790 > equal_hasnull = false > nkey = 3 > compare = 0 > datum1 = 67999603 > datum2 = 67999603 > isnull1 = false > isnull2 = false > __func__ = "comparetup_index_btree" > > (gdb) p *tuple1 > $2 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 43}, t_info = 16} > (gdb) p *tuple2 > $3 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 43}, t_info = 16} > > (gdb) p *sortKey > $5 = {ssup_cxt = 0x40, ssup_collation = 0, ssup_reverse = false, > ssup_nulls_first = false, ssup_attno = 0, ssup_extra = 0x0, comparator = > 0x7f7f7f7f7f7f7f7f, abbreviate = 127, abbrev_converter = 0x7f7f7f7f7f7f7f7f, > abbrev_abort = 0x7f7f7f7f7f7f7f7f, abbrev_full_comparator = > 0x7f7f7f7f7f7f7f7f} > > (gdb) p *tupDes > $6 = {natts = 2, tdtypeid = 0, tdtypmod = -1, tdrefcount = 1, constr = 0x0, > attrs = 0x7f48996b37a8} > > Core was generated by `postgres: postgres sentinel [local] REINDEX > '. > > (gdb) bt > #0 0x7f489853d1f7 in raise () from /lib64/libc.so.6 > #1 0x7f489853e8e8 in abort () from /lib64/libc.so.6 > #2 0x00aafff7 in ExceptionalCondition (conditionName=0xccd0dc > "false", errorType=0xccc327 "FailedAssertion", fileName=0xccc2fd > "tuplesort.c", lineNumber=4251) at assert.c:67 > #3 0x00afde98 in comparetup_index_btree (a=0x201fa58, b=0x201fa10, > state=0x20147b0) at tuplesort.c:4251 > #4 0x00af1d5e in qsort_tuple (a=0x201fa10, n=18, cmp_tuple=0xafcf21 > , state=0x20147b0) at qsort_tuple.c:140 > #5 0x00af2104 in qsort_tuple (a=0x201f710, n=50, cmp_tuple=0xafcf21 > , state=0x20147b0) at qsort_tuple.c:191 > #6 0x00af2104 in qsort_tuple (a=0x201cc38, n=544, cmp_tuple=0xafcf21 > , state=0x20147b0) at qsort_tuple.c:191 > #7 0x00af8056 in tuplesort_sort_memtuples (state=0x20147b0) at > tuplesort.c:3490 > #8 0x00af51a9 in tuplesort_performsort (state=0x20147b0) at > tuplesort.c:1985 > #9 0x00529418 in _bt_leafbuild (btspool=0x1f784e0, btspool2=0x0) at > nbtsort.c:553 > #10 0x00528f9c in btbuild (heap=0x1fb5758, index=0x7f48996b3460, > indexInfo=0x1f77a48) at nbtsort.c:333 > #11 0x005adcb3 in index_build (heapRelation=0x1fb5758, > indexRelation=0x7f48996b3460, indexInfo=0x1f77a48, isreindex=true, > parallel=true) at index.c:2903 > #12 0x005aec6b in reindex_index (indexId=3455, > skip_constraint_checks=false, persistence=112 'p', options=2) at index.c:3539 > #13 0x00692583 in ReindexIndex (indexRelation=0x1f54840, options=0, > concurrent=false) at indexcmds.c:2466 > #14 0x00932e36 in standard_ProcessUtility (pstmt=0x1f54960, > queryString=0x1f53d90 "REINDEX INDEX pg_class_tblspc_relfilenode_index", > context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1f54c40, > qc=0x7ffde023bf80) at utility.c:929 > #15 0x0093241f in ProcessUtility (pstmt=0x1f54960, > queryString=0x1f53d90 "REINDEX INDEX pg_class_tblspc_relfilenode_index", > context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1f54c40, > qc=0x7ffde023bf80) > at utility.c:524 > #16 0x00931288 in PortalRunUtility (portal=0x1fb5ac0, > pstmt=0x1f54960, isTopLevel=true, setHoldSnapshot=false, dest=0x1f54c40, > qc=0x7ffde023bf80) at pquery.c:1157 > #17 0x009314a7 in PortalRunMulti (portal=0x1fb5ac0, isTopLevel=true, > setHoldSnapshot=false, dest=0x1f54c40, altdest=0x1f54c40, qc=0x7ffde023bf80) > at pquery.c:1303 > #18 0x009309bc in PortalRun (portal=0x1fb5ac0, > count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1f54c40, > altdest=0x1f54c40, qc=0x7ffde023bf80) at pquery.c:779 > #19 0x0092ab5b