Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2024-03-21 Thread Andrey M. Borodin



> On 21 Mar 2024, at 18:54, Peter Geoghegan  wrote:
>  Do the posting lists that are corrupt
> (reported on elsewhere) also have duplicate TIDs?

I do not have access now, but AFAIR yes.
Thanks for pointers!

BTW there were also some errors in logs like
ERROR:  index "tablename" contains unexpected zero page at block 1265985 HINT:
and even
MultiXactId 34043703 has not been created yet -- apparent wraparound
"right sibling's left-link doesn't match: right sibling 4 of target 2 with 
leafblkno 2 and scanblkno 2 spuriously links to non-target 1 on level 0 of 
index "indexname"

Multixact problem was fixed by vacuum freeze, other indexes were repacked.


> On 21 Mar 2024, at 20:21, Matthias van de Meent 
>  wrote:
> 
> Would you happen to have a PostgreSQL version number (or commit hash)
> to help debugging? Has it always had that PG version, or has the
> version been upgraded?

Vanilla 14.11 (14.10 when created).

> Considering the age of this thread, and thus potential for v14 pre-.4:
> Did this cluster use REINDEX (concurrently) for the relevant indexes?


As now I see, chances are my case is not related to the original thread.

Thanks!


Best regards, Andrey Borodin.



Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2024-03-21 Thread Matthias van de Meent
On Thu, 21 Mar 2024 at 07:17, Andrey M. Borodin  wrote:
>
> > On 29 Jun 2022, at 17:43, Robins Tharakan  wrote:
>
> Sorry to bump ancient thread, I have some observations that might or might 
> not be relevant.
> Recently we noticed a corruption on one of clusters. The corruption at hand 
> is not in system catalog, but in user indexes.
> The cluster was correctly configured: checksums, fsync, FPI etc.
> The cluster never was restored from a backup. It’s a single-node cluster, so 
> it was not ever promoted, pg_rewind-ed etc. VM had never been rebooted.
>
> But, the cluster had been experiencing 10 OOMs a day. There were no torn 
> pages, no checsum erros at log at all. Yet, B-tree indexes became corrupted.

Would you happen to have a PostgreSQL version number (or commit hash)
to help debugging? Has it always had that PG version, or has the
version been upgraded?

Considering the age of this thread, and thus potential for v14 pre-.4:
Did this cluster use REINDEX (concurrently) for the relevant indexes?


Kind regards,

Matthias van de Meent




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2024-03-21 Thread Peter Geoghegan
On Thu, Mar 21, 2024 at 2:16 AM Andrey M. Borodin  wrote:
> Most of these messages look similar, except last one: “cross page item order 
> invariant violated for index”. Indeed, index scans were hanging in a cycle.
> I could not locate problem in WAL yet, because a lot of other stuff is going 
> on. But I have no other ideas, but suspect that posting list redo is 
> corrupting index in case of a crash.

Some of these errors seem unrelated to posting lists. For example, this one:

2024-03-01 11:54:08,162 ERROR : Corrupted index: 96066
webhooks_webhookresponse_webhook_id_db49ebcd XX002 ERROR: item order
invariant violated for index
"webhooks_webhookresponse_webhook_id_db49ebcd" DETAIL: Lower index
tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25)
(points to heap tid=(73981,1)) page lsn=31B/E522B640.

Notice that there are duplicate heap TIDs here, but no posting list.
This is almost certainly a symptom of heap related corruption -- often
a problem with recovery. Do the posting lists that are corrupt
(reported on elsewhere) also have duplicate TIDs?

Such problems tend to first get noticed when inserts fail with
"posting list split failed" errors -- but that's just a symptom. It
just so happens that the hardening added to places like
_bt_swap_posting() and _bt_binsrch_insert() is much more likely to
visibly break than anything else, at least in practice.

-- 
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2024-03-21 Thread Andrey M. Borodin



> On 29 Jun 2022, at 17:43, Robins Tharakan  wrote:


Sorry to bump ancient thread, I have some observations that might or might not 
be relevant.
Recently we noticed a corruption on one of clusters. The corruption at hand is 
not in system catalog, but in user indexes.
The cluster was correctly configured: checksums, fsync, FPI etc.
The cluster never was restored from a backup. It’s a single-node cluster, so it 
was not ever promoted, pg_rewind-ed etc. VM had never been rebooted.

But, the cluster had been experiencing 10 OOMs a day. There were no torn pages, 
no checsum erros at log at all. Yet, B-tree indexes became corrupted.


Sorry for this wall of text, I’m posing everything as-is in case if there is 
some useful information.

$ /etc/cron.yandex/pg_corruption_check.py --index
2024-03-01 11:54:05,075 ERROR : Corrupted index: 96009 
table1_table1message_table1_team_identity_06a95642 XX002 ERROR: posting list 
contains misplaced TID in index 
"table1_table1message_table1_team_identity_06a95642" DETAIL: Index tid=(267,34) 
posting list offset=137 page lsn=31B/62159608.
2024-03-01 11:54:05,100 ERROR : Corrupted index: 96008 
table1_table1message_organization_id_66c18ed2 XX002 ERROR: posting list 
contains misplaced TID in index "table1_table1message_organization_id_66c18ed2" 
DETAIL: Index tid=(267,34) posting list offset=137 page lsn=31B/62158BC8.
2024-03-01 11:54:05,355 ERROR : Corrupted index: 95804 
table2_aler_channel_81aeec_idx XX002 ERROR: posting list contains misplaced TID 
in index "table2_aler_channel_81aeec_idx" DETAIL: Index tid=(336,7) posting 
list offset=182 page lsn=314/9B794248.
2024-03-01 11:54:05,716 ERROR : Corrupted index: 95816 
table2_table3_channel_id_91a1912f XX002 ERROR: posting list contains misplaced 
TID in index "table2_table3_channel_id_91a1912f" DETAIL: Index tid=(384,2) 
posting list offset=72 page lsn=317/3F14F390.
2024-03-01 11:54:06,068 ERROR : Corrupted index: 95815 
table2_table3_channel_filter_id_6706c8b6 XX002 ERROR: posting list contains 
misplaced TID in index "table2_table3_channel_filter_id_6706c8b6" DETAIL: Index 
tid=(380,2) posting list offset=72 page lsn=317/3F0D8E30.
2024-03-01 11:54:06,302 ERROR : Corrupted index: 95824 
table2_table3_root_alert_group_id_f327f122 XX002 ERROR: item order invariant 
violated for index "table2_table3_root_alert_group_id_f327f122" DETAIL: Lower 
index tid=(368,204) (points to heap tid=(48901,2)) higher index tid=(368,205) 
(points to heap tid=(48901,2)) page lsn=319/3C234588.
2024-03-01 11:54:06,538 ERROR : Corrupted index: 95810 
table2_table3_acknowledged_by_user_id_dd6723dc XX002 ERROR: posting list 
contains misplaced TID in index 
"table2_table3_acknowledged_by_user_id_dd6723dc" DETAIL: Index tid=(380,69) 
posting list offset=35 page lsn=317/C14E2D50.
2024-03-01 11:54:06,775 ERROR : Corrupted index: 95825 
table2_table3_silenced_by_user_id_40a833a1 XX002 ERROR: posting list contains 
misplaced TID in index "table2_table3_silenced_by_user_id_40a833a1" DETAIL: 
Index tid=(371,11) posting list offset=144 page lsn=318/61171918.
2024-03-01 11:54:07,009 ERROR : Corrupted index: 95829 
table2_table3_wiped_by_id_4326ff61 XX002 ERROR: item order invariant violated 
for index "table2_table3_wiped_by_id_4326ff61" DETAIL: Lower index tid=(373,97) 
(points to heap tid=(48901,2)) higher index tid=(373,98) (points to heap 
tid=(48901,2)) page lsn=318/61172788.
2024-03-01 11:54:07,245 ERROR : Corrupted index: 95823 
table2_table3_resolved_by_user_id_463cdf3d XX002 ERROR: posting list contains 
misplaced TID in index "table2_table3_resolved_by_user_id_463cdf3d" DETAIL: 
Index tid=(375,89) posting list offset=144 page lsn=319/3C1DCFC8.
2024-03-01 11:54:07,479 ERROR : Corrupted index: 95819 
table2_table3_maintenance_uuid_9a7b8529_like XX002 ERROR: item order invariant 
violated for index "table2_table3_maintenance_uuid_9a7b8529_like" DETAIL: Lower 
index tid=(372,4) (points to heap tid=(48901,2)) higher index tid=(372,5) 
(points to heap tid=(48901,2)) page lsn=317/C1A210A8.
2024-03-01 11:54:07,717 ERROR : Corrupted index: 95827 
table2_table3_table1_message_id_58a31784_like XX002 ERROR: posting list 
contains misplaced TID in index "table2_table3_table1_message_id_58a31784_like" 
DETAIL: Index tid=(373,89) posting list offset=144 page lsn=319/3C3EE660.
2024-03-01 11:54:08,162 ERROR : Corrupted index: 96066 
webhooks_webhookresponse_webhook_id_db49ebcd XX002 ERROR: item order invariant 
violated for index "webhooks_webhookresponse_webhook_id_db49ebcd" DETAIL: Lower 
index tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25) 
(points to heap tid=(73981,1)) page lsn=31B/E522B640.
2024-03-01 11:54:08,646 ERROR : Corrupted index: 95822 
table2_table3_resolved_by_alert_id_bbdf0a83 XX002 ERROR: posting list contains 
misplaced TID in index "table2_table3_resolved_by_alert_id_bbdf0a83" DETAIL: 
Index tid=(618,2) posting list offset=150 page lsn=317/C1DE74B8.
2024-03-01 11:54:08,873 ERROR : Corrupted index: 95427 

Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2022-06-29 Thread Robins Tharakan
> Justin Pryzby  writes:
> > I have nothing new to add, but wanted to point out this is still an issue.
> > This is on the v13 Opened Items list - for lack of anywhere else to put 
> > them, I
> > also added two other, unresolved issues.

Two minor things to add:
1. This issue is still reproducible on 15Beta2 (c1d033fcb5) - Backtrace here [2]
2. There was a mention that amcheck could throw up errors, but despite quickly
stopping the workload, I didn't find anything interesting [1].


(gdb) frame 3
#3  0x55bf9fe496c8 in comparetup_index_btree (a=0x7f0a1a50ba80,
b=0x7f0a1a50b9d8, state=0x55bfa19ce960) at tuplesort.c:4454
4454Assert(false);

(gdb) info locals
sortKey = 0x55bfa19cef10
tuple1 = 0x7f0a1a563ee0
tuple2 = 0x7f0a1a5642a0
keysz = 2
tupDes = 0x7f0a1a9e66a8
equal_hasnull = false
nkey = 3
compare = 0
datum1 = 2085305
datum2 = 2085305
isnull1 = false
isnull2 = false
__func__ = "comparetup_index_btree"

(gdb) p *tuple1
$5 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 205}, ip_posid = 3}, t_info = 16}

(gdb) p *tuple2
$9 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 205}, ip_posid = 3}, t_info = 16}

(gdb) p *sortKey
$7 = {ssup_cxt = 0x40, ssup_collation = 0, ssup_reverse = false,
ssup_nulls_first = false, ssup_attno = 0, ssup_extra = 0x0, comparator
= 0x7f7f7f7f7f7f7f7f, abbreviate = 127,
  abbrev_converter = 0x7f7f7f7f7f7f7f7f, abbrev_abort =
0x7f7f7f7f7f7f7f7f, abbrev_full_comparator = 0x7f7f7f7f7f7f7f7f}

(gdb) p *tupDes
$8 = {natts = 2, tdtypeid = 2249, tdtypmod = -1, tdrefcount = 1,
constr = 0x0, attrs = 0x7f0a1a9e66c0}

Reference:
1) postgres=# select
bt_index_parent_check('pg_class_tblspc_relfilenode_index', true,
true);
 bt_index_parent_check
---

(1 row)

postgres=# select bt_index_check('pg_class_tblspc_relfilenode_index', true);
 bt_index_check


(1 row)


2) Backtrace -
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f0a25692859 in __GI_abort () at abort.c:79
#2  0x55bf9fdf1718 in ExceptionalCondition
(conditionName=0x55bfa0036a0b "false", errorType=0x55bfa0035d5e
"FailedAssertion", fileName=0x55bfa0035dbd "tuplesort.c",
lineNumber=4454)
at assert.c:69
#3  0x55bf9fe496c8 in comparetup_index_btree (a=0x7f0a1a50ba80,
b=0x7f0a1a50b9d8, state=0x55bfa19ce960) at tuplesort.c:4454
#4  0x55bf9fe40901 in qsort_tuple (data=0x7f0a1a50b8e8, n=13,
compare=0x55bf9fe484ab , arg=0x55bfa19ce960)
at ../../../../src/include/lib/sort_template.h:341
#5  0x55bf9fe40b2f in qsort_tuple (data=0x7f0a1a50b3a8, n=61,
compare=0x55bf9fe484ab , arg=0x55bfa19ce960)
at ../../../../src/include/lib/sort_template.h:378
#6  0x55bf9fe40b9f in qsort_tuple (data=0x7f0a1a509e78, n=343,
compare=0x55bf9fe484ab , arg=0x55bfa19ce960)
at ../../../../src/include/lib/sort_template.h:392
#7  0x55bf9fe40b2f in qsort_tuple (data=0x7f0a1a509e78, n=833,
compare=0x55bf9fe484ab , arg=0x55bfa19ce960)
at ../../../../src/include/lib/sort_template.h:378
#8  0x55bf9fe40b9f in qsort_tuple (data=0x7f0a1a4d9050, n=2118,
compare=0x55bf9fe484ab , arg=0x55bfa19ce960)
at ../../../../src/include/lib/sort_template.h:392
#9  0x55bf9fe46df8 in tuplesort_sort_memtuples
(state=0x55bfa19ce960) at tuplesort.c:3698
#10 0x55bf9fe44043 in tuplesort_performsort (state=0x55bfa19ce960)
at tuplesort.c:2217
#11 0x55bf9f783a85 in _bt_leafbuild (btspool=0x55bfa1913318,
btspool2=0x0) at nbtsort.c:559
#12 0x55bf9f7835a6 in btbuild (heap=0x7f0a1a9df940,
index=0x7f0a1a9e2898, indexInfo=0x55bfa19bc740) at nbtsort.c:336
#13 0x55bf9f81c8cc in index_build (heapRelation=0x7f0a1a9df940,
indexRelation=0x7f0a1a9e2898, indexInfo=0x55bfa19bc740,
isreindex=true, parallel=true) at index.c:3018
#14 0x55bf9f81dbe6 in reindex_index (indexId=3455,
skip_constraint_checks=false, persistence=112 'p',
params=0x7ffcfa60a524) at index.c:3718
#15 0x55bf9f925148 in ReindexIndex (indexRelation=0x55bfa18f09a0,
params=0x7ffcfa60a598, isTopLevel=true) at indexcmds.c:2727
#16 0x55bf9f924f67 in ExecReindex (pstate=0x55bfa1913070,
stmt=0x55bfa18f09f8, isTopLevel=true) at indexcmds.c:2651
#17 0x55bf9fc3397f in standard_ProcessUtility
(pstmt=0x55bfa18f0d48, queryString=0x55bfa18eff30 "REINDEX INDEX
pg_class_tblspc_relfilenode_index;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55bfa18f0e38, qc=0x7ffcfa60ad20) at utility.c:960
#18 0x7f0a251d6887 in pgss_ProcessUtility (pstmt=0x55bfa18f0d48,
queryString=0x55bfa18eff30 "REINDEX INDEX
pg_class_tblspc_relfilenode_index;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55bfa18f0e38, qc=0x7ffcfa60ad20) at pg_stat_statements.c:1143
#19 0x55bf9fc32d34 in ProcessUtility (pstmt=0x55bfa18f0d48,
queryString=0x55bfa18eff30 "REINDEX INDEX
pg_class_tblspc_relfilenode_index;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55bfa18f0e38, qc=0x7ffcfa60ad20) at utility.c:526
#20 

Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2021-01-26 Thread Tom Lane
Justin Pryzby  writes:
> I have nothing new to add, but wanted to point out this is still an issue.
> This is on the v13 Opened Items list - for lack of anywhere else to put them, 
> I
> also added two other, unresolved issues.

It's probably time to make a v14 open items page, and move anything
you want to treat as a live issue to there.

regards, tom lane




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2021-01-26 Thread Justin Pryzby
I have nothing new to add, but wanted to point out this is still an issue.

This is on the v13 Opened Items list - for lack of anywhere else to put them, I
also added two other, unresolved issues.

https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items=revision=35624=35352

On Tue, Aug 04, 2020 at 06:00:34PM -0400, Tom Lane wrote:
> Peter Geoghegan  writes:
> > On Tue, Aug 4, 2020 at 1:31 PM Tom Lane  wrote:
> >> while the INSERT_IN_PROGRESS case has none.  Simple symmetry
> >> would suggest that the INSERT_IN_PROGRESS case should be
> >> 
> >>if (checking_uniqueness ||
> >>HeapTupleIsHeapOnly(heapTuple))
> >>// wait
> 
> > I had exactly the same intuition.
> 
> >> but I'm not 100% convinced that that's right.
> 
> > Why doubt that explanation?
> 
> First, it's not clear that this is an exact inverse, because
> HeapTupleIsHotUpdated does more than check the HOT_UPDATED flag.
> Second, I think there remains some doubt as to whether the
> DELETE_IN_PROGRESS case is right either.  If we were forcing
> a wait for *every* in-doubt HOT-chain element, not only non-last
> ones (or non-first ones for the INSERT case, if we use the above
> fix) then it'd be quite clear that we're safe.  But if we want
> to keep the optimization then I think maybe closer analysis is
> warranted.




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-08-04 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Aug 4, 2020 at 1:31 PM Tom Lane  wrote:
>> while the INSERT_IN_PROGRESS case has none.  Simple symmetry
>> would suggest that the INSERT_IN_PROGRESS case should be
>> 
>>  if (checking_uniqueness ||
>>  HeapTupleIsHeapOnly(heapTuple))
>>  // wait

> I had exactly the same intuition.

>> but I'm not 100% convinced that that's right.

> Why doubt that explanation?

First, it's not clear that this is an exact inverse, because
HeapTupleIsHotUpdated does more than check the HOT_UPDATED flag.
Second, I think there remains some doubt as to whether the
DELETE_IN_PROGRESS case is right either.  If we were forcing
a wait for *every* in-doubt HOT-chain element, not only non-last
ones (or non-first ones for the INSERT case, if we use the above
fix) then it'd be quite clear that we're safe.  But if we want
to keep the optimization then I think maybe closer analysis is
warranted.

regards, tom lane




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-08-04 Thread Peter Geoghegan
On Tue, Aug 4, 2020 at 1:31 PM Tom Lane  wrote:
> The core of the issue seems to be that in the presence of concurrent
> updates, we might not have a stable opinion of which entry of the HOT
> chain is live, leading to trying to index multiple ones of them (using
> the same root TID), which leads to the assertion failure.

I agree with that assessment. FWIW, I believe that contrib/amcheck
will detect this issue on Postgres 12+. If it happened all that often
then we probably would have heard about it by now.

BTW, I backpatched the assertion that fails. All branches have it now.
It might not help, but it certainly can't hurt.

> I'm not sure if I was considering the HOT-chain case when I wrote that,
> but "no harm" seems clearly wrong in that situation: indexing more than
> one in-doubt chain member leads to having multiple index entries pointing
> at the same HOT chain.  That could be really bad if they have distinct
> index values (though we do not expect such a case to arise in a system
> catalog, since broken HOT chains should never occur there).

I think that it might accidentally be okay for those reasons, though I
have a hard time imagining that that's what you meant back then. I
doubt that the exact consequences of the problem will affect what the
fix looks like now, so this may be somewhat of an academic question.

> In the light of this, it bothers me that the DELETE_IN_PROGRESS case
> has an exception for HOT chains:
>
> if (checking_uniqueness ||
> HeapTupleIsHotUpdated(heapTuple))
> // wait
>
> while the INSERT_IN_PROGRESS case has none.  Simple symmetry
> would suggest that the INSERT_IN_PROGRESS case should be
>
> if (checking_uniqueness ||
> HeapTupleIsHeapOnly(heapTuple))
> // wait

I had exactly the same intuition.

> but I'm not 100% convinced that that's right.

Why doubt that explanation?

As I've said, it's clear that the original HOT commit imagined that
this wait business was all about avoiding confusion about which heap
tuple to index for the HOT chain -- nothing more or less than that.
The simplest explanation seems to be that 1ddc2703a93 missed that
subtlety. When some (though not all) of the problems came to light a
few years later, 520bcd9c9bb didn't go far enough. We know that
1ddc2703a93 got the DELETE_IN_PROGRESS stuff wrong -- why doubt that
it also got the INSERT_IN_PROGRESS stuff wrong?

--
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-08-04 Thread Tom Lane
Peter Geoghegan  writes:
> Any further thoughts on this, Tom?

Sorry for slow response ... my ISP had an equipment failure that took
out my email service for most of a day.

> This is clearly a live bug that we should fix before too long. I could
> write the patch myself, but I would like to get your response to my
> analysis before starting down that road.

Yeah.  Looking at the code now, I note these relevant comments in
heapam_index_build_range_scan:

 * Also, although our opinions about tuple liveness could change while
 * we scan the page (due to concurrent transaction commits/aborts),
 * the chain root locations won't, so this info doesn't need to be
 * rebuilt after waiting for another transaction.
 *
 * Note the implied assumption that there is no more than one live
 * tuple per HOT-chain --- else we could create more than one index
 * entry pointing to the same root tuple.

The core of the issue seems to be that in the presence of concurrent
updates, we might not have a stable opinion of which entry of the HOT
chain is live, leading to trying to index multiple ones of them (using
the same root TID), which leads to the assertion failure.

Also relevant is 1ddc2703a93's commit-log comment that

First, teach IndexBuildHeapScan to not wait for INSERT_IN_PROGRESS or
DELETE_IN_PROGRESS tuples to commit unless the index build is checking
uniqueness/exclusion constraints.  If it isn't, there's no harm in just
indexing the in-doubt tuple.

I'm not sure if I was considering the HOT-chain case when I wrote that,
but "no harm" seems clearly wrong in that situation: indexing more than
one in-doubt chain member leads to having multiple index entries pointing
at the same HOT chain.  That could be really bad if they have distinct
index values (though we do not expect such a case to arise in a system
catalog, since broken HOT chains should never occur there).

>> Perhaps 2011's commit 520bcd9c9bb missed similar
>> HEAPTUPLE_INSERT_IN_PROGRESS issues that manifest themselves within
>> Justin's test case now?

In the light of this, it bothers me that the DELETE_IN_PROGRESS case
has an exception for HOT chains:

if (checking_uniqueness ||
HeapTupleIsHotUpdated(heapTuple))
// wait

while the INSERT_IN_PROGRESS case has none.  Simple symmetry
would suggest that the INSERT_IN_PROGRESS case should be

if (checking_uniqueness ||
HeapTupleIsHeapOnly(heapTuple))
// wait

but I'm not 100% convinced that that's right.

regards, tom lane




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-08-03 Thread Peter Geoghegan
On Tue, Jul 28, 2020 at 3:09 PM Peter Geoghegan  wrote:
> Perhaps 2011's commit 520bcd9c9bb missed similar
> HEAPTUPLE_INSERT_IN_PROGRESS issues that manifest themselves within
> Justin's test case now?

Any further thoughts on this, Tom?

This is clearly a live bug that we should fix before too long. I could
write the patch myself, but I would like to get your response to my
analysis before starting down that road.

--
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Peter Geoghegan
On Tue, Jul 28, 2020 at 2:46 PM Peter Geoghegan  wrote:
> The fact remains that this function (originally known as
> IndexBuildHeapScan(), now heapam_index_build_range_scan()) did not
> care about whether or not the index is unique for about 3 years
> (excluding the tupleIsAlive stuff, which was always there, even before
> HOT). The original HOT commit (commit 282d2a03dd3) said nothing about
> unique indexes in the relevant path (the HEAPTUPLE_INSERT_IN_PROGRESS
> + !TransactionIdIsCurrentTransactionId() "concurrent system catalog
> insert" path). The need to wait here really did seem to be all about
> not getting duplicate TIDs (i.e. respecting the basic HOT invariant)
> back in 2007.

I mentioned that the unique index aspect was added by commit 1ddc2703
in 2010 (the new-style VACUUM FULL deadlock commit that added the "if
(checking_uniqueness)" condition). Turns out that that had bugs that
were fixed in 2011's commit 520bcd9c9bb (at least I think so based on
a reading of the latter commit's commit message) -- though those were
in the DELETE_IN_PROGRESS case.

Perhaps 2011's commit 520bcd9c9bb missed similar
HEAPTUPLE_INSERT_IN_PROGRESS issues that manifest themselves within
Justin's test case now?

-- 
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Peter Geoghegan
On Tue, Jul 28, 2020 at 1:26 PM Peter Geoghegan  wrote:
> On Tue, Jul 28, 2020 at 1:04 PM Tom Lane  wrote:
> > No, I don't think so.  It was designed for the case of unique key X
> > being inserted immediately after a deletion of the same key.  The
> > deleted tuple is presumably not yet vacuumed-away, so the new tuple
> > should have a different TID.  In no case should we have multiple index
> > tuples pointing at the same TID; that would imply that somebody failed
> > to vacuuum away an old index entry before freeing up the heap TID.
>
> It looks like one HOT chain.

The fact remains that this function (originally known as
IndexBuildHeapScan(), now heapam_index_build_range_scan()) did not
care about whether or not the index is unique for about 3 years
(excluding the tupleIsAlive stuff, which was always there, even before
HOT). The original HOT commit (commit 282d2a03dd3) said nothing about
unique indexes in the relevant path (the HEAPTUPLE_INSERT_IN_PROGRESS
+ !TransactionIdIsCurrentTransactionId() "concurrent system catalog
insert" path). The need to wait here really did seem to be all about
not getting duplicate TIDs (i.e. respecting the basic HOT invariant)
back in 2007.

-- 
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Peter Geoghegan
On Tue, Jul 28, 2020 at 1:04 PM Tom Lane  wrote:
> No, I don't think so.  It was designed for the case of unique key X
> being inserted immediately after a deletion of the same key.  The
> deleted tuple is presumably not yet vacuumed-away, so the new tuple
> should have a different TID.  In no case should we have multiple index
> tuples pointing at the same TID; that would imply that somebody failed
> to vacuuum away an old index entry before freeing up the heap TID.

It looks like one HOT chain. I think cases where the
visibility/HeapTupleSatisfiesVacuum() stuff somehow gets confused
could result in the same heap TID (which is actually the HOT chain's
root TID) getting indexed twice.

> Or, perhaps, REINDEX is somehow scanning the same TID twice, and
> generating indeed-duplicate index entries?

It's 100% clear that that's what happens from my rr recording (kind
of). A conditional breakpoint in _bt_build_callback() clearly shows
that it gets called twice for the same TID value (twice in immediate
succession). The first time it gets called in the
!HeapTupleIsHeapOnlyTuple() path, the second time in the
HeapTupleIsHeapOnlyTuple() path (i.e. the path that uses the
root_offsets array).

I notice that the root tuple of the hot chain is marked HEAP_COMBOCID
(and xmin == xmax for the HOT chain tuple). The xmin for the successor
(which matches xmin and xmax for root tuple) exactly matches the
REINDEX/crashing session's OldestXmin.

-- 
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Tom Lane
Peter Geoghegan  writes:
> I also find it suspicious that I can no longer produce the assertion
> failure once I force all non-unique system catalog indexes (such as
> Justin's repro index, pg_class_tblspc_relfilenode_index) to go through
> the HEAPTUPLE_INSERT_IN_PROGRESS +
> !TransactionIdIsCurrentTransactionId() handling for unique indexes
> shown here:

Hmm...

> Actually...was the code *originally* designed to avoid this issue?

No, I don't think so.  It was designed for the case of unique key X
being inserted immediately after a deletion of the same key.  The
deleted tuple is presumably not yet vacuumed-away, so the new tuple
should have a different TID.  In no case should we have multiple index
tuples pointing at the same TID; that would imply that somebody failed
to vacuuum away an old index entry before freeing up the heap TID.

Or, perhaps, REINDEX is somehow scanning the same TID twice, and
generating indeed-duplicate index entries?

regards, tom lane




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Peter Geoghegan
On Tue, Jul 28, 2020 at 12:00 PM Peter Geoghegan  wrote:
> I still don't know what's going on here, but I find it suspicious that
> some relevant tuples go through the HEAPTUPLE_INSERT_IN_PROGRESS +
> !TransactionIdIsCurrentTransactionId() path of
> heapam_index_build_range_scan(). After all, if this wasn't a system
> catalog index then we'd expect to see "concurrent insert in progress
> within table \"%s\"" WARNING output.

I also find it suspicious that I can no longer produce the assertion
failure once I force all non-unique system catalog indexes (such as
Justin's repro index, pg_class_tblspc_relfilenode_index) to go through
the HEAPTUPLE_INSERT_IN_PROGRESS +
!TransactionIdIsCurrentTransactionId() handling for unique indexes
shown here:

/*
 * If we are performing uniqueness checks, indexing
 * such a tuple could lead to a bogus uniqueness
 * failure.  In that case we wait for the inserting
 * transaction to finish and check again.
 */
if (checking_uniqueness)
{
/*
 * Must drop the lock on the buffer before we wait
 */
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
XactLockTableWait(xwait, heapRelation,
  >t_self,
  XLTW_InsertIndexUnique);
CHECK_FOR_INTERRUPTS();
goto recheck;
}

Commenting out "if (checking_uniqueness)" here at least *masks* the
bug. Seemingly by averting problems that the checking_uniqueness code
wasn't actually designed to solve. I imagine that this factor makes
the bug less serious in practice -- most system catalogs are unique
indexes.

Actually...was the code *originally* designed to avoid this issue?
Might that fact have been lost since HOT was first introduced? Commit
1ddc2703a93 changed some of the code in question to avoid deadlocks on
system catalogs with new-style VACUUM FULL. I wonder if it was a good
idea to not wait when we weren't checking_uniqueness following that
2010 commit, though -- we used to wait like this regardless of our
checking_uniqueness status.

(I understand that the real problem here may be the way that we can
release locks early for system catalogs, but let's start with
immediate causes.)

-- 
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Peter Geoghegan
On Tue, Jul 28, 2020 at 10:37 AM Peter Geoghegan  wrote:
> It's starting to look more like that. I can reproduce the bug by
> running the REINDEX in a tight loop while "make installcheck" runs. It
> looks as if the two tuples passed to comparetup_index_btree() are
> separate tuples that each point to the same heap TID.

Evidently this is an old bug. The assertion that fails was added to
Postgres 12, but that isn't significant. The invariant we see violated
here has nothing to do with any of my B-Tree work -- it would have
been reasonable to add the same assertion to Postgres 9.5.

If I add the same assertion to 9.5 now, I find that the same steps
reproduce the problem -- "REINDEX INDEX
pg_class_tblspc_relfilenode_index" run in a tight loop connected to
the regression database, concurrent with a "make installcheck".

I still don't know what's going on here, but I find it suspicious that
some relevant tuples go through the HEAPTUPLE_INSERT_IN_PROGRESS +
!TransactionIdIsCurrentTransactionId() path of
heapam_index_build_range_scan(). After all, if this wasn't a system
catalog index then we'd expect to see "concurrent insert in progress
within table \"%s\"" WARNING output.

-- 
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Peter Geoghegan
On Tue, Jul 28, 2020 at 10:06 AM Peter Geoghegan  wrote:
> I wrote the assertion that fails here with the bug that I fixed in
> commit 4974d7f87e62a58e80c6524e49677cb25cc10e12 in mind specifically.
> That was a bug that involved a scan that returned duplicate tuples due
> to a problem in heapam_index_build_range_scan() or all of the
> infrastructure that it depends on (directly and indirectly). I wonder
> if it's something like that -- this is also a system catalog index.

It's starting to look more like that. I can reproduce the bug by
running the REINDEX in a tight loop while "make installcheck" runs. It
looks as if the two tuples passed to comparetup_index_btree() are
separate tuples that each point to the same heap TID.

I have an rr recording of this. It shouldn't take too long to figure
out what's going on...

-- 
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Peter Geoghegan
On Tue, Jul 28, 2020 at 8:47 AM Peter Geoghegan  wrote:
> This is very likely to be related to incremental sort because it's a
> use-after-free issue, which is the kind of thing that could only
> really happen inside tuplesort.c. This is clear because some of the
> variables have the tell-tale 0x7f7f7f pattern that we written by
> CLOBBER_FREED_MEMORY builds when memory is freed.

Actually, I changed my mind. The pointer variable sortKey within
comparetup_index_btree() has just been incremented in a way that makes
it point past the end of allocated memory, without being dereferenced.
That part is fine.

I still don't think that it's deduplication, though, because at the
point of the crash we haven't even reached _bt_load() yet. That is, we
haven't reached nbtsort.c code that is specific to Postgres 13 yet
(and besides, catalog indexes don't use deduplication in practice).

I wrote the assertion that fails here with the bug that I fixed in
commit 4974d7f87e62a58e80c6524e49677cb25cc10e12 in mind specifically.
That was a bug that involved a scan that returned duplicate tuples due
to a problem in heapam_index_build_range_scan() or all of the
infrastructure that it depends on (directly and indirectly). I wonder
if it's something like that -- this is also a system catalog index.

-- 
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Peter Geoghegan
On Tue, Jul 28, 2020 at 8:40 AM James Coleman  wrote:
> Is that assumption largely based on the incremental sort patch
> refactoring tuplesort.c a bit? I haven't looked at it much at all, but
> I'm wondering if the issue could also be related to the btree
> duplicates changes in 13 given that we're looking at
> comparetup_index_btree and the datums are equal.

It couldn't possibly be the deduplication patch. That didn't change
anything in tuplesort.c.

This is very likely to be related to incremental sort because it's a
use-after-free issue, which is the kind of thing that could only
really happen inside tuplesort.c. This is clear because some of the
variables have the tell-tale 0x7f7f7f pattern that we written by
CLOBBER_FREED_MEMORY builds when memory is freed.


-- 
Peter Geoghegan




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread Justin Pryzby
On Tue, Jul 28, 2020 at 11:40:14AM -0400, James Coleman wrote:
> > This appears to be an issue with d2d8a229b (Incremental Sort), so I will add
> > at: https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
> 
> Is that assumption largely based on the incremental sort patch
> refactoring tuplesort.c a bit? I haven't looked at it much at all, but
> I'm wondering if the issue could also be related to the btree
> duplicates changes in 13 given that we're looking at
> comparetup_index_btree and the datums are equal.

Good point.  I'd looked at something like this to come to my tentative
conclusion.

git log --stat origin/REL_12_STABLE..origin/REL_13_STABLE -- 
src/backend/utils/sort/*tuple*c

-- 
Justin




Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2020-07-28 Thread James Coleman
On Tue, Jul 28, 2020 at 11:10 AM Justin Pryzby  wrote:
>
> An internal server aborted last night while running a maintenance script.  I
> reproduced this easily running the crashing command in a loop, and verified
> this is a live issue on REL_13_STABLE (dc6f2fb43).
> REINDEX INDEX pg_class_tblspc_relfilenode_index
>
> It looks like this crashed once before, and I didn't notice until now:
> Fri Jun 26 22:30:29 CDT 2020: pg_shdescription: 
> pg_toast.pg_toast_2396_index(reindex toast)...
> psql: error: could not connect to server: server closed the connection 
> unexpectedly
>
> #3  0x00afde98 in comparetup_index_btree (a=0x201fa58, b=0x201fa10, 
> state=0x20147b0) at tuplesort.c:4251
> 4251Assert(false);
> (gdb) l
> 4246if (pos1 != pos2)
> 4247return (pos1 < pos2) ? -1 : 1;
> 4248}
> 4249
> 4250/* ItemPointer values should never be equal */
> 4251Assert(false);
> 4252
> 4253return 0;
> 4254}
> 4255
>
> #3  0x00afde98 in comparetup_index_btree (a=0x201fa58, b=0x201fa10, 
> state=0x20147b0) at tuplesort.c:4251
> sortKey = 0x2014d60
> tuple1 = 0x20189d8
> tuple2 = 0x2018898
> keysz = 2
> tupDes = 0x7f48996b3790
> equal_hasnull = false
> nkey = 3
> compare = 0
> datum1 = 67999603
> datum2 = 67999603
> isnull1 = false
> isnull2 = false
> __func__ = "comparetup_index_btree"
>
> (gdb) p *tuple1
> $2 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 43}, t_info = 16}
> (gdb) p *tuple2
> $3 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 43}, t_info = 16}
>
> (gdb) p *sortKey
> $5 = {ssup_cxt = 0x40, ssup_collation = 0, ssup_reverse = false, 
> ssup_nulls_first = false, ssup_attno = 0, ssup_extra = 0x0, comparator = 
> 0x7f7f7f7f7f7f7f7f, abbreviate = 127, abbrev_converter = 0x7f7f7f7f7f7f7f7f,
>   abbrev_abort = 0x7f7f7f7f7f7f7f7f, abbrev_full_comparator = 
> 0x7f7f7f7f7f7f7f7f}
>
> (gdb) p *tupDes
> $6 = {natts = 2, tdtypeid = 0, tdtypmod = -1, tdrefcount = 1, constr = 0x0, 
> attrs = 0x7f48996b37a8}
>
> Core was generated by `postgres: postgres sentinel [local] REINDEX
> '.
>
> (gdb) bt
> #0  0x7f489853d1f7 in raise () from /lib64/libc.so.6
> #1  0x7f489853e8e8 in abort () from /lib64/libc.so.6
> #2  0x00aafff7 in ExceptionalCondition (conditionName=0xccd0dc 
> "false", errorType=0xccc327 "FailedAssertion", fileName=0xccc2fd 
> "tuplesort.c", lineNumber=4251) at assert.c:67
> #3  0x00afde98 in comparetup_index_btree (a=0x201fa58, b=0x201fa10, 
> state=0x20147b0) at tuplesort.c:4251
> #4  0x00af1d5e in qsort_tuple (a=0x201fa10, n=18, cmp_tuple=0xafcf21 
> , state=0x20147b0) at qsort_tuple.c:140
> #5  0x00af2104 in qsort_tuple (a=0x201f710, n=50, cmp_tuple=0xafcf21 
> , state=0x20147b0) at qsort_tuple.c:191
> #6  0x00af2104 in qsort_tuple (a=0x201cc38, n=544, cmp_tuple=0xafcf21 
> , state=0x20147b0) at qsort_tuple.c:191
> #7  0x00af8056 in tuplesort_sort_memtuples (state=0x20147b0) at 
> tuplesort.c:3490
> #8  0x00af51a9 in tuplesort_performsort (state=0x20147b0) at 
> tuplesort.c:1985
> #9  0x00529418 in _bt_leafbuild (btspool=0x1f784e0, btspool2=0x0) at 
> nbtsort.c:553
> #10 0x00528f9c in btbuild (heap=0x1fb5758, index=0x7f48996b3460, 
> indexInfo=0x1f77a48) at nbtsort.c:333
> #11 0x005adcb3 in index_build (heapRelation=0x1fb5758, 
> indexRelation=0x7f48996b3460, indexInfo=0x1f77a48, isreindex=true, 
> parallel=true) at index.c:2903
> #12 0x005aec6b in reindex_index (indexId=3455, 
> skip_constraint_checks=false, persistence=112 'p', options=2) at index.c:3539
> #13 0x00692583 in ReindexIndex (indexRelation=0x1f54840, options=0, 
> concurrent=false) at indexcmds.c:2466
> #14 0x00932e36 in standard_ProcessUtility (pstmt=0x1f54960, 
> queryString=0x1f53d90 "REINDEX INDEX pg_class_tblspc_relfilenode_index", 
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1f54c40,
> qc=0x7ffde023bf80) at utility.c:929
> #15 0x0093241f in ProcessUtility (pstmt=0x1f54960, 
> queryString=0x1f53d90 "REINDEX INDEX pg_class_tblspc_relfilenode_index", 
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1f54c40, 
> qc=0x7ffde023bf80)
> at utility.c:524
> #16 0x00931288 in PortalRunUtility (portal=0x1fb5ac0, 
> pstmt=0x1f54960, isTopLevel=true, setHoldSnapshot=false, dest=0x1f54c40, 
> qc=0x7ffde023bf80) at pquery.c:1157
> #17 0x009314a7 in PortalRunMulti (portal=0x1fb5ac0, isTopLevel=true, 
> setHoldSnapshot=false, dest=0x1f54c40, altdest=0x1f54c40, qc=0x7ffde023bf80) 
> at pquery.c:1303
> #18 0x009309bc in PortalRun (portal=0x1fb5ac0, 
> count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1f54c40, 
> altdest=0x1f54c40, qc=0x7ffde023bf80) at pquery.c:779
> #19 0x0092ab5b