Re: nbtree's ScalarArrayOp array mark/restore code appears to be buggy

2023-09-23 Thread Peter Geoghegan
On Fri, Sep 22, 2023 at 8:17 PM Peter Geoghegan  wrote:
> My suspicion is that bugfix commit 70bc5833 missed some subtlety
> around what we need to do to make sure that the array keys stay "in
> sync" with the scan. I'll have time to debug the problem some more
> tomorrow.

I've figured out what's going on here.

If I make my test case "group by" both of the indexed columns from the
composite index (either index/table will do, since it's an equijoin),
a more detailed picture emerges that hints at the underlying problem:

┌───┬─┬─┐
│ count │ small_a │ small_b │
├───┼─┼─┤
│ 8,192 │   1 │   2 │
│ 8,192 │   1 │   3 │
│ 8,192 │   1 │   5 │
│ 8,192 │   1 │  10 │
│ 8,192 │   1 │  12 │
│ 8,192 │   1 │  17 │
│ 2,872 │   1 │  19 │
└───┴─┴─┘
(7 rows)

The count for the final row is wrong. It should be 8,192, just like
the earlier counts for lower (small_a, small_b) groups. Notably, the
issue is limited to the grouping that has the highest sort order. That
strongly hints that the problem has something to do with "array
wraparound".

The query qual contains "WHERE small_a IN (1, 3)", so we'll "wraps
around" from cur_elem index 1 (value 3) to cur_elem index 0 (value 1),
without encountering any rows where small_a is 3 (because there aren't
any in the index). That in itself isn't the problem. The problem is
that _bt_restore_array_keys() doesn't consider wraparound. It sees
that "cur_elem == mark_elem" for all array scan keys, and figues that
it doesn't need to call _bt_preprocess_keys(). This is incorrect,
since the current set of search-type scan keys (the set most recently
output, during the last _bt_preprocess_keys() call) still have the
value "3".

The fix for this should be fairly straightforward. We must teach
_bt_restore_array_keys() to distinguish "past the end of the array"
from "after the start of the array", so that doesn't spuriously skip a
required call to _bt_preprocess_keys() . I already see that the
problem goes away once _bt_restore_array_keys() is made to call
_bt_preprocess_keys() unconditionally, so I'm already fairly confident
that this will work.

-- 
Peter Geoghegan




Re: nbtree's ScalarArrayOp array mark/restore code appears to be buggy

2023-09-23 Thread Peter Geoghegan
On Sat, Sep 23, 2023 at 11:47 AM Peter Geoghegan  wrote:
> The fix for this should be fairly straightforward. We must teach
> _bt_restore_array_keys() to distinguish "past the end of the array"
> from "after the start of the array", so that doesn't spuriously skip a
> required call to _bt_preprocess_keys() . I already see that the
> problem goes away once _bt_restore_array_keys() is made to call
> _bt_preprocess_keys() unconditionally, so I'm already fairly confident
> that this will work.

Attached draft patch shows how this could work.

_bt_restore_array_keys() has comments that seem to suppose that
calling _bt_preprocess_keys is fairly expensive, and something that's
well worth avoiding. But...is it, really? I wonder if we'd be better
off just biting the bullet and always calling _bt_preprocess_keys
here. Could it really be such a big cost, compared to pinning and
locking the page in the btrestpos() path?

The current draft SAOP patch calls _bt_preprocess_keys() with a buffer
lock held. This is very likely unsafe, so I'll need to come up with a
principled approach (e.g. a stripped down, specialized version of
_bt_preprocess_keys that's safe to call while holding a lock seems doable).
I've been able to put that off for a few months now because it just
doesn't impact my microbenchmarks to any notable degree (and not just
in those cases where we can use the "numberOfKeys == 1" fast path at
the start of _bt_preprocess_keys). This at least suggests that the cost of
always calling _bt_preprocess_keys in _bt_restore_array_keys might be
acceptable.

--
Peter Geoghegan
From 781fd293aa7556f5ab029d2d5c2dfc829a4e8efd Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sat, 23 Sep 2023 15:00:59 -0700
Subject: [PATCH v1] Fix btmarkpos/btrestrpos array keys edge case.

Oversight in bug fix commit 70bc5833, which taught nbtree to handle
array keys during mark/restore processing, but missed this subtlety.

Author: Peter Geoghegan 
Discussion: CAH2-WzkgP3DDRJxw6DgjCxo-cu-DKrvjEv_ArkP2ctBJatDCYg@mail.gmail.com">https://postgr.es/m/CAH2-WzkgP3DDRJxw6DgjCxo-cu-DKrvjEv_ArkP2ctBJatDCYg@mail.gmail.com
Backpatch: 11- (all supported branches).
---
 src/include/access/nbtree.h  |  2 ++
 src/backend/access/nbtree/nbtree.c   |  1 +
 src/backend/access/nbtree/nbtutils.c | 18 ++
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index f5c66964c..b38279e7b 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1043,6 +1043,8 @@ typedef struct BTScanOpaqueData
 
 	/* workspace for SK_SEARCHARRAY support */
 	ScanKey		arrayKeyData;	/* modified copy of scan->keyData */
+	bool		arraysStarted;	/* Started array keys, but have yet to
+ * "reach past the end" of all arrays? */
 	int			numArrayKeys;	/* number of equality-type array keys (-1 if
  * there are any unsatisfiable array keys) */
 	int			arrayKeyCount;	/* count indicating number of array scan keys
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 62bc9917f..6c5b5c69c 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -364,6 +364,7 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 		so->keyData = NULL;
 
 	so->arrayKeyData = NULL;	/* assume no array keys for now */
+	so->arraysStarted = false;
 	so->numArrayKeys = 0;
 	so->arrayKeys = NULL;
 	so->arrayContext = NULL;
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 7da499c4d..17e597f11 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -539,6 +539,8 @@ _bt_start_array_keys(IndexScanDesc scan, ScanDirection dir)
 			curArrayKey->cur_elem = 0;
 		skey->sk_argument = curArrayKey->elem_values[curArrayKey->cur_elem];
 	}
+
+	so->arraysStarted = true;
 }
 
 /*
@@ -598,6 +600,14 @@ _bt_advance_array_keys(IndexScanDesc scan, ScanDirection dir)
 	if (scan->parallel_scan != NULL)
 		_bt_parallel_advance_array_keys(scan);
 
+	/*
+	 * When no new array keys were found, the scan is "past the end" of the
+	 * array keys.  _bt_start_array_keys can still "restart" the array keys if
+	 * a rescan is required.
+	 */
+	if (!found)
+		so->arraysStarted = false;
+
 	return found;
 }
 
@@ -648,11 +658,11 @@ _bt_restore_array_keys(IndexScanDesc scan)
 	}
 
 	/*
-	 * If we changed any keys, we must redo _bt_preprocess_keys.  That might
-	 * sound like overkill, but in cases with multiple keys per index column
-	 * it seems necessary to do the full set of pushups.
+	 * If we changed any keys, or if we're not sure that so->keyData[]
+	 * contains the array keys indicated as current within so->arrayKeys[],
+	 * then we must redo _bt_preprocess_keys
 	 */
-	if (changed)
+	if (changed || !so->arraysStarted)
 	{
 		_bt_preprocess_keys(scan);
 		/* The mark should have been set on a consistent set of keys... */
-- 
2.40.1



Re: nbtree's ScalarArrayOp array mark/restore code appears to be buggy

2023-09-25 Thread Peter Geoghegan
On Sat, Sep 23, 2023 at 4:22 PM Peter Geoghegan  wrote:
> Attached draft patch shows how this could work.
>
> _bt_restore_array_keys() has comments that seem to suppose that
> calling _bt_preprocess_keys is fairly expensive, and something that's
> well worth avoiding. But...is it, really? I wonder if we'd be better
> off just biting the bullet and always calling _bt_preprocess_keys
> here.

My current plan is to commit something close to my original patch on
Wednesday or Thursday. My proposed fix is minimally invasive (it still
allows _bt_restore_array_keys to avoid most calls to
_bt_preprocess_keys), so I don't see any reason to delay acting here.

-- 
Peter Geoghegan