On Fri, Sep 22, 2023 at 8:17 PM Peter Geoghegan <p...@bowt.ie> wrote:
> My suspicion is that bugfix commit 70bc5833 missed some subtlety
> around what we need to do to make sure that the array keys stay "in
> sync" with the scan. I'll have time to debug the problem some more
> tomorrow.

I've figured out what's going on here.

If I make my test case "group by" both of the indexed columns from the
composite index (either index/table will do, since it's an equijoin),
a more detailed picture emerges that hints at the underlying problem:

┌───────┬─────────┬─────────┐
│ count │ small_a │ small_b │
├───────┼─────────┼─────────┤
│ 8,192 │       1 │       2 │
│ 8,192 │       1 │       3 │
│ 8,192 │       1 │       5 │
│ 8,192 │       1 │      10 │
│ 8,192 │       1 │      12 │
│ 8,192 │       1 │      17 │
│ 2,872 │       1 │      19 │
└───────┴─────────┴─────────┘
(7 rows)

The count for the final row is wrong. It should be 8,192, just like
the earlier counts for lower (small_a, small_b) groups. Notably, the
issue is limited to the grouping that has the highest sort order. That
strongly hints that the problem has something to do with "array
wraparound".

The query qual contains "WHERE small_a IN (1, 3)", so we'll "wrap
around" from cur_elem index 1 (value 3) to cur_elem index 0 (value 1),
without encountering any rows where small_a is 3 (because there aren't
any in the index). That in itself isn't the problem. The problem is
that _bt_restore_array_keys() doesn't consider wraparound. It sees
that "cur_elem == mark_elem" for all array scan keys, and figures that
it doesn't need to call _bt_preprocess_keys(). This is incorrect,
since the current set of search-type scan keys (the set most recently
output, during the last _bt_preprocess_keys() call) still have the
value "3".

The fix for this should be fairly straightforward. We must teach
_bt_restore_array_keys() to distinguish "past the end of the array"
from "after the start of the array", so that it doesn't spuriously
skip a required call to _bt_preprocess_keys(). I already see that the
problem goes away once _bt_restore_array_keys() is made to call
_bt_preprocess_keys() unconditionally, so I'm already fairly confident
that this will work.

--
Peter Geoghegan
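
P.S. For illustration only, here is a toy Python model of the wraparound
problem described above. It is not nbtree code. The class and method
names (ArrayKeyScan, preprocess_keys, restore_buggy,
restore_unconditional) are hypothetical, and the model assumes a single
array key whose search value is only rebuilt when preprocess_keys() is
called, which is a simplification of what _bt_preprocess_keys() does.

```python
class ArrayKeyScan:
    """Toy model of one array scan key (hypothetical, not PostgreSQL code)."""

    def __init__(self, elems):
        self.elems = list(elems)
        self.cur_elem = 0
        self.mark_elem = 0
        self.search_value = None
        self.preprocess_keys()

    def preprocess_keys(self):
        # Stand-in for _bt_preprocess_keys(): rebuild the search-type key
        # from the current array element.
        self.search_value = self.elems[self.cur_elem]

    def mark(self):
        # Remember the array position at mark time.
        self.mark_elem = self.cur_elem

    def advance(self):
        """Step to the next array element; return False once exhausted."""
        self.cur_elem += 1
        if self.cur_elem >= len(self.elems):
            # Wrap around. Assumption of this model: the search key is
            # NOT rebuilt here, so it keeps the last element's value.
            self.cur_elem = 0
            return False
        self.preprocess_keys()
        return True

    def restore_buggy(self):
        # Skips the rebuild when cur_elem == mark_elem, which cannot tell
        # "never moved" apart from "wrapped around".
        changed = self.cur_elem != self.mark_elem
        self.cur_elem = self.mark_elem
        if changed:
            self.preprocess_keys()

    def restore_unconditional(self):
        # Always rebuilds the search key after restoring the position.
        self.cur_elem = self.mark_elem
        self.preprocess_keys()


def run(restore_name):
    scan = ArrayKeyScan([1, 3])    # WHERE small_a IN (1, 3)
    scan.mark()                    # mark while positioned on value 1
    scan.advance()                 # move on to value 3
    scan.advance()                 # exhausted: wraps back to index 0
    getattr(scan, restore_name)()  # restore to the mark
    return scan.search_value


if __name__ == "__main__":
    print("buggy restore searches for:", run("restore_buggy"))
    print("unconditional restore searches for:", run("restore_unconditional"))
```

In this model the buggy restore leaves the search key at 3 even though
the array position is back at the element with value 1, while the
unconditional variant brings the two back in sync.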