On Fri, Dec 8, 2023 at 10:46 AM Alexander Korotkov <aekorot...@gmail.com> wrote: > > In your example "foo = 90" is satisfied by_bt_first(), but "foo > > > 99::int8" is not. I think this could be resolved by introducing a > > separate flag exactly distinguishing scan keys used for _bt_first(). > > I'm going to post the patch doing this. > > The draft patch is attached. It requires polishing and proper > commenting. But I hope the basic idea is clear. Do you think this is > the way forward?
Does this really need to work at the scan key level, rather than at the whole-scan level? Wouldn't it make more sense to just totally disable it for the whole scan, since we'll barely ever need to do that anyway? My ScalarArrayOpExpr patch will need to disable this optimization, since with that patch in place we don't necessarily go through _bt_first each time the search-type scan keys must change. We might need to check a few tuples from before the _bt_first-wise position of the next set of array values, which is a problem with opposite-direction-only inequalities (it's a little bit like the situation from my test case, actually). That's partly why I'd prefer this to work at the whole-scan level (though I also just don't think that inventing SK_BT_BT_FIRST makes much sense). I think that you should make it clearer that this whole optimization only applies to required *inequalities*, which can be required in the opposite direction *only*. It should be more obvious from looking at the code that the optimization doesn't apply to required equality strategy scan keys (those are always required in *both* scan directions or in neither direction, so unlike the closely related prefix optimization added by the same commit, they just can't use the optimization that we need to fix here). BTW, do we really need to keep around the BTScanOpaqueData.firstPage field? Why can't the call to _bt_readpage from _bt_first (and from _bt_endpoint) just pass "firstPage=true" as a simple argument? Note that the first call to _bt_readpage must take place from _bt_first (or from _bt_endpoint). The first _bt_first call is already kind of special, in a way that is directly related to this issue. I added some comments about that to today's commit c9c0589fda, in fact -- I think it's an important issue in general. -- Peter Geoghegan