On Tue, 18 Nov 2025 20:15:40 -0500 Peter Geoghegan <[email protected]> wrote:
> On Tue, Nov 18, 2025 at 2:28 AM Yugo Nagata <[email protected]> wrote: > > I've attached a patch that adds the following comment: > > > > + * nextkey determines how the scankey's boundary is interpreted, and > > backward > > + * indicates a backward scan. See comments in _bt_first for a more > > detailed > > + * explanation of these fields. > > > > What do think? > > Seems reasonable to me. Thank you for your review. > > I wonder if we also do something about these existing _bt_binsrch comments: > > * On an internal (non-leaf) page, _bt_binsrch() returns the OffsetNumber > * of the last key < given scankey, or last key <= given scankey if nextkey > * is true. (Since _bt_compare treats the first data key of such a page as > * minus infinity, there will be at least one key < scankey, so the result > * always points at one of the keys on the page.) > > Here we describe what happens on an internal page. This is correct, > but *seems* to contradict the higher level comments at the end of > _bt_first. There is actually no contradiction between the two -- not > when you understand that _bt_first describes the whole end-to-end > process of calling _bt_search and then calling _bt_binsrch on the leaf > page (not of calling _bt_binsrch once, against an internal page). > > One could think of this _bt_binsrch internal page behavior as > compensating for the fact that internal pages have pivot tuples that > consist of a separator key (except for the first key, which is just > has a -inf key/no key), plus a downlink that points to the *next* page > after that separator key one level down (except for the internal page > high key, which has no downlink at all). Might be useful to say > something like that instead. > > This is hard to explain without an example. Right now, an internal > page might have pivot tuples that look like this: > > Separator: -inf, Downlink: 1 > Separator: 'a', Downlink: 2 > Separator: 'c', Downlink: 3 > Separator: 'f', Downlink: (none, this is the high key) > > But _bt_binsrch "pretends" that our internal page actually contains: > > Downlink: 1 > Separator: 'a' > Downlink: 2 > Separator: 'c' > Downlink: 3 > Separator: 'f' > > So if our = scan key is (say) 'c', we should descend using the > downlink that points to block 2 (not the one that points to block 3, > as might be expected from looking at the real physical representation > consisting of pivot tuples). That is what the scan needs to do to get > to the page one level down whose high key is also 'c' -- that's where > the scan needs to look. (If the next level down is the leaf level, > then the leaf page we descend to might also contain a *non*-pivot > tuple with the key value 'c', "right before" the high key with 'c', > which the scan will need to return in _bt_readpage. Lehman & Yao allow > the key before a leaf page's high key to be equal to the high key, but > otherwise forbid all duplicate keys.) > > I find it very hard to know what explanation will be the least > confusing to other people, at least in this area. Since you're > interested in this area, I thought I'd ask what you think. I do not see any contradiction between the comment on _bt_binsrch and the comments at the end of _bt_first. The former explicitly states that it refers to internal (non-leaf) pages, and I understand the latter to describe loading data from a leaf page. That said, we could make this clearer by explicitly mentioning the leaf page in the first sentence, for example: * Now load data from the first leaf page of the scan (usually the *page currently in so->currPos.buf). Regards, Yugo Nagata -- Yugo Nagata <[email protected]>
