On Wed, Apr 4, 2018 at 10:46 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> > > > You mean passing "fastpath" to _bt_insertonpg and then checking it's > false > > if page split is needed? But isn't page split is only needed if the page > > doesn't have enough free space? If so, we have checked for that before > > setting "fastpath". > > That's not exactly what I meant. I meant that if: > > 1. This is an insertion to the leaf level in _bt_insertonpg(). > > and > > 2. We don't have space on the page, and so must do a split (or at > least free LP_DEAD items). > > and > > 3. RelationGetTargetBlock(rel) != InvalidBlockNumber > > There should be an assertion failure. This new assertion within > _bt_insertonpg() makes it much less likely that the assumption breaks > later. > I came up with something like this: + /* + * If we're here then a pagesplit is needed. We should never reach here + * if we're using the fastpath since we should have checked for all the + * required conditions, including the fact that this page has enough + * freespace. Note that this routine can in-theory deal with the + * situation where a NULL stack pointer is passed (that's what would + * happen if the fastpath is taken), like it does during crash + * recovery. But that path is much slower, defeating the very purpose + * of the optimisation. The following assertion should protect us from + * any future code changes that invalidate those assumptions. + * + * Note the fact that whenever we fail to take the fastpath, we clear + * the cached block. So checking for a valid cached block at this point + * is enough to decide whether we're in a fastpath or not. + */ + Assert(!(P_ISLEAF(lpageop) && + BlockNumberIsValid(RelationGetTargetBlock(rel)))); + But then I started thinking can _bt_insertonpg() be called from a code path which doesn't start at _bt_doinsert()? I traced one such path _bt_getstackbuf() -> _bt_finish_split() -> _bt_insert_parent(), but that can only happen in case of a non-leaf page. The assertion which checks for a leaf page, should be fine, right? > > Finally, I'd like to see a small paragraph in the nbtree README, about > the high level theory behind this optimization and page recycling. The > assumption is that there can only be one non-ignorable leaf rightmost > page, and so even a RecentGlobalXmin style interlock isn't required. > We cannot fail to detect that our hint was invalidated, because there > can only be one such page in the B-Tree at any time. It's possible > that the page will be deleted and recycled without a backend's cached > page also being detected as invalidated, but only when we happen to > recycle a page that once again becomes the rightmost leaf page. > > Can I borrow that almost verbatim, after adding details about the optimisation itself? Looks like I'm too tired to think sensibly. > Once those changes are made, this should be fine to commit. > Ok, thanks. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services