Thanks for the review! Attached is v2 with feedback addressed. On Mon, Oct 31, 2022 at 9:09 PM Andres Freund <and...@anarazel.de> wrote: > > From 9d8b01960463dc64ff5b111d523ff80fce3017af Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplage...@gmail.com> > > Date: Mon, 31 Oct 2022 13:40:06 -0400 > > Subject: [PATCH v1 2/3] Turn HeapKeyTest macro into function > > > > This should always be inlined appropriately now. It is easier to read as > > a function. Also, remove unused include in catcache.c. > > --- > > src/backend/access/heap/heapam.c | 10 ++-- > > src/backend/utils/cache/catcache.c | 1 - > > src/include/access/valid.h | 76 ++++++++++++------------------ > > 3 files changed, 36 insertions(+), 51 deletions(-) > > > > diff --git a/src/backend/access/heap/heapam.c > > b/src/backend/access/heap/heapam.c > > index 12be87efed..1c995faa12 100644 > > --- a/src/backend/access/heap/heapam.c > > +++ b/src/backend/access/heap/heapam.c > > @@ -716,8 +716,10 @@ heapgettup(HeapScanDesc scan, > > > > snapshot); > > > > if (valid && key != NULL) > > - HeapKeyTest(tuple, > > RelationGetDescr(scan->rs_base.rs_rd), > > - nkeys, key, > > valid); > > + { > > + valid = HeapKeyTest(tuple, > > RelationGetDescr(scan->rs_base.rs_rd), > > + nkeys, key); > > + } > > > > if (valid) > > { > > superfluous parens.
fixed. > > From a894ce38c488df6546392b9f3bd894b67edf951e Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplage...@gmail.com> > > Date: Mon, 31 Oct 2022 13:40:29 -0400 > > Subject: [PATCH v1 3/3] Refactor heapgettup* and heapgetpage > > > > Simplify heapgettup(), heapgettup_pagemode(), and heapgetpage(). All > > three contained several unnecessary local variables, duplicate code, and > > nested if statements. Streamlining these improves readability and > > extensibility. > > It'd be nice to break this into slightly smaller chunks. I can do that. Since incorporating feedback will be harder once I break it up into smaller chunks, I'm inclined to wait to do so until I know that the structure I have now is the one we will go with. (I know smaller chunks will make it more reviewable.) > > + > > +static inline void heapgettup_no_movement(HeapScanDesc scan) > > +{ > > FWIW, for function definitions we keep the return type (and with that also the > the "static inline") on a separate line. Fixed > > > + ItemId lpp; > > + OffsetNumber lineoff; > > + BlockNumber page; > > + Page dp; > > + HeapTuple tuple = &(scan->rs_ctup); > > + /* > > + * ``no movement'' scan direction: refetch prior tuple > > + */ > > + > > + /* Since the tuple was previously fetched, needn't lock page here */ > > + if (!scan->rs_inited) > > + { > > + Assert(!BufferIsValid(scan->rs_cbuf)); > > + tuple->t_data = NULL; > > + return; > > Is it possible to have a no-movement scan with an uninitialized scan? That > doesn't really seem to make sense. At least that's how I understand the > explanation for NoMovement nearby: > * dir == NoMovementScanDirection means "re-fetch the tuple indicated > * by scan->rs_ctup". > > We can't have a rs_ctup without an already started scan, no? > > Looks like this is pre-existing code that you just moved, but it still seems > wrong. Changed to an assert > > > + } > > + page = ItemPointerGetBlockNumber(&(tuple->t_self)); > > + if (page != scan->rs_cblock) > > + heapgetpage((TableScanDesc) scan, page); > > > We have a > BlockNumber page; > and > Page dp; > in this code which seems almost intentionally confusing. This again is a > pre-existing issue but perhaps we could clean it up first? in attached page -> block dp -> page in basically all locations in heapam.c (should that be its own commit?) > > +static inline Page heapgettup_continue_page(HeapScanDesc scan, BlockNumber > > page, ScanDirection dir, > > + int *linesleft, OffsetNumber *lineoff) > > +{ > > + HeapTuple tuple = &(scan->rs_ctup); > > Hm. Finding the next offset via rs_ctup doesn't seem quite right. For one, > it's not actually that cheap to extract the offset from an ItemPointer because > of the the way we pack it into ItemPointerData. So, it was like this before [1]. What about saving the lineoff in rs_cindex. It is smaller, but that seems okay, right? > > + Page dp = BufferGetPage(scan->rs_cbuf); > > + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, > > dp); > > Newlines between definitions and code :) k > Perhaps worth asserting that the scan is initialized and that rs_cbuf is > valid? indeed. > > > + if (ScanDirectionIsForward(dir)) > > + { > > + *lineoff = > > OffsetNumberNext(ItemPointerGetOffsetNumber(&(tuple->t_self))); > > + *linesleft = PageGetMaxOffsetNumber(dp) - (*lineoff) + 1; > > We can't access PageGetMaxOffsetNumber etc without holding a lock on the > page. It's not immediately obvious that that is held in all paths. In heapgettup() I call LockBuffer() before invoking heapgettup_continue_page() and heapgettup_start_page() which are the ones doing this. I did have big plans for using the continue_page and start_page functions in heapgettup_pagemode() as well, but since I'm not doing that now, I can add in an expectation that the lock is held. I added a comment saying the caller is responsible for acquiring the lock if needed. I thought of adding an assert, but I don't see that being done outside of bufmgr.c BufferDesc *bufHdr = GetBufferDescriptor(buffer - 1); Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr))); > > +static inline BlockNumber heapgettup_initial_page(HeapScanDesc scan, > > ScanDirection dir) > > +{ > > + Assert(!ScanDirectionIsNoMovement(dir)); > > + Assert(!scan->rs_inited); > > Is there a reason we couldn't set rs_inited in here, rather than reapeating > that in all callers? I wasn't sure if future callers or existing callers in the future may need to do steps other than what is in heapgettup_initial_page() before setting rs_inited. I thought of the responsibility of heapgettup_initial_page() as returning the initial page to start the scan. If it is going to do all initialization steps, perhaps the name should change? I thought having a function that says it does initialization of the scan might be confusing since initscan() also exists. > ISTM that this function should deal with the > /* > * return null immediately if relation is empty > */ > > logic, I think you now are repeating that check on every call to heapgettup(). So, that's a good point. If I move setting rs_inited inside of heapgettup_initial_page(), then I can also easily move the empty table check inside there too. I don't want to set rs_inited before every return in heapgettup_initial_page(). Do you think there are any issues with setting it at the top of the function? I thought about setting it at the very top (even before checking if the relation is empty) Is it okay to set it before the empty table check? rs_inited will be set to false at the bottom before returning. But, maybe this will be an issue in other callers of heapgettup_initial_page()? Anyway, I have changed it in attached v2. > > @@ -511,182 +711,55 @@ heapgettup(HeapScanDesc scan, > > ScanKey key) > > { > > HeapTuple tuple = &(scan->rs_ctup); > > - Snapshot snapshot = scan->rs_base.rs_snapshot; > > - bool backward = ScanDirectionIsBackward(dir); > > BlockNumber page; > > - bool finished; > > Page dp; > > - int lines; > > OffsetNumber lineoff; > > int linesleft; > > - ItemId lpp; > > + > > + if (ScanDirectionIsNoMovement(dir)) > > + return heapgettup_no_movement(scan); > > Maybe add an unlikely() - this path is barely ever used... done. > > + page = scan->rs_cblock; > > + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); > > + dp = heapgettup_continue_page(scan, page, dir, &linesleft, > > &lineoff); > > + goto continue_page; > > } > > > > /* > > * advance the scan until we find a qualifying tuple or run out of > > stuff > > * to scan > > */ > > - lpp = PageGetItemId(dp, lineoff); > > - for (;;) > > + while (page != InvalidBlockNumber) > > { > > + heapgetpage((TableScanDesc) scan, page); > > + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); > > + dp = heapgettup_start_page(scan, page, dir, &linesleft, > > &lineoff); > > + continue_page: > > > I don't like the goto continue_page at all. Seems that the paths leading here > should call LockBuffer(), heapgettup_start_page() etc? Possibly a do {} while > () loop could do the trick as well. I don't see how a do while loop would solve help with the problem. We need to check if the block number is valid after getting a block assignment before doing heapgetpage() (e.g. after heapgettup_initial_page() and after heapgettup_advance_page()). Removing the goto continue_page means adding the heapgettpage(), heapgettup_start_page(), etc code block in two places now (both after heapgettup_initial_page() and after heapgettup_advance_page()) and, in both locations we have to add an if statement to check if the block is valid. I feel like this makes the function longer and harder to understand. Keeping the loop as short as possible makes it clear what it is doing. I think that with an appropriate warning comment, the goto continue_page is clearer and easier to understand. To me, starting a page at the top of the outer loop, then looping through the lines in the page and is the structure that makes the most sense. - Melanie [1] https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c#L572
v2-0001-Remove-breaks-in-HeapTupleSatisfiesVisibility.patch
Description: Binary data
v2-0002-Turn-HeapKeyTest-macro-into-function.patch
Description: Binary data
v2-0003-Refactor-heapgettup-and-heapgetpage.patch
Description: Binary data