On Tue, 31 Jan 2023 at 12:18, Melanie Plageman <melanieplage...@gmail.com> wrote: > v7 attached
I've been looking over the v7-0002 patch today and I did make a few adjustments to heapgettup_initial_block() as I would prefer to see the branching of each of all these helper functions follow the pattern of: if (<forward scan>) { if (<parallel scan>) <parallel stuff> else <serial stuff> } else { <backwards serial stuff> } which wasn't quite what the function was doing. Along the way, I noticed that 0002 has a subtle bug that does not seem to be present once the remaining patches are applied. I think I'm happier to push these along the lines of how you have them in the patches, so I've held off pushing for now due to the bug and the change I had to make to fix it. The problem is around the setting of scan->rs_inited = true; you've moved that into heapgettup_initial_block() and you've correctly not initialised the scan for empty tables when you return InvalidBlockNumber, however, you've not correctly considered the fact that table_block_parallelscan_nextpage() could also return InvalidBlockNumber if the parallel workers manage to grab all of the blocks before the current process gets the first block. I don't know for sure, but it looks like this could cause problems when heapgettup() or heapgettup_pagemode() got called again for a rescan. We'd have returned the NULL tuple to indicate that no further tuples exist, but we'll have left rs_inited set to true which looks like it'll cause issues. I wondered if it might be better to do the scan->rs_inited = true; in heapgettup() and heapgettup_pagemode() instead. The attached v8 patch does it this way. Despite this fixing that bug, I think this might be a slightly better division of duties. If you're ok with the attached (and everyone else is too), then I can push it in the (NZ) morning. The remaining patches would need to be rebased due to my changes. David
From cbd37463bdaa96afed4c7c739c8e91b770a9f8a7 Mon Sep 17 00:00:00 2001 From: David Rowley <dgrow...@gmail.com> Date: Wed, 1 Feb 2023 19:35:16 +1300 Subject: [PATCH v8] Refactor heapam.c adding heapgettup_initial_block function Here we adjust heapgettup() and heapgettup_pagemode() to move the code that fetches the first block out into a helper function. This removes some code duplication. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_t8qeqzqol02rpi...@mail.gmail.com --- src/backend/access/heap/heapam.c | 225 ++++++++++++++----------------- 1 file changed, 103 insertions(+), 122 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 0a8bac25f5..40168cc9ca 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -483,6 +483,67 @@ heapgetpage(TableScanDesc sscan, BlockNumber block) scan->rs_ntuples = ntup; } +/* + * heapgettup_initial_block - return the first BlockNumber to scan + * + * Returns InvalidBlockNumber when there are no blocks to scan. This can + * occur with empty tables and in parallel scans when parallel workers get all + * of the pages before we can get a chance to get our first page. + */ +static BlockNumber +heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir) +{ + Assert(!scan->rs_inited); + + /* When there are no pages to scan, return InvalidBlockNumber */ + if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0) + return InvalidBlockNumber; + + if (ScanDirectionIsForward(dir)) + { + /* serial scan */ + if (scan->rs_base.rs_parallel == NULL) + return scan->rs_startblock; + else + { + /* parallel scan */ + table_block_parallelscan_startblock_init(scan->rs_base.rs_rd, + scan->rs_parallelworkerdata, + (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel); + + /* may return InvalidBlockNumber if there are no more blocks */ + return table_block_parallelscan_nextpage(scan->rs_base.rs_rd, + scan->rs_parallelworkerdata, + (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel); + } + } + else + { + /* backward parallel scan not supported */ + Assert(scan->rs_base.rs_parallel == NULL); + + /* + * Disable reporting to syncscan logic in a backwards scan; it's not + * very likely anyone else is doing the same thing at the same time, + * and much more likely that we'll just bollix things for forward + * scanners. + */ + scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC; + + /* + * Start from last page of the scan. Ensure we take into account + * rs_numblocks if it's been adjusted by heap_setscanlimits(). + */ + if (scan->rs_numblocks != InvalidBlockNumber) + return (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks; + + if (scan->rs_startblock > 0) + return scan->rs_startblock - 1; + + return scan->rs_nblocks - 1; + } +} + /* ---------------- * heapgettup - fetch next heap tuple * @@ -527,38 +588,19 @@ heapgettup(HeapScanDesc scan, { if (!scan->rs_inited) { + block = heapgettup_initial_block(scan, dir); + /* - * return null immediately if relation is empty + * Check if we have reached the end of the scan already. This + * could happen if the table is empty or if the parallel workers + * have already finished the scan before we did anything ourselves */ - if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0) + if (block == InvalidBlockNumber) { Assert(!BufferIsValid(scan->rs_cbuf)); tuple->t_data = NULL; return; } - if (scan->rs_base.rs_parallel != NULL) - { - ParallelBlockTableScanDesc pbscan = - (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel; - ParallelBlockTableScanWorker pbscanwork = - scan->rs_parallelworkerdata; - - table_block_parallelscan_startblock_init(scan->rs_base.rs_rd, - pbscanwork, pbscan); - - block = table_block_parallelscan_nextpage(scan->rs_base.rs_rd, - pbscanwork, pbscan); - - /* Other processes might have already finished the scan. */ - if (block == InvalidBlockNumber) - { - Assert(!BufferIsValid(scan->rs_cbuf)); - tuple->t_data = NULL; - return; - } - } - else - block = scan->rs_startblock; /* first page */ heapgetpage((TableScanDesc) scan, block); lineoff = FirstOffsetNumber; /* first offnum */ scan->rs_inited = true; @@ -582,60 +624,40 @@ heapgettup(HeapScanDesc scan, } else { - /* backward parallel scan not supported */ - Assert(scan->rs_base.rs_parallel == NULL); - if (!scan->rs_inited) { + block = heapgettup_initial_block(scan, dir); + /* - * return null immediately if relation is empty + * Check if we have reached the end of the scan already. This + * could happen if the table is empty. */ - if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0) + if (block == InvalidBlockNumber) { Assert(!BufferIsValid(scan->rs_cbuf)); tuple->t_data = NULL; return; } - /* - * Disable reporting to syncscan logic in a backwards scan; it's - * not very likely anyone else is doing the same thing at the same - * time, and much more likely that we'll just bollix things for - * forward scanners. - */ - scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC; - - /* - * Start from last page of the scan. Ensure we take into account - * rs_numblocks if it's been adjusted by heap_setscanlimits(). - */ - if (scan->rs_numblocks != InvalidBlockNumber) - block = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks; - else if (scan->rs_startblock > 0) - block = scan->rs_startblock - 1; - else - block = scan->rs_nblocks - 1; heapgetpage((TableScanDesc) scan, block); + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + + page = BufferGetPage(scan->rs_cbuf); + TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, page); + lines = PageGetMaxOffsetNumber(page); + lineoff = lines; /* final offnum */ + scan->rs_inited = true; } else { /* continue from previously returned page/tuple */ block = scan->rs_cblock; /* current page */ - } - - LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); - page = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, page); - lines = PageGetMaxOffsetNumber(page); + page = BufferGetPage(scan->rs_cbuf); + TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, page); + lines = PageGetMaxOffsetNumber(page); - if (!scan->rs_inited) - { - lineoff = lines; /* final offnum */ - scan->rs_inited = true; - } - else - { /* * The previous returned tuple may have been vacuumed since the * previous scan when we use a non-MVCC snapshot, so we must @@ -837,38 +859,19 @@ heapgettup_pagemode(HeapScanDesc scan, { if (!scan->rs_inited) { + block = heapgettup_initial_block(scan, dir); + /* - * return null immediately if relation is empty + * Check if we have reached the end of the scan already. This + * could happen if the table is empty or if the parallel workers + * have already finished the scan before we did anything ourselves */ - if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0) + if (block == InvalidBlockNumber) { Assert(!BufferIsValid(scan->rs_cbuf)); tuple->t_data = NULL; return; } - if (scan->rs_base.rs_parallel != NULL) - { - ParallelBlockTableScanDesc pbscan = - (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel; - ParallelBlockTableScanWorker pbscanwork = - scan->rs_parallelworkerdata; - - table_block_parallelscan_startblock_init(scan->rs_base.rs_rd, - pbscanwork, pbscan); - - block = table_block_parallelscan_nextpage(scan->rs_base.rs_rd, - pbscanwork, pbscan); - - /* Other processes might have already finished the scan. */ - if (block == InvalidBlockNumber) - { - Assert(!BufferIsValid(scan->rs_cbuf)); - tuple->t_data = NULL; - return; - } - } - else - block = scan->rs_startblock; /* first page */ heapgetpage((TableScanDesc) scan, block); lineindex = 0; scan->rs_inited = true; @@ -889,62 +892,40 @@ heapgettup_pagemode(HeapScanDesc scan, } else { - /* backward parallel scan not supported */ - Assert(scan->rs_base.rs_parallel == NULL); - if (!scan->rs_inited) { + block = heapgettup_initial_block(scan, dir); + /* - * return null immediately if relation is empty + * Check if we have reached the end of the scan already. This + * could happen if the table is empty. */ - if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0) + if (block == InvalidBlockNumber) { Assert(!BufferIsValid(scan->rs_cbuf)); tuple->t_data = NULL; return; } - /* - * Disable reporting to syncscan logic in a backwards scan; it's - * not very likely anyone else is doing the same thing at the same - * time, and much more likely that we'll just bollix things for - * forward scanners. - */ - scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC; - - /* - * Start from last page of the scan. Ensure we take into account - * rs_numblocks if it's been adjusted by heap_setscanlimits(). - */ - if (scan->rs_numblocks != InvalidBlockNumber) - block = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks; - else if (scan->rs_startblock > 0) - block = scan->rs_startblock - 1; - else - block = scan->rs_nblocks - 1; heapgetpage((TableScanDesc) scan, block); + page = BufferGetPage(scan->rs_cbuf); + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, page); + lines = scan->rs_ntuples; + lineindex = lines - 1; + scan->rs_inited = true; } else { /* continue from previously returned page/tuple */ block = scan->rs_cblock; /* current page */ - } - page = BufferGetPage(scan->rs_cbuf); - TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, page); - lines = scan->rs_ntuples; - - if (!scan->rs_inited) - { - lineindex = lines - 1; - scan->rs_inited = true; - } - else - { + page = BufferGetPage(scan->rs_cbuf); + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, page); + lines = scan->rs_ntuples; lineindex = scan->rs_cindex - 1; } - /* block and lineindex now reference the previous visible tid */ + /* block and lineindex now reference the previous visible tid */ linesleft = lineindex + 1; } -- 2.39.0.windows.1