Hi,

On Wed, May 10, 2017 at 2:28 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> While doing the code coverage testing of the v7 patch shared with [1], I
> found that there are a few lines of code in _hash_next() that are
> redundant and need to be removed. I basically came to know this while
> testing the scenario where a hash index scan starts when a split is in
> progress. I have removed those lines and attached is the newer version
> of the patch.
>
Please find the new version of the patches for page-at-a-time scan in hash index, rebased on top of the latest commit in the master branch. Also, I have run the pgindent script with pg_bsd_indent version 2.0 on all the modified files. Thanks.

> [1] - https://www.postgresql.org/message-id/CAE9k0Pmn92Le0iOTU5b6087eRXzUnkq1PKcihxtokHJ9boXCBg%40mail.gmail.com
>
> On Mon, May 8, 2017 at 6:55 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
>> Hi,
>>
>> On Tue, Apr 4, 2017 at 3:56 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
>>>>
>>>> Please note that these patches need to be applied on top of [1].
>>>>
>>>
>>> A few more review comments:
>>>
>>> 1.
>>> - page = BufferGetPage(so->hashso_curbuf);
>>> + blkno = so->currPos.currPage;
>>> + if (so->hashso_bucket_buf == so->currPos.buf)
>>> + {
>>> + buf = so->currPos.buf;
>>> + LockBuffer(buf, BUFFER_LOCK_SHARE);
>>> + page = BufferGetPage(buf);
>>> + }
>>>
>>> Here, you are assuming that only the bucket page can be pinned, but I
>>> think that assumption is not right. When _hash_kill_items() is called
>>> before moving to the next page, there could be a pin on the overflow
>>> page. You need some logic to check if the buffer is pinned, then just
>>> lock it. I think once you do that, it might not be convenient to
>>> release the pin at the end of this function.
>>
>> Yes, there are a few cases where we might have a pin on overflow pages
>> too, and we need to handle such cases in _hash_kill_items. I have taken
>> care of this in the attached v7 patch. Thanks.
>>
>>>
>>> Add some comments on top of _hash_kill_items to explain the new
>>> changes, or say something like "See _bt_killitems for details".
>>
>> Added some more comments on the new changes.
>>
>>>
>>> 2.
>>> + /*
>>> + * We save the LSN of the page as we read it, so that we know whether it
>>> + * safe to apply LP_DEAD hints to the page later. This allows us to drop
>>> + * the pin for MVCC scans, which allows vacuum to avoid blocking.
>>> + */
>>> + so->currPos.lsn = PageGetLSN(page);
>>> +
>>>
>>> The second part of the above comment doesn't make sense, because vacuum
>>> will still be blocked while we hold the pin on the primary bucket page.
>>
>> That's right. It doesn't make sense because we won't allow vacuum to
>> start. I have removed it.
>>
>>>
>>> 3.
>>> + {
>>> + /*
>>> + * No more matching tuples were found. return FALSE
>>> + * indicating the same. Also, remember the prev and
>>> + * next block number so that if fetching tuples using
>>> + * cursor we remember the page from where to start the
>>> + * scan.
>>> + */
>>> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
>>> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>>>
>>> You can't read opaque without holding the lock, and by this time it has
>>> already been released.
>>
>> I have corrected it. Please refer to the attached v7 patch.
>>
>>> Also, I think if you want to save the position for cursor movement,
>>> then you need to save the position of the last bucket when the scan
>>> completes the overflow chain; however, as you have written it, it will
>>> always be an invalid block number. I think there is a similar problem
>>> with the prev block number.
>>
>> Did you mean the last bucket or the last page? If it is the last page,
>> then I am already storing it.
>>
>>>
>>> 4.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
>>> + OffsetNumber maxoff, ScanDirection dir)
>>> +{
>>> + HashScanOpaque so = (HashScanOpaque) scan->opaque;
>>> + IndexTuple itup;
>>> + int itemIndex;
>>> +
>>> + if (ScanDirectionIsForward(dir))
>>> + {
>>> + /* load items[] in ascending order */
>>> + itemIndex = 0;
>>> +
>>> + /* new page, relocate starting position by binary search */
>>> + offnum = _hash_binsearch(page, so->hashso_sk_hash);
>>>
>>> What is the need to find the offset number when this function already
>>> has it as an input parameter?
>>
>> It's not required. I have removed it.
>>
>>>
>>> 5.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
>>> + OffsetNumber maxoff, ScanDirection dir)
>>>
>>> I think maxoff is not required to be passed as a parameter to this
>>> function, as you need it only for the forward scan. You can compute it
>>> locally.
>>
>> It is required for both forward and backward scans. However, I am not
>> passing it to _hash_load_qualified_items() now.
>>
>>>
>>> 6.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
>>> + OffsetNumber maxoff, ScanDirection dir)
>>> {
>>> ..
>>> + if (ScanDirectionIsForward(dir))
>>> + {
>>> ..
>>> + while (offnum <= maxoff)
>>> {
>>> ..
>>> + if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
>>> + _hash_checkqual(scan, itup))
>>> + {
>>> + /* tuple is qualified, so remember it */
>>> + _hash_saveitem(so, itemIndex, offnum, itup);
>>> + itemIndex++;
>>> + }
>>> +
>>> + offnum = OffsetNumberNext(offnum);
>>> ..
>>>
>>> Why are you traversing the whole page even when there is no match?
>>> There is a similar problem in the backward scan. I think this is a
>>> blunder.
>>
>> Fixed. Please check the attached
>> '0001-Rewrite-hash-index-scans-to-work-a-page-at-a-timev7.patch'
>>
>>>
>>> 7.
>>> + if (so->currPos.buf == so->hashso_bucket_buf ||
>>> + so->currPos.buf == so->hashso_split_bucket_buf)
>>> + {
>>> + so->currPos.prevPage = InvalidBlockNumber;
>>> + LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
>>> + }
>>> + else
>>> + {
>>> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
>>> + _hash_relbuf(rel, so->currPos.buf);
>>> + }
>>> +
>>> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>>>
>>> What makes you think it is safe to read opaque after releasing the lock?
>>
>> Nothing; it was a mistake. I have corrected it. Please check the
>> attached v7 patch.
>>
>> --
>> With Regards,
>> Ashutosh Sharma
>> EnterpriseDB: http://www.enterprisedb.com
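To make the core idea concrete for anyone skimming the thread before diving into the diffs, here is a rough standalone model of the page-at-a-time approach. This is illustrative plain C only; the names and the item bound are made up and it is not the patch's code. The scan copies every matching item from the locked page into a backend-local array, drops the lock, and then answers subsequent "next tuple" requests from that array:

#include <stdbool.h>

#define MAX_ITEMS_PER_PAGE 407          /* stand-in for MaxIndexTuplesPerPage */

typedef struct ScanPos
{
    int     firstItem;                  /* first valid slot in items[] */
    int     lastItem;                   /* last valid slot in items[] */
    int     itemIndex;                  /* slot last returned to the caller */
    int     items[MAX_ITEMS_PER_PAGE];  /* stand-in for saved heap TIDs */
} ScanPos;

/* Copy all matches from the (locked) page into backend-local storage. */
static void
load_page(ScanPos *pos, const int *page, int nitems, int key)
{
    int     n = 0;

    for (int off = 0; off < nitems; off++)
        if (page[off] == key)
            pos->items[n++] = page[off];

    pos->firstItem = 0;
    pos->lastItem = n - 1;
    pos->itemIndex = -1;                /* positioned before the first item */
    /* the caller can release the page lock here */
}

/* Serve "get next tuple" calls without touching the index page again. */
static bool
next_item(ScanPos *pos, int *out)
{
    if (++pos->itemIndex > pos->lastItem)
        return false;                   /* exhausted; load the next page */
    *out = pos->items[pos->itemIndex];
    return true;
}

With this shape, an insertion into the index page between two next_item() calls cannot disturb the scan, which is what lets the patch delete the re-find logic in hashgettuple().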
From a155bed1a1510d030f2a004a91002274139d3b9a Mon Sep 17 00:00:00 2001
From: ashu <ashutosh1...@example.com>
Date: Sun, 30 Jul 2017 11:48:38 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time v9

Patch by Ashutosh Sharma <ashu.coe...@gmail.com>
---
 src/backend/access/hash/README       |  25 +-
 src/backend/access/hash/hash.c       | 153 +++---------
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 446 +++++++++++++++++++++++++++++++----
 src/backend/access/hash/hashutil.c   |  71 +++++-
 src/include/access/hash.h            |  50 +++-
 6 files changed, 561 insertions(+), 194 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..eef7d66 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
-	 the pin on the primary bucket throughout the scan; we also maintain
-	 a pin on the page currently being scanned)
-	get tuple
-	release content lock
+	 the pin on the primary bucket throughout the scan)
+	save all the matching tuples from the current index page into an items array
+	release pin and content lock (but if it is the primary bucket page, retain
+	 its pin till the end of the scan)
+	get tuple from the items array
 -- at scan shutdown:
 	release all pins still held

@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
 the reader's current-tuple pointer from being invalidated by splits or
 compactions.  (Of course, other buckets can still be split or compacted.)

-To keep concurrency reasonably good, we require readers to cope with
-concurrent insertions, which means that they have to be able to re-find
-their current scan position after re-acquiring the buffer content lock on
-page.  Since deletion is not possible while a reader holds the pin on bucket,
-and we assume that heap tuple TIDs are unique, this can be implemented by
-searching for the same heap tuple TID previously returned.  Insertion does
-not move index entries across pages, so the previously-returned index entry
-should always be on the same page, at the same or higher offset number,
-as it was before.
+To minimize lock/unlock traffic, a hash index scan always searches the entire
+hash page to identify all the matching items at once, copying their heap
+tuple IDs into backend-local storage.  The heap tuple IDs are then processed
+while not holding any page lock within the index, thereby allowing concurrent
+insertions to happen on the same index page without any need to re-find the
+current scan position for the reader.  We do continue to hold a pin on the
+bucket page, to protect against concurrent deletions and bucket splits.
To allow for scans during a bucket split, if at the start of the scan, the bucket is marked as bucket-being-populated, it scan all the tuples in that diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c index d89c192..2b858f0 100644 --- a/src/backend/access/hash/hash.c +++ b/src/backend/access/hash/hash.c @@ -268,66 +268,22 @@ bool hashgettuple(IndexScanDesc scan, ScanDirection dir) { HashScanOpaque so = (HashScanOpaque) scan->opaque; - Relation rel = scan->indexRelation; - Buffer buf; - Page page; - OffsetNumber offnum; - ItemPointer current; bool res; + HashScanPosItem *currItem; /* Hash indexes are always lossy since we store only the hash code */ scan->xs_recheck = true; /* - * We hold pin but not lock on current buffer while outside the hash AM. - * Reacquire the read lock here. - */ - if (BufferIsValid(so->hashso_curbuf)) - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE); - - /* * If we've already initialized this scan, we can just advance it in the * appropriate direction. If we haven't done so yet, we call a routine to * get the first item in the scan. */ - current = &(so->hashso_curpos); - if (ItemPointerIsValid(current)) + if (!HashScanPosIsValid(so->currPos)) + res = _hash_first(scan, dir); + else { /* - * An insertion into the current index page could have happened while - * we didn't have read lock on it. Re-find our position by looking - * for the TID we previously returned. (Because we hold a pin on the - * primary bucket page, no deletions or splits could have occurred; - * therefore we can expect that the TID still exists in the current - * index page, at an offset >= where we were.) - */ - OffsetNumber maxoffnum; - - buf = so->hashso_curbuf; - Assert(BufferIsValid(buf)); - page = BufferGetPage(buf); - - /* - * We don't need test for old snapshot here as the current buffer is - * pinned, so vacuum can't clean the page. - */ - maxoffnum = PageGetMaxOffsetNumber(page); - for (offnum = ItemPointerGetOffsetNumber(current); - offnum <= maxoffnum; - offnum = OffsetNumberNext(offnum)) - { - IndexTuple itup; - - itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum)); - if (ItemPointerEquals(&(so->hashso_heappos), &(itup->t_tid))) - break; - } - if (offnum > maxoffnum) - elog(ERROR, "failed to re-find scan position within index \"%s\"", - RelationGetRelationName(rel)); - ItemPointerSetOffsetNumber(current, offnum); - - /* * Check to see if we should kill the previously-fetched tuple. */ if (scan->kill_prior_tuple) @@ -341,16 +297,11 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir) * entries. */ if (so->killedItems == NULL) - so->killedItems = palloc(MaxIndexTuplesPerPage * - sizeof(HashScanPosItem)); + so->killedItems = (int *) + palloc(MaxIndexTuplesPerPage * sizeof(int)); if (so->numKilled < MaxIndexTuplesPerPage) - { - so->killedItems[so->numKilled].heapTid = so->hashso_heappos; - so->killedItems[so->numKilled].indexOffset = - ItemPointerGetOffsetNumber(&(so->hashso_curpos)); - so->numKilled++; - } + so->killedItems[so->numKilled++] = so->currPos.itemIndex; } /* @@ -358,30 +309,10 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir) */ res = _hash_next(scan, dir); } - else - res = _hash_first(scan, dir); - - /* - * Skip killed tuples if asked to. 
- */ - if (scan->ignore_killed_tuples) - { - while (res) - { - offnum = ItemPointerGetOffsetNumber(current); - page = BufferGetPage(so->hashso_curbuf); - if (!ItemIdIsDead(PageGetItemId(page, offnum))) - break; - res = _hash_next(scan, dir); - } - } - - /* Release read lock on current buffer, but keep it pinned */ - if (BufferIsValid(so->hashso_curbuf)) - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK); /* Return current heap TID on success */ - scan->xs_ctup.t_self = so->hashso_heappos; + currItem = &so->currPos.items[so->currPos.itemIndex]; + scan->xs_ctup.t_self = currItem->heapTid; return res; } @@ -396,35 +327,21 @@ hashgetbitmap(IndexScanDesc scan, TIDBitmap *tbm) HashScanOpaque so = (HashScanOpaque) scan->opaque; bool res; int64 ntids = 0; + HashScanPosItem *currItem; res = _hash_first(scan, ForwardScanDirection); while (res) { - bool add_tuple; + currItem = &so->currPos.items[so->currPos.itemIndex]; /* - * Skip killed tuples if asked to. + * _hash_first() or _hash_next() never returns dead tuples. Therefore, + * we can always add the tuples into TIDBitmap without checking if a + * tuple is dead or not. */ - if (scan->ignore_killed_tuples) - { - Page page; - OffsetNumber offnum; - - offnum = ItemPointerGetOffsetNumber(&(so->hashso_curpos)); - page = BufferGetPage(so->hashso_curbuf); - add_tuple = !ItemIdIsDead(PageGetItemId(page, offnum)); - } - else - add_tuple = true; - - /* Save tuple ID, and continue scanning */ - if (add_tuple) - { - /* Note we mark the tuple ID as requiring recheck */ - tbm_add_tuples(tbm, &(so->hashso_heappos), 1, true); - ntids++; - } + tbm_add_tuples(tbm, &(currItem->heapTid), 1, true); + ntids++; res = _hash_next(scan, ForwardScanDirection); } @@ -448,12 +365,9 @@ hashbeginscan(Relation rel, int nkeys, int norderbys) scan = RelationGetIndexScan(rel, nkeys, norderbys); so = (HashScanOpaque) palloc(sizeof(HashScanOpaqueData)); - so->hashso_curbuf = InvalidBuffer; + HashScanPosInvalidate(so->currPos); so->hashso_bucket_buf = InvalidBuffer; so->hashso_split_bucket_buf = InvalidBuffer; - /* set position invalid (this will cause _hash_first call) */ - ItemPointerSetInvalid(&(so->hashso_curpos)); - ItemPointerSetInvalid(&(so->hashso_heappos)); so->hashso_buc_populated = false; so->hashso_buc_split = false; @@ -476,22 +390,16 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys, HashScanOpaque so = (HashScanOpaque) scan->opaque; Relation rel = scan->indexRelation; - /* - * Before leaving current page, deal with any killed items. Also, ensure - * that we acquire lock on current page before calling _hash_kill_items. - */ - if (so->numKilled > 0) + if (HashScanPosIsValid(so->currPos)) { - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE); - _hash_kill_items(scan); - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK); + /* Before leaving current page, deal with any killed items */ + if (so->numKilled > 0) + _hash_kill_items(scan, false); + _hash_dropscanbuf(rel, so); } - _hash_dropscanbuf(rel, so); - /* set position invalid (this will cause _hash_first call) */ - ItemPointerSetInvalid(&(so->hashso_curpos)); - ItemPointerSetInvalid(&(so->hashso_heappos)); + HashScanPosInvalidate(so->currPos); /* Update scan key, if a new one is given */ if (scankey && scan->numberOfKeys > 0) @@ -514,19 +422,14 @@ hashendscan(IndexScanDesc scan) HashScanOpaque so = (HashScanOpaque) scan->opaque; Relation rel = scan->indexRelation; - /* - * Before leaving current page, deal with any killed items. 
Also, ensure - * that we acquire lock on current page before calling _hash_kill_items. - */ - if (so->numKilled > 0) + if (HashScanPosIsValid(so->currPos)) { - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE); - _hash_kill_items(scan); - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK); + /* Before leaving current page, deal with any killed items */ + if (so->numKilled > 0) + _hash_kill_items(scan, false); + _hash_dropscanbuf(rel, so); } - _hash_dropscanbuf(rel, so); - if (so->killedItems != NULL) pfree(so->killedItems); pfree(so); diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c index d5b6502..90e1e55 100644 --- a/src/backend/access/hash/hashpage.c +++ b/src/backend/access/hash/hashpage.c @@ -298,20 +298,20 @@ _hash_dropscanbuf(Relation rel, HashScanOpaque so) { /* release pin we hold on primary bucket page */ if (BufferIsValid(so->hashso_bucket_buf) && - so->hashso_bucket_buf != so->hashso_curbuf) + so->hashso_bucket_buf != so->currPos.buf) _hash_dropbuf(rel, so->hashso_bucket_buf); so->hashso_bucket_buf = InvalidBuffer; /* release pin we hold on primary bucket page of bucket being split */ if (BufferIsValid(so->hashso_split_bucket_buf) && - so->hashso_split_bucket_buf != so->hashso_curbuf) + so->hashso_split_bucket_buf != so->currPos.buf) _hash_dropbuf(rel, so->hashso_split_bucket_buf); so->hashso_split_bucket_buf = InvalidBuffer; /* release any pin we still hold */ - if (BufferIsValid(so->hashso_curbuf)) - _hash_dropbuf(rel, so->hashso_curbuf); - so->hashso_curbuf = InvalidBuffer; + if (BufferIsValid(so->currPos.buf)) + _hash_dropbuf(rel, so->currPos.buf); + so->currPos.buf = InvalidBuffer; /* reset split scan */ so->hashso_buc_populated = false; diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c index 3e461ad..85ab86d 100644 --- a/src/backend/access/hash/hashsearch.c +++ b/src/backend/access/hash/hashsearch.c @@ -20,44 +20,108 @@ #include "pgstat.h" #include "utils/rel.h" +static bool _hash_readpage(IndexScanDesc scan, Buffer *bufP, + ScanDirection dir); +static int _hash_load_qualified_items(IndexScanDesc scan, Page page, + OffsetNumber offnum, ScanDirection dir); +static inline void _hash_saveitem(HashScanOpaque so, int itemIndex, + OffsetNumber offnum, IndexTuple itup); +static void _hash_readnext(IndexScanDesc scan, Buffer *bufp, + Page *pagep, HashPageOpaque *opaquep); /* * _hash_next() -- Get the next item in a scan. * - * On entry, we have a valid hashso_curpos in the scan, and a - * pin and read lock on the page that contains that item. - * We find the next item in the scan, if any. - * On success exit, we have the page containing the next item - * pinned and locked. + * On entry, so->currPos describes the current page, which may + * be pinned but not locked, and so->currPos.itemIndex identifies + * which item was previously returned. + * + * On successful exit, scan->xs_ctup.t_self is set to the TID + * of the next heap tuple, and if requested, scan->xs_itup + * points to a copy of the index tuple. so->currPos is updated + * as needed. + * + * On failure exit (no more tuples), we return FALSE with no + * pins or locks held. 
 */
bool
_hash_next(IndexScanDesc scan, ScanDirection dir)
{
 	Relation	rel = scan->indexRelation;
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
+	HashScanPosItem *currItem;
+	BlockNumber blkno;
 	Buffer		buf;
-	Page		page;
-	OffsetNumber offnum;
-	ItemPointer current;
-	IndexTuple	itup;
-
-	/* we still have the buffer pinned and read-locked */
-	buf = so->hashso_curbuf;
-	Assert(BufferIsValid(buf));
+	bool		end_of_scan = false;

 	/*
-	 * step to next valid tuple.
+	 * Advance to the next tuple on the current page; or if there are no
+	 * more, try to read data from the next or prev page based on the scan
+	 * direction. Before moving to the next or prev page, make sure that we
+	 * deal with all the killed items.
 	 */
-	if (!_hash_step(scan, &buf, dir))
+	if (ScanDirectionIsForward(dir))
+	{
+		if (++so->currPos.itemIndex > so->currPos.lastItem)
+		{
+			if (so->numKilled > 0)
+				_hash_kill_items(scan, false);
+
+			blkno = so->currPos.nextPage;
+			if (BlockNumberIsValid(blkno))
+			{
+				buf = _hash_getbuf(rel, blkno, HASH_READ, LH_OVERFLOW_PAGE);
+				so->currPos.buf = buf;
+				TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(buf));
+				if (!_hash_readpage(scan, &buf, dir))
+					end_of_scan = true;
+			}
+			else
+				end_of_scan = true;
+		}
+	}
+	else
+	{
+		if (--so->currPos.itemIndex < so->currPos.firstItem)
+		{
+			if (so->numKilled > 0)
+				_hash_kill_items(scan, false);
+
+			blkno = so->currPos.prevPage;
+			if (BlockNumberIsValid(blkno))
+			{
+				buf = _hash_getbuf(rel, blkno, HASH_READ,
+								   LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
+				so->currPos.buf = buf;
+				TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(buf));
+
+				/*
+				 * We always maintain the pin on the bucket page for the
+				 * whole scan operation, so release the additional pin we
+				 * have acquired here.
+				 */
+				if (buf == so->hashso_bucket_buf ||
+					buf == so->hashso_split_bucket_buf)
+					_hash_dropbuf(rel, buf);
+
+				if (!_hash_readpage(scan, &buf, dir))
+					end_of_scan = true;
+			}
+			else
+				end_of_scan = true;
+		}
+	}
+
+	if (end_of_scan)
+	{
+		_hash_dropscanbuf(rel, so);
+		HashScanPosInvalidate(so->currPos);
 		return false;
+	}

-	/* if we're here, _hash_step found a valid tuple */
-	current = &(so->hashso_curpos);
-	offnum = ItemPointerGetOffsetNumber(current);
-	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
-	page = BufferGetPage(buf);
-	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-	so->hashso_heappos = itup->t_tid;
+	/* OK, itemIndex says what to return */
+	currItem = &so->currPos.items[so->currPos.itemIndex];
+	scan->xs_ctup.t_self = currItem->heapTid;

 	return true;
 }

@@ -212,11 +276,15 @@ _hash_readprev(IndexScanDesc scan,
 /*
  *	_hash_first() -- Find the first item in a scan.
  *
- *		Find the first item in the index that
- *		satisfies the qualification associated with the scan descriptor. On
- *		success, the page containing the current index tuple is read locked
- *		and pinned, and the scan's opaque data entry is updated to
- *		include the buffer.
+ *		We find the first item (or, if backward scan, the last item) in the
+ *		index that satisfies the qualification associated with the scan
+ *		descriptor. On success, the page containing the current index tuple
+ *		is read locked and pinned, data about the matching tuple(s) on the
+ *		page has been loaded into so->currPos, and scan->xs_ctup.t_self is
+ *		set to the heap TID of the current tuple.
+ *
+ *		If there are no matching items in the index, we return FALSE,
+ *		with no pins or locks held.
 */
bool
_hash_first(IndexScanDesc scan, ScanDirection dir)
@@ -229,15 +297,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	Buffer		buf;
 	Page		page;
 	HashPageOpaque opaque;
-	IndexTuple	itup;
-	ItemPointer current;
-	OffsetNumber offnum;

 	pgstat_count_index_scan(rel);

-	current = &(so->hashso_curpos);
-	ItemPointerSetInvalid(current);
-
 	/*
 	 * We do not support hash scans with no index qualification, because we
 	 * would have to read the whole index rather than just one bucket. That
@@ -356,17 +418,15 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 		_hash_readnext(scan, &buf, &page, &opaque);
 	}

-	/* Now find the first tuple satisfying the qualification */
-	if (!_hash_step(scan, &buf, dir))
-		return false;
+	/* remember which buffer we have pinned, if any */
+	Assert(BufferIsInvalid(so->currPos.buf));
+	so->currPos.buf = buf;

-	/* if we're here, _hash_step found a valid tuple */
-	offnum = ItemPointerGetOffsetNumber(current);
-	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
-	page = BufferGetPage(buf);
-	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-	so->hashso_heappos = itup->t_tid;
+	/* Now find all the tuples satisfying the qualification from a page */
+	if (!_hash_readpage(scan, &buf, dir))
+		return false;

+	/* if we're here, _hash_readpage found valid tuples */
 	return true;
 }

@@ -467,7 +527,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)

 		/* Before leaving current page, deal with any killed items */
 		if (so->numKilled > 0)
-			_hash_kill_items(scan);
+			_hash_kill_items(scan, true);

 		/*
 		 * ran off the end of this page, try the next
@@ -524,7 +584,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)

 		/* Before leaving current page, deal with any killed items */
 		if (so->numKilled > 0)
-			_hash_kill_items(scan);
+			_hash_kill_items(scan, true);

 		/*
 		 * ran off the end of this page, try the next
@@ -575,3 +635,301 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 	ItemPointerSet(current, blkno, offnum);
 	return true;
 }
+
+/*
+ *	_hash_readpage() -- Load data from current index page into so->currPos
+ *
+ *	We scan all the items in the current index page and save them into
+ *	so->currPos if they satisfy the qualification. If no matching items
+ *	are found in the current page, we move to the next or previous page
+ *	in the bucket chain as indicated by the direction.
+ *
+ *	Return true if any matching items are found, else return false.
+ */
+static bool
+_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
+{
+	Relation	rel = scan->indexRelation;
+	HashScanOpaque so = (HashScanOpaque) scan->opaque;
+	Buffer		buf;
+	Page		page;
+	HashPageOpaque opaque;
+	OffsetNumber offnum;
+	uint16		itemIndex;
+
+	so->currPos.currPage = BufferGetBlockNumber(so->currPos.buf);
+
+	buf = *bufP;
+	Assert(BufferIsValid(buf));
+	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
+	page = BufferGetPage(buf);
+	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+
+	/*
+	 * We save the LSN of the page as we read it, so that we know whether it
+	 * is safe to apply LP_DEAD hints to the page later.
+	 */
+	so->currPos.lsn = PageGetLSN(page);
+
+	if (ScanDirectionIsForward(dir))
+	{
+		/* new page, locate starting position by binary search */
+		offnum = _hash_binsearch(page, so->hashso_sk_hash);
+
+		itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
+
+		while (itemIndex == 0)
+		{
+			/*
+			 * Could not find any matching tuples in the current page, move
+			 * to the next page.
+			 * Before leaving the current page, also deal
+			 * with any killed items.
+			 */
+			if (so->numKilled > 0)
+				_hash_kill_items(scan, true);
+
+			/*
+			 * We remember the prev and next block numbers along with the
+			 * current block number so that if the tuples are being fetched
+			 * using a cursor, we know the page from where to restart the
+			 * scan. This is for the case where we have reached the end of
+			 * the bucket chain without finding any matching tuples. See
+			 * comments in the else part below.
+			 */
+			if (!BlockNumberIsValid((opaque)->hasho_nextblkno))
+			{
+				so->currPos.prevPage = (opaque)->hasho_prevblkno;
+				so->currPos.nextPage = InvalidBlockNumber;
+			}
+
+			_hash_readnext(scan, &buf, &page, &opaque);
+			if (BufferIsValid(buf))
+			{
+				so->currPos.buf = buf;
+				so->currPos.currPage = BufferGetBlockNumber(buf);
+				so->currPos.lsn = PageGetLSN(page);
+				offnum = _hash_binsearch(page, so->hashso_sk_hash);
+				itemIndex = _hash_load_qualified_items(scan, page,
+													   offnum, dir);
+			}
+			else
+			{
+				/*
+				 * No more matching tuples were found. Return FALSE
+				 * indicating the same.
+				 */
+				so->currPos.buf = buf;
+				return false;
+			}
+		}
+
+		if (so->currPos.buf == so->hashso_bucket_buf ||
+			so->currPos.buf == so->hashso_split_bucket_buf)
+		{
+			so->currPos.prevPage = InvalidBlockNumber;
+			so->currPos.nextPage = (opaque)->hasho_nextblkno;
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		}
+		else
+		{
+			so->currPos.prevPage = (opaque)->hasho_prevblkno;
+			so->currPos.nextPage = (opaque)->hasho_nextblkno;
+			_hash_relbuf(rel, so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+		}
+
+		so->currPos.firstItem = 0;
+		so->currPos.lastItem = itemIndex - 1;
+		so->currPos.itemIndex = 0;
+	}
+	else
+	{
+		/* new page, locate starting position by binary search */
+		offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
+
+		itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
+
+		while (itemIndex == MaxIndexTuplesPerPage)
+		{
+			/*
+			 * Could not find any matching tuples in the current page, move
+			 * to the prev page. Before leaving the current page, also deal
+			 * with any killed items.
+			 */
+			if (so->numKilled > 0)
+				_hash_kill_items(scan, true);
+
+			/*
+			 * We remember the prev and next block numbers along with the
+			 * current block number so that if the tuples are being fetched
+			 * using a cursor, we know the page from where to restart the
+			 * scan. This is for the case where we have reached the bucket
+			 * page without finding any matching tuples. See comments in the
+			 * else part below.
+			 */
+			if (so->currPos.buf == so->hashso_bucket_buf ||
+				so->currPos.buf == so->hashso_split_bucket_buf)
+			{
+				so->currPos.prevPage = InvalidBlockNumber;
+				so->currPos.nextPage = (opaque)->hasho_nextblkno;
+			}
+
+			_hash_readprev(scan, &buf, &page, &opaque);
+			if (BufferIsValid(buf))
+			{
+				so->currPos.buf = buf;
+				so->currPos.currPage = BufferGetBlockNumber(buf);
+				so->currPos.lsn = PageGetLSN(page);
+				offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
+				itemIndex = _hash_load_qualified_items(scan, page,
+													   offnum, dir);
+			}
+			else
+			{
+				/*
+				 * No more matching tuples were found. Return FALSE
+				 * indicating the same.
+				 */
+				so->currPos.buf = buf;
+				return false;
+			}
+		}
+
+		if (so->currPos.buf == so->hashso_bucket_buf ||
+			so->currPos.buf == so->hashso_split_bucket_buf)
+		{
+			so->currPos.prevPage = InvalidBlockNumber;
+			so->currPos.nextPage = (opaque)->hasho_nextblkno;
+			LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+		}
+		else
+		{
+			so->currPos.prevPage = (opaque)->hasho_prevblkno;
+			so->currPos.nextPage = (opaque)->hasho_nextblkno;
+			_hash_relbuf(rel, so->currPos.buf);
+			so->currPos.buf = InvalidBuffer;
+		}
+
+		so->currPos.firstItem = itemIndex;
+		so->currPos.lastItem = MaxIndexTuplesPerPage - 1;
+		so->currPos.itemIndex = MaxIndexTuplesPerPage - 1;
+	}
+
+	return (so->currPos.firstItem <= so->currPos.lastItem);
+}
+
+/*
+ * Load all the qualified items from the current index page
+ * into so->currPos. Helper function for _hash_readpage.
+ */
+static int
+_hash_load_qualified_items(IndexScanDesc scan, Page page,
+						   OffsetNumber offnum, ScanDirection dir)
+{
+	HashScanOpaque so = (HashScanOpaque) scan->opaque;
+	IndexTuple	itup;
+	int			itemIndex;
+	OffsetNumber maxoff;
+
+	maxoff = PageGetMaxOffsetNumber(page);
+
+	if (ScanDirectionIsForward(dir))
+	{
+		/* load items[] in ascending order */
+		itemIndex = 0;
+
+		while (offnum <= maxoff)
+		{
+			Assert(offnum >= FirstOffsetNumber);
+			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
+
+			/*
+			 * Skip the tuples that were moved by a split operation, for a
+			 * scan that started while the split was in progress. Also, skip
+			 * the tuples that are marked as dead.
+			 */
+			if ((so->hashso_buc_populated && !so->hashso_buc_split &&
+				 (itup->t_info & INDEX_MOVED_BY_SPLIT_MASK)) ||
+				(scan->ignore_killed_tuples &&
+				 (ItemIdIsDead(PageGetItemId(page, offnum)))))
+			{
+				offnum = OffsetNumberNext(offnum);	/* move forward */
+				continue;
+			}
+
+			if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
+				_hash_checkqual(scan, itup))
+			{
+				/* tuple is qualified, so remember it */
+				_hash_saveitem(so, itemIndex, offnum, itup);
+				itemIndex++;
+			}
+			else
+
+				/*
+				 * No more matching tuples exist on this page, so exit the
+				 * while loop.
+				 */
+				break;
+
+			offnum = OffsetNumberNext(offnum);
+		}
+
+		Assert(itemIndex <= MaxIndexTuplesPerPage);
+		return itemIndex;
+	}
+	else
+	{
+		/* load items[] in descending order */
+		itemIndex = MaxIndexTuplesPerPage;
+
+		while (offnum >= FirstOffsetNumber)
+		{
+			Assert(offnum <= maxoff);
+			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
+
+			/*
+			 * Skip the tuples that were moved by a split operation, for a
+			 * scan that started while the split was in progress. Also, skip
+			 * the tuples that are marked as dead.
+			 */
+			if ((so->hashso_buc_populated && !so->hashso_buc_split &&
+				 (itup->t_info & INDEX_MOVED_BY_SPLIT_MASK)) ||
+				(scan->ignore_killed_tuples &&
+				 (ItemIdIsDead(PageGetItemId(page, offnum)))))
+			{
+				offnum = OffsetNumberPrev(offnum);	/* move back */
+				continue;
+			}
+
+			if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
+				_hash_checkqual(scan, itup))
+			{
+				itemIndex--;
+				/* tuple is qualified, so remember it */
+				_hash_saveitem(so, itemIndex, offnum, itup);
+			}
+			else
+
+				/*
+				 * No more matching tuples exist on this page, so exit the
+				 * while loop.
+				 */
+				break;
+
+			offnum = OffsetNumberPrev(offnum);
+		}
+
+		Assert(itemIndex >= 0);
+		return itemIndex;
+	}
+}
+
+/* Save an index item into so->currPos.items[itemIndex] */
+static inline void
+_hash_saveitem(HashScanOpaque so, int itemIndex,
+			   OffsetNumber offnum, IndexTuple itup)
+{
+	HashScanPosItem *currItem = &so->currPos.items[itemIndex];
+
+	currItem->heapTid = itup->t_tid;
+	currItem->indexOffset = offnum;
+}
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 9b803af..5ae8713 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -522,13 +522,28 @@ _hash_get_newbucket_from_oldbucket(Relation rel, Bucket old_bucket,
 * current page and killed tuples thereon (generally, this should only be
 * called if so->numKilled > 0).
 *
+ * The caller does not have a lock on the page and may or may not have the
+ * page pinned in a buffer. Note that a read-lock is sufficient for setting
+ * LP_DEAD status (which is only a hint).
+ *
+ * The caller must have a pin on the bucket buffer, but may or may not have
+ * a pin on the overflow buffer, as indicated by havePin.
+ *
 * We match items by heap TID before assuming they are the right ones to
 * delete.
+ *
+ * Note that we keep a pin on the bucket page throughout the scan. Hence,
+ * there is no chance of VACUUM deleting any items from the page.
+ *
+ * See _bt_killitems() for more details.
 */
 void
-_hash_kill_items(IndexScanDesc scan)
+_hash_kill_items(IndexScanDesc scan, bool havePin)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
+	Relation	rel = scan->indexRelation;
+	BlockNumber blkno;
+	Buffer		buf;
 	Page		page;
 	HashPageOpaque opaque;
 	OffsetNumber offnum,
@@ -539,6 +554,7 @@ _hash_kill_items(IndexScanDesc scan)

 	Assert(so->numKilled > 0);
 	Assert(so->killedItems != NULL);
+	Assert(HashScanPosIsValid(so->currPos));

 	/*
 	 * Always reset the scan state, so we don't look for same items on other
@@ -546,20 +562,59 @@ _hash_kill_items(IndexScanDesc scan)
 	 */
 	so->numKilled = 0;

-	page = BufferGetPage(so->hashso_curbuf);
+	blkno = so->currPos.currPage;
+	if (so->hashso_bucket_buf == so->currPos.buf)
+	{
+		buf = so->currPos.buf;
+		LockBuffer(buf, BUFFER_LOCK_SHARE);
+		page = BufferGetPage(buf);
+	}
+	else
+	{
+		if (!havePin)
+			buf = _hash_getbuf(rel, blkno, HASH_READ, LH_OVERFLOW_PAGE);
+		else
+		{
+			buf = so->currPos.buf;
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
+		}
+
+		/* It might not exist anymore, in which case we can't hint it. */
+		if (!BufferIsValid(buf))
+			return;
+
+		/*
+		 * If the page LSN differs, it means that the page was modified
+		 * since the last read. killedItems could be invalid, so applying
+		 * LP_DEAD hints is not safe.
+		 */
+		page = BufferGetPage(buf);
+		if (PageGetLSN(page) != so->currPos.lsn)
+		{
+			_hash_relbuf(rel, buf);
+			return;
+		}
+	}
+
 	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 	maxoff = PageGetMaxOffsetNumber(page);

 	for (i = 0; i < numKilled; i++)
 	{
-		offnum = so->killedItems[i].indexOffset;
+		int			itemIndex = so->killedItems[i];
+		HashScanPosItem *currItem = &so->currPos.items[itemIndex];
+
+		offnum = currItem->indexOffset;
+
+		Assert(itemIndex >= so->currPos.firstItem &&
+			   itemIndex <= so->currPos.lastItem);

 		while (offnum <= maxoff)
 		{
 			ItemId		iid = PageGetItemId(page, offnum);
 			IndexTuple	ituple = (IndexTuple) PageGetItem(page, iid);

-			if (ItemPointerEquals(&ituple->t_tid, &so->killedItems[i].heapTid))
+			if (ItemPointerEquals(&ituple->t_tid, &currItem->heapTid))
 			{
 				/* found the item */
 				ItemIdMarkDead(iid);
@@ -578,6 +633,12 @@ _hash_kill_items(IndexScanDesc scan)
 	if (killedsomething)
 	{
 		opaque->hasho_flag |= LH_PAGE_HAS_DEAD_TUPLES;
-		MarkBufferDirtyHint(so->hashso_curbuf, true);
+		MarkBufferDirtyHint(buf, true);
 	}
+
+	if (so->hashso_bucket_buf == so->currPos.buf ||
+		havePin)
+		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+	else
+		_hash_relbuf(rel, buf);
 }
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 7fa868b..9335679 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -103,6 +103,46 @@ typedef struct HashScanPosItem	/* what we remember about each match */
 	OffsetNumber indexOffset;	/* index item's location within page */
 } HashScanPosItem;

+typedef struct HashScanPosData
+{
+	Buffer		buf;			/* if valid, the buffer is pinned */
+	XLogRecPtr	lsn;			/* pos in the WAL stream when page was read */
+	BlockNumber currPage;		/* current hash index page */
+	BlockNumber nextPage;		/* next overflow page */
+	BlockNumber prevPage;		/* prev overflow or bucket page */
+
+	/*
+	 * The items array is always ordered in index order (ie, increasing
+	 * indexoffset). When scanning backwards it is convenient to fill the
+	 * array back-to-front, so we start at the last slot and fill downwards.
+	 * Hence we need both a first-valid-entry and a last-valid-entry counter.
+	 * itemIndex is a cursor showing which entry was last returned to the
+	 * caller.
+	 */
+	int			firstItem;		/* first valid index in items[] */
+	int			lastItem;		/* last valid index in items[] */
+	int			itemIndex;		/* current index in items[] */
+
+	HashScanPosItem items[MaxIndexTuplesPerPage];	/* MUST BE LAST */
+} HashScanPosData;
+
+#define HashScanPosIsValid(scanpos) \
+( \
+	AssertMacro(BlockNumberIsValid((scanpos).currPage) || \
+				!BufferIsValid((scanpos).buf)), \
+	BlockNumberIsValid((scanpos).currPage) \
+)
+
+#define HashScanPosInvalidate(scanpos) \
+	do { \
+		(scanpos).buf = InvalidBuffer; \
+		(scanpos).lsn = InvalidXLogRecPtr; \
+		(scanpos).currPage = InvalidBlockNumber; \
+		(scanpos).nextPage = InvalidBlockNumber; \
+		(scanpos).prevPage = InvalidBlockNumber; \
+		(scanpos).firstItem = 0; \
+		(scanpos).lastItem = 0; \
+		(scanpos).itemIndex = 0; \
+	} while (0)

 /*
 * HashScanOpaqueData is private state for a hash index scan.
@@ -145,8 +185,14 @@ typedef struct HashScanOpaqueData */ bool hashso_buc_split; /* info about killed items if any (killedItems is NULL if never used) */ - HashScanPosItem *killedItems; /* tids and offset numbers of killed items */ + int *killedItems; /* currPos.items indexes of killed items */ int numKilled; /* number of currently stored items */ + + /* + * Identify all the matching items on a page and save them in + * HashScanPosData + */ + HashScanPosData currPos; /* current position data */ } HashScanOpaqueData; typedef HashScanOpaqueData *HashScanOpaque; @@ -411,7 +457,7 @@ extern BlockNumber _hash_get_oldblock_from_newbucket(Relation rel, Bucket new_bu extern BlockNumber _hash_get_newblock_from_oldbucket(Relation rel, Bucket old_bucket); extern Bucket _hash_get_newbucket_from_oldbucket(Relation rel, Bucket old_bucket, uint32 lowmask, uint32 maxbucket); -extern void _hash_kill_items(IndexScanDesc scan); +extern void _hash_kill_items(IndexScanDesc scan, bool havePin); /* hash.c */ extern void hashbucketcleanup(Relation rel, Bucket cur_bucket, -- 1.8.3.1
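A note on the LP_DEAD part of this first patch: once the pin on an overflow page may have been dropped, _hash_kill_items can only trust its remembered offsets if the page is provably unchanged, and the patch uses the page LSN for that. Below is a minimal sketch of the rule, with made-up types standing in for the real buffer-manager API (this is a model, not the patch's code):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* stand-in for the WAL position type */

typedef struct PageHeader
{
    XLogRecPtr  lsn;                    /* advanced whenever the page is WAL-logged */
    /* ... line pointers, index tuples ... */
} PageHeader;

/*
 * It is safe to mark remembered items dead only if the page has not been
 * modified since the scan read it; an unchanged LSN proves that, so the
 * saved offsets still point at the same tuples we matched earlier.
 */
static bool
safe_to_apply_lp_dead(const PageHeader *page, XLogRecPtr lsn_when_read)
{
    return page->lsn == lsn_when_read;
}

If the LSN has moved, the patch simply gives up on hinting that page, which is fine because LP_DEAD is only an optimization.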
From 72f36b0fea59114b73a38622f34fb8d4e35a4731 Mon Sep 17 00:00:00 2001
From: ashu <ashutosh1...@example.com>
Date: Sun, 30 Jul 2017 12:49:34 +0530
Subject: [PATCH] Remove redundant function _hash_step() and some of the
 unused members of HashScanOpaqueData.

The function _hash_step(), used to find the next qualifying tuple in the
index page, is no longer required, as the new hash index works a page at a
time, which means it reads all the qualifying tuples in a page at once with
the help of _hash_readpage().

Patch by Ashutosh Sharma <ashu.coe...@gmail.com>
---
 src/backend/access/hash/hashsearch.c | 206 -----------------------------------
 src/include/access/hash.h            |  15 ---
 2 files changed, 221 deletions(-)

diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 85ab86d..2aa0b29 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -431,212 +431,6 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 }

 /*
- * _hash_step() -- step to the next valid item in a scan in the bucket.
- *
- *		If no valid record exists in the requested direction, return
- *		false. Else, return true and set the hashso_curpos for the
- *		scan to the right thing.
- *
- *		Here we need to ensure that if the scan has started during split, then
- *		skip the tuples that are moved by split while scanning bucket being
- *		populated and then scan the bucket being split to cover all such
- *		tuples. This is done to ensure that we don't miss tuples in the scans
- *		that are started during split.
- *
- *		'bufP' points to the current buffer, which is pinned and read-locked.
- *		On success exit, we have pin and read-lock on whichever page
- *		contains the right item; on failure, we have released all buffers.
- */
-bool
-_hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
-{
-	Relation	rel = scan->indexRelation;
-	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	ItemPointer current;
-	Buffer		buf;
-	Page		page;
-	HashPageOpaque opaque;
-	OffsetNumber maxoff;
-	OffsetNumber offnum;
-	BlockNumber blkno;
-	IndexTuple	itup;
-
-	current = &(so->hashso_curpos);
-
-	buf = *bufP;
-	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-
-	/*
-	 * If _hash_step is called from _hash_first, current will not be valid, so
-	 * we can't dereference it. However, in that case, we presumably want to
-	 * start at the beginning/end of the page...
-	 */
-	maxoff = PageGetMaxOffsetNumber(page);
-	if (ItemPointerIsValid(current))
-		offnum = ItemPointerGetOffsetNumber(current);
-	else
-		offnum = InvalidOffsetNumber;
-
-	/*
-	 * 'offnum' now points to the last tuple we examined (if any).
-	 *
-	 * continue to step through tuples until: 1) we get to the end of the
-	 * bucket chain or 2) we find a valid tuple.
- */ - do - { - switch (dir) - { - case ForwardScanDirection: - if (offnum != InvalidOffsetNumber) - offnum = OffsetNumberNext(offnum); /* move forward */ - else - { - /* new page, locate starting position by binary search */ - offnum = _hash_binsearch(page, so->hashso_sk_hash); - } - - for (;;) - { - /* - * check if we're still in the range of items with the - * target hash key - */ - if (offnum <= maxoff) - { - Assert(offnum >= FirstOffsetNumber); - itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum)); - - /* - * skip the tuples that are moved by split operation - * for the scan that has started when split was in - * progress - */ - if (so->hashso_buc_populated && !so->hashso_buc_split && - (itup->t_info & INDEX_MOVED_BY_SPLIT_MASK)) - { - offnum = OffsetNumberNext(offnum); /* move forward */ - continue; - } - - if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup)) - break; /* yes, so exit for-loop */ - } - - /* Before leaving current page, deal with any killed items */ - if (so->numKilled > 0) - _hash_kill_items(scan, true); - - /* - * ran off the end of this page, try the next - */ - _hash_readnext(scan, &buf, &page, &opaque); - if (BufferIsValid(buf)) - { - maxoff = PageGetMaxOffsetNumber(page); - offnum = _hash_binsearch(page, so->hashso_sk_hash); - } - else - { - itup = NULL; - break; /* exit for-loop */ - } - } - break; - - case BackwardScanDirection: - if (offnum != InvalidOffsetNumber) - offnum = OffsetNumberPrev(offnum); /* move back */ - else - { - /* new page, locate starting position by binary search */ - offnum = _hash_binsearch_last(page, so->hashso_sk_hash); - } - - for (;;) - { - /* - * check if we're still in the range of items with the - * target hash key - */ - if (offnum >= FirstOffsetNumber) - { - Assert(offnum <= maxoff); - itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum)); - - /* - * skip the tuples that are moved by split operation - * for the scan that has started when split was in - * progress - */ - if (so->hashso_buc_populated && !so->hashso_buc_split && - (itup->t_info & INDEX_MOVED_BY_SPLIT_MASK)) - { - offnum = OffsetNumberPrev(offnum); /* move back */ - continue; - } - - if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup)) - break; /* yes, so exit for-loop */ - } - - /* Before leaving current page, deal with any killed items */ - if (so->numKilled > 0) - _hash_kill_items(scan, true); - - /* - * ran off the end of this page, try the next - */ - _hash_readprev(scan, &buf, &page, &opaque); - if (BufferIsValid(buf)) - { - TestForOldSnapshot(scan->xs_snapshot, rel, page); - maxoff = PageGetMaxOffsetNumber(page); - offnum = _hash_binsearch_last(page, so->hashso_sk_hash); - } - else - { - itup = NULL; - break; /* exit for-loop */ - } - } - break; - - default: - /* NoMovementScanDirection */ - /* this should not be reached */ - itup = NULL; - break; - } - - if (itup == NULL) - { - /* - * We ran off the end of the bucket without finding a match. - * Release the pin on bucket buffers. Normally, such pins are - * released at end of scan, however scrolling cursors can - * reacquire the bucket lock and pin in the same scan multiple - * times. 
- */ - *bufP = so->hashso_curbuf = InvalidBuffer; - ItemPointerSetInvalid(current); - _hash_dropscanbuf(rel, so); - return false; - } - - /* check the tuple quals, loop around if not met */ - } while (!_hash_checkqual(scan, itup)); - - /* if we made it to here, we've found a valid tuple */ - blkno = BufferGetBlockNumber(buf); - *bufP = so->hashso_curbuf = buf; - ItemPointerSet(current, blkno, offnum); - return true; -} - -/* * _hash_readpage() -- Load data from current index page into so->currPos * * We scan all the items in the current index page and save them into diff --git a/src/include/access/hash.h b/src/include/access/hash.h index 9335679..c11e87e 100644 --- a/src/include/access/hash.h +++ b/src/include/access/hash.h @@ -152,14 +152,6 @@ typedef struct HashScanOpaqueData /* Hash value of the scan key, ie, the hash key we seek */ uint32 hashso_sk_hash; - /* - * We also want to remember which buffer we're currently examining in the - * scan. We keep the buffer pinned (but not locked) across hashgettuple - * calls, in order to avoid doing a ReadBuffer() for every tuple in the - * index. - */ - Buffer hashso_curbuf; - /* remember the buffer associated with primary bucket */ Buffer hashso_bucket_buf; @@ -170,12 +162,6 @@ typedef struct HashScanOpaqueData */ Buffer hashso_split_bucket_buf; - /* Current position of the scan, as an index TID */ - ItemPointerData hashso_curpos; - - /* Current position of the scan, as a heap TID */ - ItemPointerData hashso_heappos; - /* Whether scan starts on bucket being populated due to split */ bool hashso_buc_populated; @@ -426,7 +412,6 @@ extern void _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, /* hashsearch.c */ extern bool _hash_next(IndexScanDesc scan, ScanDirection dir); extern bool _hash_first(IndexScanDesc scan, ScanDirection dir); -extern bool _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir); /* hashsort.c */ typedef struct HSpool HSpool; /* opaque struct in hashsort.c */ -- 1.8.3.1
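One subtlety that survives the _hash_step() removal is hidden in HashScanPosData: backward scans fill items[] from the top slot downward so that the array stays in ascending index order, which is why the struct needs both firstItem and lastItem. A simplified model of that convention (assumed names and plain C, not patch code):

#define MAX_ITEMS 407                   /* stand-in for MaxIndexTuplesPerPage */

typedef struct Pos
{
    int     firstItem;
    int     lastItem;
    int     itemIndex;
    int     items[MAX_ITEMS];
} Pos;

/*
 * Backward scan: walk the page from the highest offset down, filling
 * items[] back-to-front so that items[firstItem..lastItem] remains in
 * ascending page-offset order.
 */
static void
load_backward(Pos *pos, const int *page, int nitems, int key)
{
    int     slot = MAX_ITEMS;

    for (int off = nitems - 1; off >= 0; off--)
        if (page[off] == key)
            pos->items[--slot] = page[off];

    pos->firstItem = slot;              /* == MAX_ITEMS means nothing matched */
    pos->lastItem = MAX_ITEMS - 1;
    pos->itemIndex = MAX_ITEMS;         /* one past the end; caller decrements */
}

This mirrors the convention in _hash_readpage(), where _hash_load_qualified_items() returning MaxIndexTuplesPerPage signals that nothing on the page matched.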
From e09ccbce2aa3388db37ec72e3c02e9593bbed4f9 Mon Sep 17 00:00:00 2001
From: ashu <ashutosh1...@example.com>
Date: Sun, 30 Jul 2017 12:37:24 +0530
Subject: [PATCH] Improve locking strategy during VACUUM in Hash Index v4

Patch by Ashutosh Sharma <ashu.coe...@gmail.com>
---
 src/backend/access/hash/README     |  2 +-
 src/backend/access/hash/hash.c     | 21 ++++++++++-----------
 src/backend/access/hash/hashovfl.c |  4 +---
 3 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index eef7d66..34a84ce 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -396,8 +396,8 @@ The fourth operation is garbage collection (bulk deletion):
 			mark the target page dirty
 			write WAL for deleting tuples from target page
 			if this is the last bucket page, break out of loop
-			pin and x-lock next page
 			release prior lock and pin (except keep pin on primary bucket page)
+			pin and x-lock next page
 		if the page we have locked is not the primary bucket page:
 			release lock and take exclusive lock on primary bucket page
 		if there are no other pins on the primary bucket page:
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 2b858f0..3d68af5 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -663,11 +663,9 @@ hashvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 * that the next valid TID will be greater than or equal to the current
 * valid TID. There can't be any concurrent scans in progress when we first
 * enter this function because of the cleanup lock we hold on the primary
- * bucket page, but as soon as we release that lock, there might be. We
- * handle that by conspiring to prevent those scans from passing our cleanup
- * scan. To do that, we lock the next page in the bucket chain before
- * releasing the lock on the previous page. (This type of lock chaining is
- * not ideal, so we might want to look for a better solution at some point.)
+ * bucket page, but as soon as we release that lock, there might be. But we
+ * do not have to bother about that, as hash index scans now work a page at
+ * a time.
 *
 * We need to retain a pin on the primary bucket to ensure that no concurrent
 * split can start.
@@ -836,19 +834,20 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
 		if (!BlockNumberIsValid(blkno))
 			break;

-		next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
-											  LH_OVERFLOW_PAGE,
-											  bstrategy);
-
 		/*
-		 * release the lock on previous page after acquiring the lock on next
-		 * page
+		 * As hash index scans now work a page at a time, vacuum can release
+		 * the lock on the previous page before acquiring the lock on the
+		 * next page.
 		 */
 		if (retain_pin)
 			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 		else
 			_hash_relbuf(rel, buf);

+		next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
+											  LH_OVERFLOW_PAGE,
+											  bstrategy);
+
 		buf = next_buf;
 	}

diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index c206e70..3a7011d 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -790,9 +790,7 @@ _hash_initbitmapbuffer(Buffer buf, uint16 bmsize, bool initpage)
 *	Caller must acquire cleanup lock on the primary page of the target
 *	bucket to exclude any scans that are in progress, which could easily
 *	be confused into returning the same tuple more than once or some tuples
To prevent - * any concurrent scan to cross the squeeze scan we use lock chaining - * similar to hasbucketcleanup. Refer comments atop hashbucketcleanup. + * not at all by the rearrangement we are performing here. * * We need to retain a pin on the primary bucket to ensure that no concurrent * split can start. -- 1.8.3.1