On Thu, Sep 15, 2016 at 4:04 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, Sep 13, 2016 at 9:31 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>
>> =======
>>
>> +Vacuum acquires cleanup lock on bucket to remove the dead tuples and or
>> tuples
>> +that are moved due to split. The need for cleanup lock to remove dead
>> tuples
>> +is to ensure that scans' returns correct results. Scan that returns
>> multiple
>> +tuples from the same bucket page always restart the scan from the
>> previous
>> +offset number from which it has returned last tuple.
>>
>> Perhaps it would be better to teach scans to restart anywhere on the
>> page, rather than to force more cleanup locks to be taken?
>
> Commenting on one of my own questions:
>
> This won't work when vacuum removes the tuple which an existing scan is
> currently examining, and which would thus be used to re-find its position
> when the scan realizes the tuple is not visible and resumes.
>
> The index tuples in a page are stored sorted just by hash value, not by
> the combination of (hash value, tid). If they were sorted by both, we
> could re-find our position even if the tuple had been removed, because we
> would know to start at the slot adjacent to where the missing tuple would
> be were it not removed. But unless we are willing to break pg_upgrade,
> there is no feasible way to change that now.
I think it is possible without breaking pg_upgrade, if we match all items
of a page at once (and save them as a local copy), rather than matching
item-by-item as we do now. We already do something similar for btree; see
the explanation of BTScanPosItem and BTScanPosData in nbtree.h.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers