On Mon, Sep 8, 2014 at 12:08 PM, Alexander Korotkov <[email protected]> wrote:
> On Mon, Sep 8, 2014 at 11:13 AM, Heikki Linnakangas <[email protected]> wrote:
>
>> On 09/07/2014 05:11 PM, Костя Кузнецов wrote:
>>
>>> hello.
>>> i recode vacuum for gist index.
>>> all tests is ok.
>>> also i test vacuum on table size 2 million rows. all is ok.
>>> on my machine old vaccum work about 9 second. this version work about
>>> 6-7 sec .
>>> review please.
>>>
>>
>> If I'm reading this correctly, the patch changes gistbulkdelete to scan
>> the index in physical order, while the old code starts from the root and
>> scans the index from left to right, in logical order.
>>
>> Scanning the index in physical order is wrong, if any index pages are
>> split while vacuum runs. A page split could move some tuples to a
>> lower-numbered page, so that the vacuum will not scan those tuples.
>>
>> In the b-tree code, we solved that problem back in 2006, so it can be
>> done but requires a bit more code. In b-tree, we solved it with a "vacuum
>> cycle ID" number that's set on the page halves when a page is split. That
>> allows VACUUM to identify pages that have been split concurrently sees
>> them, and "jump back" to vacuum them too. See commit
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5749f6ef0cc1c67ef9c9ad2108b3d97b82555c80
>> It should be possible to do
>> something similar in GiST, and in fact you might be able to reuse the NSN
>> field that's already set on the page halves on split, instead of adding a
>> new "vacuum cycle ID".
>
>
> Idea is right. But in fact, does GiST ever recycle any page? It has
> F_DELETED flag, but ISTM this flag is never set. So, I think it's possible
> that this patch is working correctly. However, probably GiST sometimes
> leaves new page unused due to server crash.
> Anyway, I'm not fan of committing patch in this shape. We need to let GiST
> recycle pages first, then implement VACUUM similar to b-tree.
>

Another note.
Assuming we have NSN which can play the role of "vacuum cycle ID", can we
implement a sequential (with possible "jump back") index scan for GiST?

------
With best regards,
Alexander Korotkov.
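
For illustration only, the "jump back" idea discussed in this thread could be sketched roughly as below. This is a toy model in Python, not PostgreSQL code: Page, vacuum_physical, before_visit and the integer NSN values are all hypothetical names invented for the sketch. It assumes that pages can be recycled (so a split may place the new right half on a lower-numbered block) and that the NSN stamped on a split page can be compared against a value captured when the vacuum started.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Page:
    tuples: List[str] = field(default_factory=list)
    nsn: int = 0                     # hypothetical: stamped when the page is split
    rightlink: Optional[int] = None  # block number of the right sibling, if any


def vacuum_physical(pages: List[Page], dead: Set[str], start_nsn: int,
                    before_visit: Optional[Callable[[int], None]] = None) -> List[int]:
    """Clean pages in physical (block number) order; return the visit order."""
    visited = []
    for blkno in range(len(pages)):
        if before_visit is not None:
            before_visit(blkno)  # hook used only to simulate concurrent splits
        cur: Optional[int] = blkno
        while cur is not None:
            page = pages[cur]
            page.tuples = [t for t in page.tuples if t not in dead]
            visited.append(cur)
            # The page was split after the vacuum began and its right half
            # sits behind the scan position, so the main loop already passed it.
            split_during_vacuum = page.nsn > start_nsn
            if (split_during_vacuum and page.rightlink is not None
                    and page.rightlink < blkno):
                cur = page.rightlink  # jump back
            else:
                cur = None
    return visited


# Block 1 is an empty, recycled page; block 2 is split just before it is visited.
pages = [Page(["a"]), Page(), Page(["b", "c", "d", "e"])]
dead = {"c", "e"}
start_nsn = 100


def split_before_block_2(blkno: int) -> None:
    if blkno == 2:
        left = pages[2]
        pages[1].tuples = left.tuples[2:]  # "d" and "e" move to block 1
        left.tuples = left.tuples[:2]
        left.rightlink = 1
        left.nsn = start_nsn + 1

visited = vacuum_physical(pages, dead, start_nsn, split_before_block_2)
print(visited)
print([p.tuples for p in pages])
```

Without the jump-back branch, the dead tuple "e" that the split moved to block 1 would survive, because block 1 had already been scanned while it was still empty.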
