At Tue, 20 Mar 2018 13:57:19 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote in <20180320.135719.90053076.horiguchi.kyot...@lab.ntt.co.jp> > At Mon, 19 Mar 2018 20:50:48 +0900, Masahiko Sawada <sawada.m...@gmail.com> > wrote in <CAD21AoCfWXcX-po8Q1r779nyVGzs01pwpSLM=u7sx3hv+l+...@mail.gmail.com> > > On Mon, Mar 19, 2018 at 2:45 PM, Kyotaro HORIGUCHI > > <horiguchi.kyot...@lab.ntt.co.jp> wrote: > > > At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada > > > <sawada.m...@gmail.com> wrote in > > > <CAD21AoAB8tQg9xwojupUJjKD=fmhtx6thdependdhftvlwc...@mail.gmail.com> > > >> On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov > > >> <a.korot...@postgrespro.ru> wrote: > > >> > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada > > >> > <sawada.m...@gmail.com> > > >> > wrote: > > >> >> > > >> >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov > > >> >> <a.korot...@postgrespro.ru> wrote: > > >> >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada > > >> >> > <sawada.m...@gmail.com> > > >> >> > wrote: > > >> >> >> > > >> >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov > > >> >> >> <a.korot...@postgrespro.ru> wrote: > > >> >> >> > 2) These parameters are reset during btbulkdelete() and set > > >> >> >> > during > > >> >> >> > btvacuumcleanup(). > > >> >> >> > > >> >> >> Can't we set these parameters even during btbulkdelete()? By > > >> >> >> keeping > > >> >> >> them up to date, we will able to avoid an unnecessary cleanup > > >> >> >> vacuums > > >> >> >> even after index bulk-delete. > > >> >> > > > >> >> > > > >> >> > We certainly can update cleanup-related parameters during > > >> >> > btbulkdelete(). > > >> >> > However, in this case we would update B-tree meta-page during each > > >> >> > VACUUM cycle. That may cause some overhead for non append-only > > >> >> > workloads. I don't think this overhead would be sensible, because > > >> >> > in > > >> >> > non append-only scenarios VACUUM typically writes much more of > > >> >> > information. > > >> >> > But I would like this oriented to append-only workload patch to be > > >> >> > as harmless as possible for other workloads. > > >> >> > > >> >> What overhead are you referring here? I guess the overhead is only the > > >> >> calculating the oldest btpo.xact. And I think it would be harmless. > > >> > > > >> > > > >> > I meant overhead of setting last_cleanup_num_heap_tuples after every > > >> > btbulkdelete with wal-logging of meta-page. I bet it also would be > > >> > harmless, but I think that needs some testing. > > >> > > >> Agreed. > > >> > > >> After more thought, it might be too late but we can consider the > > >> possibility of another idea proposed by Peter. Attached patch > > >> addresses the original issue of index cleanups by storing the epoch > > >> number of page deletion XID into PageHeader->pd_prune_xid which is > > >> 4byte field. > > > > > > Mmm. It seems to me that the story is returning to the > > > beginning. Could I try retelling the story? > > > > > > I understant that the initial problem was vacuum runs apparently > > > unnecessary full-scan on indexes many times. The reason for that > > > is the fact that a cleanup scan may leave some (or many under > > > certain condition) dead pages not-recycled but we don't know > > > whether a cleanup is needed or not. They will be staying left > > > forever unless we run additional cleanup-scans at the appropriate > > > timing. > > > > > > (If I understand it correctly,) Sawada-san's latest proposal is > > > (fundamentally the same to the first one,) just skipping the > > > cleanup scan if the vacuum scan just before found that the number > > > of *live* tuples are increased. If there where many deletions and > > > insertions but no increase of total number of tuples, we don't > > > have a cleanup. Consequently it had a wraparound problem and it > > > is addressed in this version. > > > > No, it doesn't have a wraparound problem. The patch based on Peter's > > idea I proposed adds an epoch number of page deletion xid and compare > > them when we judge whether the page is recyclable or not. It's > > something like we create 64-bit xid of deletion xid. Also, if there is > > even one deletion the bulk-delete will be performed instead of the > > index cleanup. So with this patch we do the index cleanup only when > > the reltuple of table is increased by fraction of > > vacuum_index_cleanup_scale_factor from previous stats. It doesn't need > > to do the index cleanup by any xid thresholds. > > > Perhaps you took me wrong. I know the last patch doesn't have (or > at least intends to get rid of ) the problem, and I wrote that > the problem was introduced by your *first* patch. > > > > (ditto.) Alexander proposed to record the oldest xid of > > > recyclable pages in metapage (and the number of tuples at the > > > last cleanup). This prevents needless cleanup scan and surely > > > runs cleanups to remove all recyclable pages. > > > > Yes, but the concerns we discussed are that we need extra WAL-logging > > for updating the metapage and it works only for append-only case. If > > we also want to support more cases we will need to update the metapage > > during bulk-delete. The overhead of WAL-logging would be harmless but > > should be tested as Alexander mentioned. > > Agreed. > > > > I think that we can accept Sawada-san's proposal if we accept the > > > fact that indexes can retain recyclable pages for a long > > > time. (Honestly I don't think so.) > > > > > > If (as I might have mentioned as the same upthread for Yura's > > > patch,) we accept to hold the information on index meta page, > > > Alexander's way would be preferable. The difference betwen Yura's > > > and Alexander's is the former runs cleanup scan if a recyclable > > > page is present but the latter avoids that before any recyclable > > > pages are knwon to be removed. > > > > > >> Comparing to the current proposed patch this patch > > >> doesn't need neither the page upgrade code nor extra WAL-logging. If > > > > > > # By the way, my proposal was storing the information as Yura > > > # proposed into stats collector. The information maybe be > > > # available a bit lately, but it doesn't harm. This doesn't need > > > # extra WAL logging nor the upgrad code:p > > > > > >> we also want to address cases other than append-only case we will > > > > > > I'm afraid that "the problem for the other cases" is a new one > > > that this patch introduces, not an existing one. > > > > I meant that the current Alexandor's proposal works for append-only > > table. If we want to support other cases we have to update metapage > > during bulk-delete, which assumes that bulk-delete always scans whole > > index. > > True. Currently no patches so far gets rid of the whole-cleanup-scan. > > > >> require the bulk-delete method of scanning whole index and of logging > > >> WAL. But it leads some extra overhead. With this patch we no longer > > >> need to depend on the full scan on b-tree index. This might be useful > > >> for a future when we make the bulk-delete of b-tree index not scan > > >> whole index. > > > > > > Perhaps I'm taking something incorrectly, but is it just the > > > result of skipping 'maybe needed' scans without condiering the > > > actual necessity? > > > > I meant to scan only index pages that are relevant with garbages TIDs > > on a table. The current b-tree index bulk-deletion is very slow and > > heavy because we always scans the whole index even if there is only 1 > > dead tuples in a table. To address this problem I'm thinking a way to > > make bulk-delete not scan whole index if there is a few dead tuples in > > a table. That is, we do index scans to collect the stack of btree > > pages and reclaim garbage. Maybe we will full index scan if there are > > a lot of dead tuples, which would be same as what we're doing on > > planning access paths. > > Yeah, that seems good! A possible problem of that is that the > pages we want to recycle in a cleanup scan can *not* be only them > that have found to be recyclable in the vacuum-scan just > before. When we leave some recyclable pages in a cleanup scan, we > should do whole-scan at the next chance if we don't have the TID > list (or in other smaller form, or just the number of recyclable > pages?) at the time.
"the number of recyclable pages?" is wrong. It doesn't work to avoid whole-scan. It just indicates whether the "this" round of cleanup needs whole-scan or not. regards, -- Kyotaro Horiguchi NTT Open Source Software Center