On Tue, Jan 19, 2021 at 3:03 AM Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Mon, Jan 18, 2021 at 1:10 PM Victor Yegorov <vyego...@gmail.com> wrote:
> > I must admit, that it's a bit difficult to understand you here (at least
> > for me).
> >
> > I assume that by "bet" you mean flagged tuple, that we marked as such
> > (should we implement the suggested case).
> > As heapam will give up early in case there are no deletable tuples, why do
> > say, that "bet" is no longer low? At the end, we still decide between page
> > split (and index bloat) vs a beneficial space cleanup.
>
> Well, as I said, there are various ways in which our inferences (say
> the ones in nbtdedup.c) are likely to be wrong. You understand this
> already. For example, obviously if there are two duplicate index
> tuples pointing to the same heap page then it's unlikely that both
> will be deletable, and there is even a fair chance that neither will
> be (for many reasons). I think that it's important to justify why we
> use stuff like that to drive our decisions -- the reasoning really
> matters. It's very much not like the usual optimization problem thing.
> It's a tricky thing to discuss.
>
> I don't assume that I understand all workloads, or how I might
> introduce regressions. It follows that I should be extremely
> conservative about imposing new costs here. It's good that we
> currently know of no workloads that the patch is likely to regress,
> but absence of evidence isn't evidence of absence.
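
To make the concern below concrete, here is a toy, standalone model of the
kind of bottom-up pass we are discussing: it visits the heap pages hinted by
duplicate index tuples, removes whatever versions are already dead to the
oldest open transaction, and gives up as soon as a visited page yields
nothing. All structure names, function names, and numbers below are made up
for illustration (this is not the actual heapam/nbtree code); the point is
only that when a long-running transaction holds back the horizon, each page
can keep yielding exactly one removable version, so the pass never gives up
and keeps doing heap accesses for very little benefit:

/*
 * Toy model only -- not PostgreSQL code.  Simulates a bottom-up pass that
 * visits heap pages hinted by duplicate index tuples, treats a version as
 * removable when it is older than the oldest open transaction (a deliberate
 * simplification of visibility), and stops after the first page that yields
 * no removable version.
 */
#include <stdbool.h>
#include <stdio.h>

#define NPAGES              10  /* heap pages hinted by one index page */
#define VERSIONS_PER_PAGE    5  /* duplicate versions per heap page */

typedef struct HeapPageModel
{
    int         xmin[VERSIONS_PER_PAGE];    /* creating XID of each version */
    bool        removed[VERSIONS_PER_PAGE];
} HeapPageModel;

/* Remove versions older than the horizon; return how many were removed. */
static int
prune_page(HeapPageModel *page, int oldest_open_xid)
{
    int         nremoved = 0;

    for (int i = 0; i < VERSIONS_PER_PAGE; i++)
    {
        if (!page->removed[i] && page->xmin[i] < oldest_open_xid)
        {
            page->removed[i] = true;
            nremoved++;
        }
    }
    return nremoved;
}

int
main(void)
{
    HeapPageModel pages[NPAGES];
    int         oldest_open_xid = 2;    /* long-running xact => old horizon */
    int         pages_visited = 0;
    int         tuples_removed = 0;

    /*
     * Workload: on every heap page exactly one version predates the open
     * transaction; all other versions were created after it and so cannot
     * be removed yet.
     */
    for (int p = 0; p < NPAGES; p++)
    {
        for (int v = 0; v < VERSIONS_PER_PAGE; v++)
        {
            pages[p].xmin[v] = (v == 0) ? 1 : 100 + p * VERSIONS_PER_PAGE + v;
            pages[p].removed[v] = false;
        }
    }

    /* Bottom-up pass: stop after the first page that yields nothing. */
    for (int p = 0; p < NPAGES; p++)
    {
        int         nremoved = prune_page(&pages[p], oldest_open_xid);

        pages_visited++;
        tuples_removed += nremoved;
        if (nremoved == 0)
            break;
    }

    /* Every page gets visited, but only one tuple per page is removed. */
    printf("heap pages visited: %d, tuples removed: %d (out of %d versions)\n",
           pages_visited, tuples_removed, NPAGES * VERSIONS_PER_PAGE);
    return 0;
}

With these made-up numbers the pass touches all 10 hinted heap pages to
remove only 10 of the 50 versions, which is the kind of extra work I am
worried about below.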
The worst cases could be (a) when there is just one such duplicate (index
value logically unchanged) on the page, and it happens to be the last item
while the others are new insertions, and (b) the same as (a) but, along with
it, say there is an open transaction because of which we can't remove even
that duplicate version. Have we checked the overhead or results by simulating
such workloads? Unlike the LP_DEAD optimization, I feel this new bottom-up
scheme can cost us extra CPU and I/O, because not much consideration seems to
be given to the case where we are unable to delete any items (or can delete
only very few) due to long-standing open transactions, beyond the limit that
we stop once we are not able to remove even one tuple from a particular heap
page. Now, say that due to open transactions we are able to remove only very
few tuples from each heap page (for the sake of argument, say just one such
tuple); then we will keep on accessing heap pages without much benefit. I
feel that extending the deletion mechanism based on the number of LP_DEAD
items sounds more favorable than giving preference to duplicate items. Sure,
the latter will give equally good or better results if there are no
long-standing open transactions.

> I personally will
> never vote for a theoretical risk with only a theoretical benefit. And
> right now that's what the idea of doing bottom-up deletions in more
> marginal cases (the page flag thing) looks like.

I don't think we can say that it is purely theoretical, because I have shown
some basic computation where it can lead to fewer splits.

-- 
With Regards,
Amit Kapila.