On Fri, Oct 21, 2022 at 2:26 AM Kyotaro Horiguchi <horikyota....@gmail.com> wrote: > Stickier buffers for index pages seems to be related. I haven't see it > even get started, though. But this might be able be an additional > reason for starting it.
Maybe, but FWIW I think that that will mostly just need to distinguish leaf pages from heap pages (and mostly ignore internal pages). Within each index, internal pages are typically no more than a fraction of 1% of all pages. There are already so few internal pages that it seems very likely that they're practically guaranteed to be cached already. There is a huge asymmetry in how pages are naturally accessed, which justifies treating them as fundamentally different things. Separating leaf pages from internal pages for instrumentation purposes is valuable because it allows the DBA to completely *ignore* internal pages. Internal pages are accessed far far more frequently than leaf pages. In effect, internal pages add "noise" to the instrumentation, obscuring the useful "signal" that the DBA should focus on (by considering leaf level hits and misses in isolation). So the value is from "removing noise", not from "adding signal". You only need about 1% of the memory required to cache a big index to get a "hit rate" of 75% (assuming you don't have a workload that's very scan heavy, which would be unusual). Obviously the standard naive definition of "index hit rate" isn't particularly useful. > > While I do agree that it would be nice to record information about the > > number of deletion operations per index, that information will still > > be tricky to interpret and act upon relative to other kinds of > > information. As a general rule, we should prefer to focus on signals > > that show things really aren't going well in some specific and > > unambiguous way. Signals about things that are going well seem harder > > to work with -- they don't generalize well. > > I think some statistics can be pure-internal purpose. We can maintain > some statistics hidden from users, if we want. (However, I think > people will request for the numbers to be revealed, finally..) It will probably be easy to add information about index tuple deletions, without almost no downside, so of course we should do it. My point was just that it's probably not the single most informative thing that could be instrumented to help users to understand index bloat. It's just much easier to understand what's not working than what is going well. It's a stronger and more informative signal. > > With VACUUM in particular the picture over time can be far easier to > > work with than any given snapshot, from any single VACUUM operation. > > Focusing on how things seem to be changing can make it a lot easier to > > spot concerning trends, especially if you're a non-expert. > > Agreed. It seem like a kind of easy (low-hanging) one. I'll give it a > try. There should be some other numbers that timeseries stats are > useful. Great! There probably is some way that VACUUM itself will ultimately use this information to decide what to do. For example, if we go too long without doing any index vacuuming, we might want to do it despite the fact that there are relatively few LP_DEAD items in heap pages. I don't think that we need to worry too much about how VACUUM itself might apply the same information for now, but it's something that you might want to consider. -- Peter Geoghegan