On Wed, Dec 9, 2020 at 5:12 PM Peter Geoghegan <p...@bowt.ie> wrote: > Most of the real changes in v11 (compared to v10) are in heapam.c. > I've completely replaced the table_compute_xid_horizon_for_tuples() > interface with a new interface that supports all existing requirements > (from index deletions that support LP_DEAD deletion), while also > supporting these new requirements (e.g. bottom-up index deletion).
I came up with a microbenchmark that is designed to give some general sense of how helpful it is to include "extra" TIDs alongside LP_DEAD-in-index TIDs (when they happen to point to the same table block as the LP_DEAD-in-index TIDs, which we'll have to visit anyway). The basic idea is simple: Add custom instrumentation that summarizes each B-Tree index deletion in LOG output, and then run the regression tests. Attached in the result of this for a "make installcheck-world". If you're just looking at this thread for the first time in a while: note that what I'm about to go into isn't technically bottom-up deletion (though you will see some of that in the full log output if you look). It's a closely related optimization that only appeared in recent versions of the patch. So I'm comparing the current approach to simple deletion of LP_DEAD-marked index tuples to a new enhanced approach (that makes it a little more like bottom-up deletion, but only a little). Here is some sample output (selected almost at random from a text file consisting of 889 lines of similar output): ... exact TIDs deleted 17, LP_DEAD tuples 4, LP_DEAD-related table blocks 2 ) ... exact TIDs deleted 38, LP_DEAD tuples 28, LP_DEAD-related table blocks 1 ) ... exact TIDs deleted 39, LP_DEAD tuples 1, LP_DEAD-related table blocks 1 ) ... exact TIDs deleted 44, LP_DEAD tuples 42, LP_DEAD-related table blocks 3 ) ... exact TIDs deleted 6, LP_DEAD tuples 2, LP_DEAD-related table blocks 2 ) (The initial contents of each line were snipped here, to focus on the relevant metrics.) Here we see that the actual number of TIDs/index tuples deleted often *vastly* exceeds the number of LP_DEAD-marked tuples (which are all we would have been able to delete with the existing approach of just deleting LP_DEAD items). It's pretty rare for us to fail to at least delete a couple of extra TIDs. Clearly this technique is broadly effective, because in practice there are significant locality-related effects that we can benefit from. It doesn't really matter that it's hard to precisely describe all of these locality effects. IMV the question that really matters is whether or not the cost of trying is consistently very low (relative to the cost of our current approach to simple LP_DEAD deletion). We do need to understand that fully. It's tempting to think about this quantitatively (and it also bolsters the case for the patch), but that misses the point. The right way to think about this is as a *qualitative* thing. The presence of LP_DEAD bits gives us certain reliable information about the nature of the index tuple (that it is dead-to-all, and therefore safe to delete), but it also *suggests* quite a lot more than that. In practice bloat is usually highly correlated/concentrated, especially when we limit our consideration to workloads where bloat is noticeable in any practical sense. So we're very well advised to look for nearby deletable index tuples in passing -- especially since it's practically free to do so. (Which is what the patch does here, of course.) Let's compare this to an extreme variant of my patch that runs the same test, to see what changes. Consider a variant that exhaustively checks every index tuple on the page at the point of a simple LP_DEAD deletion operation, no matter what. Rather than only including those TIDs that happen to be on the same heap/table blocks (and thus are practically free to check), we include all of them. This design isn't acceptable in the real world because it does a lot of unnecessary I/O, but that shouldn't invalidate any comparison I make here. This is still a reasonable approximation of a version of the patch with magical foreknowledge of where to find dead TIDs. It's an Oracle (ahem) that we can sensibly compare to the real patch within certain constraints. The results of this comparative analysis seem to suggest something important about the general nature of what's going on. The results are: There are only 844 deletion operations total with the Oracle. While this is less than the actual patch's 889 deletion operations, you would expect a bigger improvement from using what is after all supposed to apply magic! This suggests to me that the precise intervention of the patch here (the new LP_DEAD deletion stuff) has an outsized impact. The correlations that naturally exist make this asymmetrical payoff-to-cost situation possible. Simple cheap heuristics once again go a surprisingly long way, kind of like bottom-up deletion itself. -- Peter Geoghegan
delete_stats_regression_tests.txt.gz
Description: application/gzip