On Thu, Jan 21, 2021 at 11:24 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Wed, Jan 20, 2021 at 7:58 AM Peter Geoghegan <p...@bowt.ie> wrote:
> >
> > On Sun, Jan 17, 2021 at 9:18 PM Masahiko Sawada
> > <sawada.m...@gmail.com> wrote:
> > > After more thought, I think that ambulkdelete needs to be able to
> > > refer to the answer to amvacuumstrategy. That way, the index can
> > > skip bulk-deletion when lazy vacuum doesn't vacuum the heap and the
> > > index also doesn’t want to do that.
> >
> > Makes sense.
> >
> > BTW, your patch has bitrot already. Peter E's recent pageinspect
> > commit happens to conflict with this patch. It might make sense to
> > produce a new version that just fixes the bitrot, so that other
> > people don't have to deal with it each time.
> >
> > > I’ve attached the updated version patch that includes the
> > > following changes:
> >
> > Looks good. I'll give this version a review now. I will do a lot
> > more soon. I need to come up with a good benchmark for this, one
> > that I can return to again and again during review as needed.
>
> Thank you for reviewing the patches.
>
> > Some feedback on the first patch:
> >
> > * Just so you know: I agree with you about handling
> > VACOPT_TERNARY_DISABLED in the index AM's amvacuumstrategy routine.
> > I think that it's better to do that there, even though this choice
> > may have some downsides.
> >
> > * Can you add some "stub" sgml doc changes for this? It doesn't have
> > to be complete in any way -- just a placeholder for later that has
> > the correct general "shape" to orient the reader of the patch. It
> > can just be a FIXME comment, plus basic mechanical stuff -- details
> > of the new amvacuumstrategy_function routine and its signature.
>
> The 0002 patch had the doc update (I mistakenly included it in the
> 0002 patch). Is that update what you meant?
>
> > Some feedback on the second patch:
> >
> > * Why do you move around IndexVacuumStrategy in the second patch?
> > Looks like a rebasing oversight.
>
> Check.
>
> > * Actually, do we really need the first and second patches to be
> > separate patches? I agree that the nbtree patch should be a separate
> > patch, but dividing the first two sets of changes doesn't seem like
> > it adds much. Did I miss something?
>
> I separated the patches since I used to have separate patches when
> adding other index AM options required by parallel vacuum. But I
> agree to merging the first two patches.
>
> > * Is the "MAXALIGN(sizeof(ItemIdData)))" change in the definition of
> > MaxHeapTuplesPerPage appropriate? Here is the relevant section from
> > the patch:
> >
> > diff --git a/src/include/access/htup_details.h
> > b/src/include/access/htup_details.h
> > index 7c62852e7f..038e7cd580 100644
> > --- a/src/include/access/htup_details.h
> > +++ b/src/include/access/htup_details.h
> > @@ -563,17 +563,18 @@ do { \
> >  /*
> >   * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
> >   * fit on one heap page.  (Note that indexes could have more, because they
> > - * use a smaller tuple header.)  We arrive at the divisor because each tuple
> > - * must be maxaligned, and it must have an associated line pointer.
> > + * use a smaller tuple header.)  We arrive at the divisor because each line
> > + * pointer must be maxaligned.
> > *** SNIP ***
> >  #define MaxHeapTuplesPerPage \
> > - ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
> > -         (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))
> > + ((int) ((BLCKSZ - SizeOfPageHeaderData) / (MAXALIGN(sizeof(ItemIdData)))))
> >
> > It's true that ItemIdData structs (line pointers) are aligned, but
> > they're not MAXALIGN()'d. If they were then the on-disk size of line
> > pointers would generally be 8 bytes, not 4 bytes.
>
> You're right. Will fix.
> > * Maybe it would be better if you just changed the definition such
> > that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF",
> > with no other changes? (Some variant of this suggestion might be
> > better, not sure.)
> >
> > For some reason that feels a bit safer: we still have an "imaginary
> > tuple header", but it's just 1 MAXALIGN() quantum now. This is still
> > much less than the current 3 MAXALIGN() quantums (i.e. what
> > MaxHeapTuplesPerPage treats as the tuple header size). Do you think
> > that this alternative approach will be noticeably less effective
> > within vacuumlazy.c?
> >
> > Note that you probably understand the issue with MaxHeapTuplesPerPage
> > for vacuumlazy.c better than I do currently. I'm still trying to
> > understand your choices, and to understand what is really important
> > here.
>
> Yeah, using MAXIMUM_ALIGNOF seems better for safety. I shared my
> thoughts on the issue with MaxHeapTuplesPerPage yesterday. I think we
> need to discuss how to deal with that.
>
> > * Maybe add a #define for the value 0.7? (I refer to the value used
> > in choose_vacuum_strategy() to calculate a "this is the number of
> > LP_DEAD line pointers that we consider too many" cut-off point,
> > which is to be applied throughout lazy_scan_heap() processing.)
>
> Agreed.
>
> > * I notice that your new lazy_vacuum_table_and_indexes() function is
> > the only place that calls lazy_vacuum_table_and_indexes(). I think
> > that you should merge them together -- replace the only remaining
> > call to lazy_vacuum_table_and_indexes() with the body of the
> > function itself. Having a separate lazy_vacuum_table_and_indexes()
> > function doesn't seem useful to me -- it doesn't actually hide
> > complexity, and might even be harder to maintain.
>
> lazy_vacuum_table_and_indexes() is called at two places: after
> maintenance_work_mem runs out (around L1097) and at the end of
> lazy_scan_heap() (around L1726).
> I defined this function to pack the operations such as choosing a
> strategy, vacuuming indexes and vacuuming the heap. Without this
> function, wouldn't we end up writing the same code twice there?
>
> > * I suggest thinking about what the last item will mean for the
> > reporting that currently takes place in
> > lazy_vacuum_table_and_indexes(), but will now go in an expanded
> > lazy_vacuum_table_and_indexes() -- how do we count the total number
> > of index scans now?
> >
> > I don't actually believe that the logic needs to change, but some
> > kind of consolidation and streamlining seems like it might be
> > helpful. Maybe just a comment that says "note that all index scans
> > might just be no-ops because..." -- stuff like that.
>
> What do you mean by the last item, and which report? I think
> lazy_vacuum_table_and_indexes() itself doesn't report anything, and
> vacrelstats->num_index_scans counts the total number of index scans.
>
> > * Any idea about how hard it will be to teach is_wraparound VACUUMs
> > to skip index cleanup automatically, based on some practical/sensible
> > criteria?
>
> One simple idea would be to have an anti-wraparound autovacuum worker
> disable index cleanup (i.e., setting VACOPT_TERNARY_DISABLED to
> index_cleanup). But a downside (though not a common case) is that
> since an anti-wraparound vacuum is not necessarily an aggressive
> vacuum, it could skip index cleanup even though it cannot move
> relfrozenxid forward.
>
> > It would be nice to have a basic PoC for that, even if it remains a
> > PoC for the foreseeable future (i.e. even if it cannot be committed
> > to Postgres 14). This feature should definitely be something that
> > your patch series *enables*. I'd feel good about having covered that
> > question as part of this basic design work if there was a PoC.
> > That alone should make it 100% clear that it's easy to do (or no
> > harder than it should be -- it should ideally be compatible with
> > your basic design). The exact criteria that we use for deciding
> > whether or not to skip index cleanup (which probably should not just
> > be "this VACUUM is is_wraparound, good enough" in the final version)
> > may need to be debated at length on pgsql-hackers. Even still, it is
> > "just a detail" in the code. Whereas being *able* to do that with
> > your design (now or in the future) seems essential now.
>
> Agreed. I'll write a PoC patch for that.
>
> > > * Store the answers to amvacuumstrategy into either the local
> > > memory or DSM (in the parallel vacuum case) so that ambulkdelete
> > > can refer to the answer to amvacuumstrategy.
> > > * Fix regression failures.
> > > * Update the documentation and comments.
> > >
> > > Note that the 0003 patch is still PoC quality, lacking the btree
> > > meta page version upgrade.
> >
> > This patch is not the hard part, of course -- there really isn't
> > that much needed here compared to vacuumlazy.c. So this patch seems
> > like the simplest 1 out of the 3 (at least to me).
> >
> > Some feedback on the third patch:
> >
> > * The new btm_last_deletion_nblocks metapage field should use P_NONE
> > (which is 0) to indicate never having been vacuumed -- not
> > InvalidBlockNumber (which is 0xFFFFFFFF).
> >
> > This is more idiomatic in nbtree, which is nice, but it has a very
> > significant practical advantage: it ensures that every heapkeyspace
> > nbtree index (i.e. those on recent nbtree versions) can be treated
> > as if it has had the new btm_last_deletion_nblocks field all along,
> > even when it was actually built on Postgres 12 or 13. This trick
> > will let you avoid dealing with the headache of bumping
> > BTREE_VERSION, which is a huge advantage.
> > Note that this is the same trick I used to avoid bumping
> > BTREE_VERSION when the btm_allequalimage field needed to be added
> > (for the nbtree deduplication feature added to Postgres 13).
>
> That's a nice trick with a great advantage. I'll use P_NONE.
>
> > * Forgot to do this in the third patch (I think I made this same
> > mistake once myself):
> >
> > diff --git a/contrib/pageinspect/btreefuncs.c
> > b/contrib/pageinspect/btreefuncs.c
> > index 8bb180bbbe..88dfea9af3 100644
> > --- a/contrib/pageinspect/btreefuncs.c
> > +++ b/contrib/pageinspect/btreefuncs.c
> > @@ -653,7 +653,7 @@ bt_metap(PG_FUNCTION_ARGS)
> >      BTMetaPageData *metad;
> >      TupleDesc       tupleDesc;
> >      int             j;
> > -    char           *values[9];
> > +    char           *values[10];
> >      Buffer          buffer;
> >      Page            page;
> >      HeapTuple       tuple;
> > @@ -734,6 +734,11 @@ bt_metap(PG_FUNCTION_ARGS)
>
> Check.
>
> I'm updating and testing the patch. I'll submit the updated version
> patches tomorrow.
Sorry for the delay. I've attached the updated version patches,
incorporating the comments I got so far. I merged the previous 0001
and 0002 patches. The 0003 patch is now a separate PoC patch that
automatically disables index cleanup when an autovacuum is both an
anti-wraparound and an aggressive vacuum.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
v3-0002-Skip-btree-bulkdelete-if-the-index-doesn-t-grow.patch
Description: Binary data
v3-0003-PoC-disable-index-cleanup-when-an-anti-wraparound.patch
Description: Binary data
v3-0001-Choose-vacuum-strategy-before-heap-and-index-vacu.patch
Description: Binary data