On Sun, 6 Sep 2020 at 23:37, David Rowley <dgrowle...@gmail.com> wrote:
> The test replayed ~2.2 GB of WAL. master took 148.581 seconds and
> master+0001+0002 took 115.588 seconds. That's about 28% faster. Pretty
> nice!
I was wondering today if we could just get rid of the sort in compactify_tuples() completely. It seems to me that the existing sort is there just so that the memmove() is done for the tuples nearest the end of the page first. We seem to be just shunting all the tuples to the end of the page, so we need to sort the line items in reverse offset order so as not to overwrite the memory of other tuples during the copy.

I wondered if we could get around that just by having another buffer somewhere, memcpy the tuples into that first, then copy the tuples out of that buffer back into the page. No need to worry about the order we do that in, as there's no chance of overwriting memory belonging to other tuples.

Doing that gives me 79.166 seconds in the same recovery test, or about 46% faster, instead of 22% (I mistakenly wrote 28% yesterday).

The top of perf report says:

 24.19%  postgres  postgres       [.] PageRepairFragmentation
  8.37%  postgres  postgres       [.] hash_search_with_hash_value
  7.40%  postgres  libc-2.31.so   [.] 0x000000000018e74b
  5.59%  postgres  libc-2.31.so   [.] 0x000000000018e741
  5.49%  postgres  postgres       [.] XLogReadBufferExtended
  4.05%  postgres  postgres       [.] compactify_tuples
  3.27%  postgres  postgres       [.] PinBuffer
  2.88%  postgres  libc-2.31.so   [.] 0x000000000018e470
  2.02%  postgres  postgres       [.] hash_bytes

(I'll need to figure out why libc's debug symbols are not working.)

I was thinking that there might be a crossover point where this method becomes faster than the sort method. E.g. sorting 1 tuple is pretty cheap, but copying the memory for the entire tuple space might be expensive, as that includes the tuples we might be getting rid of. So if we did go down that path we might need some heuristics to decide which method is likely best. Maybe that's based on the number of tuples; I'm not really sure. I've not made any attempt to give it a worst-case workload to see if there is a crossover point that's worth worrying about.

The attached patch is what I used to test this.
It kinda goes and sticks a page-sized variable on the stack, which is not exactly ideal. I think we'd likely want to figure out some other way to do that, but I just don't know what that would look like yet. I just put the attached together quickly to test out the idea.

(I don't want to derail the sort improvements here. I happen to think those are quite important improvements, so I'll continue to review that patch. Longer term, we might just end up with something slightly different for compactify_tuples.)

David
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index d708117a40..5bf35aff37 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -440,22 +440,53 @@ compactify_tuples(itemIdSort itemidbase, int nitems, Page page)
 	Offset		upper;
 	int			i;
 
-	/* sort itemIdSortData array into decreasing itemoff order */
-	qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
-		  itemoffcompare);
+	if (nitems > MaxHeapTuplesPerPage / 2)
+	{
+		char		buffer[BLCKSZ + MAXIMUM_ALIGNOF];
+		char	   *bufp = (char *) MAXALIGN(&buffer);
+
+		/*
+		 * Make a temp copy of the tuple data so that we can rearrange
+		 * the tuples freely without having to worry about stomping
+		 * on the memory of other tuples.
+		 */
+		memcpy(bufp + phdr->pd_upper,
+			   page + phdr->pd_upper,
+			   phdr->pd_special - phdr->pd_upper);
 
-	upper = phdr->pd_special;
-	for (i = 0; i < nitems; i++)
+		upper = phdr->pd_special;
+		for (i = 0; i < nitems; i++)
+		{
+			itemIdSort	itemidptr = &itemidbase[i];
+			ItemId		lp;
+
+			lp = PageGetItemId(page, itemidptr->offsetindex + 1);
+			upper -= itemidptr->alignedlen;
+			memcpy((char *) page + upper,
+				   bufp + itemidptr->itemoff,
+				   itemidptr->alignedlen);
+			lp->lp_off = upper;
+		}
+	}
+	else
 	{
-		itemIdSort	itemidptr = &itemidbase[i];
-		ItemId		lp;
-
-		lp = PageGetItemId(page, itemidptr->offsetindex + 1);
-		upper -= itemidptr->alignedlen;
-		memmove((char *) page + upper,
-				(char *) page + itemidptr->itemoff,
-				itemidptr->alignedlen);
-		lp->lp_off = upper;
+		/* sort itemIdSortData array into decreasing itemoff order */
+		qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
+			  itemoffcompare);
+
+		upper = phdr->pd_special;
+		for (i = 0; i < nitems; i++)
+		{
+			itemIdSort	itemidptr = &itemidbase[i];
+			ItemId		lp;
+
+			lp = PageGetItemId(page, itemidptr->offsetindex + 1);
+			upper -= itemidptr->alignedlen;
+			memmove((char *) page + upper,
+					(char *) page + itemidptr->itemoff,
+					itemidptr->alignedlen);
+			lp->lp_off = upper;
+		}
 	}
 
 	phdr->pd_upper = upper;