On Sun, 6 Sep 2020 at 23:37, David Rowley <dgrowle...@gmail.com> wrote:
> The test replayed ~2.2 GB of WAL. master took 148.581 seconds and
> master+0001+0002 took 115.588 seconds. That's about 28% faster. Pretty
> nice!

I was wondering today if we could just get rid of the sort in
compactify_tuples() completely. It seems to me that the existing sort
is there just so that the memmove()s are done for the tuples at the
end of the page first. We're just shunting all the tuples to the end
of the page, so we need to sort the line items in reverse offset order
so that we never overwrite the memory of a tuple we've not yet moved
during the copy.

I wondered if we could get around that just by having another buffer
somewhere, memcpy the tuples into that first, and then copy them out
of that buffer back into the page. There's no need to worry about the
order we do that in, as there's no chance of overwriting memory
belonging to other tuples.

Doing that gives me 79.166 seconds in the same recovery test, or about
46% faster, instead of 22% (I mistakenly wrote 28% yesterday; that was
the 148.581/115.588 speedup ratio rather than the reduction in
recovery time).

The top of perf report says:

  24.19%  postgres  postgres            [.] PageRepairFragmentation
   8.37%  postgres  postgres            [.] hash_search_with_hash_value
   7.40%  postgres  libc-2.31.so        [.] 0x000000000018e74b
   5.59%  postgres  libc-2.31.so        [.] 0x000000000018e741
   5.49%  postgres  postgres            [.] XLogReadBufferExtended
   4.05%  postgres  postgres            [.] compactify_tuples
   3.27%  postgres  postgres            [.] PinBuffer
   2.88%  postgres  libc-2.31.so        [.] 0x000000000018e470
   2.02%  postgres  postgres            [.] hash_bytes

(I'll need to figure out why libc's debug symbols are not working)

I was thinking that there might be a crossover point where this
method becomes faster than the sort method.  Sorting 1 tuple is pretty
cheap, for example, but copying the memory for the entire tuple space
might be expensive, as that includes the tuples we might be getting
rid of. So if we did go down this path we might need some heuristic to
decide which method is likely best. Maybe that's based on the number
of tuples; I'm not really sure. I've not yet tried to give it a
worst-case workload to see if there is a crossover point that's worth
worrying about.
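
If we did need that, I imagine the check would look something along
these lines, in the context of bufpage.c (use_buffer_compaction and
livebytes are hypothetical names, and the cutoffs are arbitrary
placeholders just to show the shape of it; livebytes would be the sum
of alignedlen over the itemidbase array, which I think
PageRepairFragmentation already computes while building that array):

/*
 * Hypothetical only: pick between the two compaction strategies.
 * "livebytes" is the sum of alignedlen over itemidbase; the buffer
 * method has to memcpy the whole tuple space (pd_special - pd_upper),
 * which also includes any dead tuples we're about to discard.
 */
static bool
use_buffer_compaction(PageHeader phdr, int nitems, Size livebytes)
{
    Size    tuplespace = phdr->pd_special - phdr->pd_upper;

    /* arbitrary placeholder cutoffs, not measured values */
    return nitems > MaxHeapTuplesPerPage / 4 &&
           livebytes * 2 > tuplespace;
}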

The attached patch is what I used to test this.  It sticks a
page-sized variable on the stack, which is not exactly ideal. I think
we'd likely want to figure out some other way to do that, but I don't
know yet what that would look like. I just put the attached together
quickly to test out the idea.
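
Maybe a PGAlignedBlock would be a little tidier than the open-coded
MAXALIGN'd char array, though that's still BLCKSZ of stack; a
statically allocated scratch page would be another option. Something
like (sketch only, the rest as in the attached patch):

    PGAlignedBlock  scratch;    /* union in c.h that guarantees alignment */
    char           *bufp = scratch.data;

    memcpy(bufp + phdr->pd_upper,
           page + phdr->pd_upper,
           phdr->pd_special - phdr->pd_upper);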

(I don't want to derail the sort improvements here. I happen to think
those are quite important improvements, so I'll still continue to
review that patch.  Longer term, we might just end up with something
slightly different for compactify_tuples.)

David
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index d708117a40..5bf35aff37 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -440,22 +440,53 @@ compactify_tuples(itemIdSort itemidbase, int nitems, Page page)
        Offset          upper;
        int                     i;
 
-       /* sort itemIdSortData array into decreasing itemoff order */
-       qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
-                 itemoffcompare);
+       if (nitems > MaxHeapTuplesPerPage / 2)
+       {
+               char buffer[BLCKSZ + MAXIMUM_ALIGNOF];
+               char *bufp = (char *) MAXALIGN(&buffer);
+
+               /*
+                * Make a temp copy of the tuple data so that we can rearrange
+                * the tuples freely without having to worry about stomping
+                * on the memory of other tuples.
+                */
+               memcpy(bufp + phdr->pd_upper,
+                          page + phdr->pd_upper,
+                          phdr->pd_special - phdr->pd_upper);
 
-       upper = phdr->pd_special;
-       for (i = 0; i < nitems; i++)
+               upper = phdr->pd_special;
+               for (i = 0; i < nitems; i++)
+               {
+                       itemIdSort      itemidptr = &itemidbase[i];
+                       ItemId          lp;
+
+                       lp = PageGetItemId(page, itemidptr->offsetindex + 1);
+                       upper -= itemidptr->alignedlen;
+                       memcpy((char *) page + upper,
+                                  bufp + itemidptr->itemoff,
+                                  itemidptr->alignedlen);
+                       lp->lp_off = upper;
+               }
+       }
+       else
        {
-               itemIdSort      itemidptr = &itemidbase[i];
-               ItemId          lp;
-
-               lp = PageGetItemId(page, itemidptr->offsetindex + 1);
-               upper -= itemidptr->alignedlen;
-               memmove((char *) page + upper,
-                               (char *) page + itemidptr->itemoff,
-                               itemidptr->alignedlen);
-               lp->lp_off = upper;
+               /* sort itemIdSortData array into decreasing itemoff order */
+               qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
+                         itemoffcompare);
+
+               upper = phdr->pd_special;
+               for (i = 0; i < nitems; i++)
+               {
+                       itemIdSort      itemidptr = &itemidbase[i];
+                       ItemId          lp;
+
+                       lp = PageGetItemId(page, itemidptr->offsetindex + 1);
+                       upper -= itemidptr->alignedlen;
+                       memmove((char *) page + upper,
+                                       (char *) page + itemidptr->itemoff,
+                                       itemidptr->alignedlen);
+                       lp->lp_off = upper;
+               }
        }
 
        phdr->pd_upper = upper;
