On Mon, Nov 2, 2020 at 3:44 PM David Rowley <dgrowle...@gmail.com> wrote:

> On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowle...@gmail.com> wrote:
> >
> > So far benchmarking shows there's still a regression from the v8
> > version of the patch. This is using count(*). An earlier test [1] did
> > show speedups when we needed to deform tuples returned by the nested
> > loop node. I've not yet repeated that test again. I was disappointed
> > to see v9 slower than v8 after having spent about 3 days rewriting the
> > patch
>
> I did some further tests this time with some tuple deforming.  Again,
> it does seem that v9 is slower than v8.
>

I run your test case on v8 and v9,  I can produce a stable difference
between them.

v8:
statement latencies in milliseconds:
      1603.611  select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

v9:
statement latencies in milliseconds:
      1772.287  select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

then I did a perf on the 2 version,  Is it possible that you
called tts_minimal_clear twice in
the v9 version?  Both ExecClearTuple and  ExecStoreMinimalTuple
called tts_minimal_clear
on the same  slot.

With the following changes:

diff --git a/src/backend/executor/execMRUTupleCache.c
b/src/backend/executor/execMRUTupleCache.c
index 3553dc26cb..b82d8e98b8 100644
--- a/src/backend/executor/execMRUTupleCache.c
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache,
MRUCacheKey *key)
        TupleTableSlot *tslot = mrucache->tableslot;
        int                             numKeys = mrucache->nkeys;

-       ExecClearTuple(pslot);
-
        if (key == NULL)
        {
+               ExecClearTuple(pslot);
                /* Set the probeslot's values based on the current
parameter values */
                for (int i = 0; i < numKeys; i++)
                        pslot->tts_values[i] =
ExecEvalExpr(mrucache->param_exprs[i],
@@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
                                        {
                                                mrucache->state =
MRUCACHE_FETCH_NEXT_TUPLE;

-
ExecClearTuple(mrucache->cachefoundslot);
+                                               //
ExecClearTuple(mrucache->cachefoundslot);
                                                slot =
mrucache->cachefoundslot;

ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
                                                return slot;
@@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
                                        return NULL;
                                }

-                               ExecClearTuple(mrucache->cachefoundslot);
+                               // ExecClearTuple(mrucache->cachefoundslot);
                                slot = mrucache->cachefoundslot;

ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
                                return slot;


v9 has the following result:
      1608.048  select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;



> Graphs attached
>
> Looking at profiles, I don't really see any obvious reason as to why
> this is.  I'm very much inclined to just pursue the v8 patch (separate
> Result Cache node) and just drop the v9 idea altogether.
>
> David
>


-- 
Best Regards
Andy Fan

Reply via email to