On Mon, Nov 2, 2020 at 3:44 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowle...@gmail.com> wrote: > > > > So far benchmarking shows there's still a regression from the v8 > > version of the patch. This is using count(*). An earlier test [1] did > > show speedups when we needed to deform tuples returned by the nested > > loop node. I've not yet repeated that test again. I was disappointed > > to see v9 slower than v8 after having spent about 3 days rewriting the > > patch > > I did some further tests this time with some tuple deforming. Again, > it does seem that v9 is slower than v8. > I run your test case on v8 and v9, I can produce a stable difference between them. v8: statement latencies in milliseconds: 1603.611 select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a; v9: statement latencies in milliseconds: 1772.287 select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a; then I did a perf on the 2 version, Is it possible that you called tts_minimal_clear twice in the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple called tts_minimal_clear on the same slot. With the following changes: diff --git a/src/backend/executor/execMRUTupleCache.c b/src/backend/executor/execMRUTupleCache.c index 3553dc26cb..b82d8e98b8 100644 --- a/src/backend/executor/execMRUTupleCache.c +++ b/src/backend/executor/execMRUTupleCache.c @@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache, MRUCacheKey *key) TupleTableSlot *tslot = mrucache->tableslot; int numKeys = mrucache->nkeys; - ExecClearTuple(pslot); - if (key == NULL) { + ExecClearTuple(pslot); /* Set the probeslot's values based on the current parameter values */ for (int i = 0; i < numKeys; i++) pslot->tts_values[i] = ExecEvalExpr(mrucache->param_exprs[i], @@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache) { mrucache->state = MRUCACHE_FETCH_NEXT_TUPLE; - ExecClearTuple(mrucache->cachefoundslot); + // ExecClearTuple(mrucache->cachefoundslot); slot = mrucache->cachefoundslot; ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false); return slot; @@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache) return NULL; } - ExecClearTuple(mrucache->cachefoundslot); + // ExecClearTuple(mrucache->cachefoundslot); slot = mrucache->cachefoundslot; ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false); return slot; v9 has the following result: 1608.048 select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a; > Graphs attached > > Looking at profiles, I don't really see any obvious reason as to why > this is. I'm very much inclined to just pursue the v8 patch (separate > Result Cache node) and just drop the v9 idea altogether. > > David > -- Best Regards Andy Fan