Hi,

While playing around [1] to understand how, in practice, an engineer should identify potential indexes, I found that nfiltered1 and nfiltered2 are sufficient to detect issues with leaf table scan operators. The situation is worse when it comes to joins.
The idea behind JOIN optimisation is that a highly selective, parameterised NestLoop is sometimes faster than a HashJoin. What we need is a way to detect that only a tiny part of the hash table, or of a sorted MergeJoin input, was used to produce the JOIN result.
Thanks to [2-5], we have metrics showing how many tuples are removed by the joinqual and other quals in a JOIN operator. That's good for starters, but some cases aren't covered yet: how many tuples are filtered out by the hashclauses or mergeclauses.
In the attached file, you can see that for the same query, NestLoop reports 100k filtered tuples while HashJoin shows nothing. The original threads argued that the 'filtered out' counters should expose extra work done by the operator. A hashing operation, especially on a set of variable-length columns, can be quite expensive, and further filtering involving the hashclauses looks pretty similar to the joinqual filtering.
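For those who don't want to open the attachment, a simplified sketch of the same comparison looks roughly like this (the table names and sizes here are made up and may differ from the attached filtered-comparison.sql):

CREATE TABLE dim (id int, flag boolean);
CREATE TABLE fact (dim_id int, payload text);
INSERT INTO dim SELECT i, (i % 1000 = 0) FROM generate_series(1, 10000) i;
INSERT INTO fact SELECT (i % 10000) + 1, 'x' FROM generate_series(1, 100000) i;
ANALYZE dim, fact;

-- Force a plain NestLoop: EXPLAIN ANALYZE reports "Rows Removed by Join Filter".
SET enable_hashjoin = off; SET enable_mergejoin = off;
EXPLAIN (ANALYZE, COSTS OFF)
SELECT f.payload FROM fact f JOIN dim d ON f.dim_id = d.id WHERE d.flag;

-- Force a HashJoin: the same query reports no removed rows at all.
RESET enable_hashjoin; SET enable_nestloop = off;
EXPLAIN (ANALYZE, COSTS OFF)
SELECT f.payload FROM fact f JOIN dim d ON f.dim_id = d.id WHERE d.flag;
RESET ALL;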
To me, the ‘filtered’ value acts as a flag that some massive part of the input is not needed at all, and that with proper parameterisation and indexing we could optimise such a JOIN by using a NestLoop or MergeJoin instead.
From this point of view, it seems logical to add an nfiltered3 instrumentation field and account for rows filtered out by a ‘special’ join clause such as the hashclauses or mergeclauses.
In the attachment, I propose a sketch of how to calculate these metrics. MergeJoin looks more complicated and I don't propose it for now, but HashJoin is quite trivial.
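For the HashJoin case, the expected change in the plan text is roughly the following (hand-written and abridged rather than captured from a run, reusing the throwaway tables from the sketch above; actual numbers omitted):

EXPLAIN (ANALYZE, COSTS OFF)
SELECT f.payload FROM fact f JOIN dim d ON f.dim_id = d.id WHERE d.flag;

-- Expected shape with the patch applied (abridged):
--   Hash Join (actual time=... rows=... loops=1)
--     Hash Cond: (f.dim_id = d.id)
--     Rows Removed by Hash Matching: ...
--     ...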
Known issues:
- Hash clause failures are counted in nfiltered1, which is shared with join filter removals. If both are present, the counts are combined.
- The metric only counts outer tuples with zero hash-value matches, not hash collisions within buckets.

Thoughts?

[1] Proposal: Add rows_filtered column to pg_stat_statements for index opportunity detection
https://www.postgresql.org/message-id/CAM527d-r%2BRsaAeYsyAPmYtnmWB3rJFJtixUq4bnJW59nN%3DZo3w%40mail.gmail.com
[2] RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
https://www.postgresql.org/message-id/flat/[email protected]
[3] EXPLAIN and nfiltered
https://www.postgresql.org/message-id/[email protected]
[4] Re: REVIEW: EXPLAIN and nfiltered
https://www.postgresql.org/message-id/9053.1295888538%40sss.pgh.pa.us
[5] EXPLAIN and nfiltered, take two
https://www.postgresql.org/message-id/flat/4E68B108.1090907%402ndquadrant.com

--
regards, Andrei Lepikhov, pgEdge
Attachment: filtered-comparison.sql (application/sql)
From d1f896a33bd4bf1a338c210023abd42b35838371 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <[email protected]>
Date: Sat, 17 Jan 2026 00:13:50 +0100
Subject: [PATCH] Add "Rows Removed by Hash Matching" to EXPLAIN ANALYZE for
 hash joins

This patch adds instrumentation to track outer tuples that found no
matching inner tuples during hash join bucket scanning. The new metric
"Rows Removed by Hash Matching" is displayed in EXPLAIN ANALYZE output
when hash clauses are present.

The implementation adds a third filtered-tuple counter (nfiltered3) to
the Instrumentation struct. For hash joins, this counter tracks outer
tuples where no inner tuple with a matching hash value was found in the
probed bucket. Additionally, tuples where the hash value matched but the
hash clause evaluation failed are counted in nfiltered1 (displayed as
part of "Rows Removed by Join Filter" when a join filter is present).
---
 src/backend/commands/explain.c    |  7 ++++++-
 src/backend/executor/instrument.c |  1 +
 src/backend/executor/nodeHash.c   | 20 ++++++++++++++++----
 src/include/executor/instrument.h |  3 ++-
 src/include/nodes/execnodes.h     |  5 +++++
 5 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b7bb111688c..e68dc2d8eee 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2191,6 +2191,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 							"Hash Cond", planstate, ancestors, es);
 			show_upper_qual(((HashJoin *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
+			if (((HashJoin *) plan)->hashclauses)
+				show_instrumentation_count("Rows Removed by Hash Matching", 3,
+										   planstate, es);
 			if (((HashJoin *) plan)->join.joinqual)
 				show_instrumentation_count("Rows Removed by Join Filter", 1,
 										   planstate, es);
@@ -3998,7 +4001,9 @@ show_instrumentation_count(const char *qlabel, int which,
 	if (!es->analyze || !planstate->instrument)
 		return;
 
-	if (which == 2)
+	if (which == 3)
+		nfiltered = planstate->instrument->nfiltered3;
+	else if (which == 2)
 		nfiltered = planstate->instrument->nfiltered2;
 	else
 		nfiltered = planstate->instrument->nfiltered1;
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index edab92a0ebe..38a1b5a4757 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -183,6 +183,7 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
 	dst->nloops += add->nloops;
 	dst->nfiltered1 += add->nfiltered1;
 	dst->nfiltered2 += add->nfiltered2;
+	dst->nfiltered3 += add->nfiltered3;
 
 	/* Add delta of buffer usage since entry to node's totals */
 	if (dst->need_bufusage)
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index f5d3edb90e2..66a9de6dbfc 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -1995,6 +1995,7 @@ ExecScanHashBucket(HashJoinState *hjstate,
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	HashJoinTuple hashTuple = hjstate->hj_CurTuple;
 	uint32		hashvalue = hjstate->hj_CurHashValue;
+	TupleTableSlot *inntuple = NULL;
 
 	/*
 	 * hj_CurTuple is the address of the tuple last returned from the current
@@ -2014,8 +2015,6 @@ ExecScanHashBucket(HashJoinState *hjstate,
 	{
 		if (hashTuple->hashvalue == hashvalue)
 		{
-			TupleTableSlot *inntuple;
-
 			/* insert hashtable's tuple into exec slot so ExecQual sees it */
 			inntuple = ExecStoreMinimalTuple(HJTUPLE_MINTUPLE(hashTuple),
 											 hjstate->hj_HashTupleSlot,
@@ -2027,6 +2026,9 @@ ExecScanHashBucket(HashJoinState *hjstate,
 				hjstate->hj_CurTuple = hashTuple;
 				return true;
 			}
+
+			/* Tuple matched hash value but failed hash join clauses */
+			InstrCountFiltered1(&hjstate->js.ps, 1);
 		}
 
 		hashTuple = hashTuple->next.unshared;
@@ -2035,6 +2037,10 @@ ExecScanHashBucket(HashJoinState *hjstate,
 	/*
 	 * no match
 	 */
+
+	if (hjstate->hj_CurTuple == NULL && inntuple == NULL)
+		InstrCountFiltered3(&hjstate->js.ps, 1);
+
 	return false;
 }
 
@@ -2056,6 +2062,7 @@ ExecParallelScanHashBucket(HashJoinState *hjstate,
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	HashJoinTuple hashTuple = hjstate->hj_CurTuple;
 	uint32		hashvalue = hjstate->hj_CurHashValue;
+	TupleTableSlot *inntuple = NULL;
 
 	/*
 	 * hj_CurTuple is the address of the tuple last returned from the current
@@ -2071,8 +2078,6 @@ ExecParallelScanHashBucket(HashJoinState *hjstate,
 	{
 		if (hashTuple->hashvalue == hashvalue)
 		{
-			TupleTableSlot *inntuple;
-
 			/* insert hashtable's tuple into exec slot so ExecQual sees it */
 			inntuple = ExecStoreMinimalTuple(HJTUPLE_MINTUPLE(hashTuple),
 											 hjstate->hj_HashTupleSlot,
@@ -2084,6 +2089,9 @@ ExecParallelScanHashBucket(HashJoinState *hjstate,
 				hjstate->hj_CurTuple = hashTuple;
 				return true;
 			}
+
+			/* Tuple matched hash value but failed hash clauses */
+			InstrCountFiltered1(&hjstate->js.ps, 1);
 		}
 
 		hashTuple = ExecParallelHashNextTuple(hashtable, hashTuple);
@@ -2092,6 +2100,10 @@ ExecParallelScanHashBucket(HashJoinState *hjstate,
 	/*
 	 * no match
 	 */
+
+	if (hjstate->hj_CurTuple == NULL && inntuple == NULL)
+		InstrCountFiltered3(&hjstate->js.ps, 1);
+
 	return false;
 }
 
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 9759f3ea5d8..200ecb2c348 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -88,8 +88,9 @@ typedef struct Instrumentation
 	double		ntuples;		/* total tuples produced */
 	double		ntuples2;		/* secondary node-specific tuple counter */
 	double		nloops;			/* # of run cycles for this node */
-	double		nfiltered1;		/* # of tuples removed by scanqual or joinqual */
+	double		nfiltered1;		/* # of tuples removed by scanqual, joinqual or hashqual */
 	double		nfiltered2;		/* # of tuples removed by "other" quals */
+	double		nfiltered3;		/* # of tuples removed by "hash matching" */
 	BufferUsage bufusage;		/* total buffer usage */
 	WalUsage	walusage;		/* total WAL usage */
 } Instrumentation;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f8053d9e572..5de0a08a456 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1278,6 +1278,11 @@ typedef struct PlanState
 		if (((PlanState *)(node))->instrument) \
 			((PlanState *)(node))->instrument->nfiltered2 += (delta); \
 	} while(0)
+#define InstrCountFiltered3(node, delta) \
+	do { \
+		if (((PlanState *)(node))->instrument) \
+			((PlanState *)(node))->instrument->nfiltered3 += (delta); \
+	} while(0)
 
 /*
  * EPQState is state for executing an EvalPlanQual recheck on a candidate
-- 
2.52.0
