Thanks for the detailed feedback!

On 04.11.2025 00:55, Tom Lane wrote:

Hmm.  Those results sure look like there is a performance regression
up to at least 100 MCVs ... not a large one, but consistently a few
percent.  That's a bit sad for a patch purporting to improve
performance.  It looks to me like perhaps we should stick to the old
algorithm up to 100 or possibly even more MCVs.  Certainly the
threshold needs to be higher than 1, as you have it now.


I re-ran the benchmark on JOB with a threshold of 100.Here are the updated results:

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
1                         | 1.00                | 2320.412            | 2318.377 5                         | 0.99                | 2335.894            | 2360.890 10                        | 1.00                | 2350.612            | 2347.154 25                        | 1.01                | 2365.977            | 2342.312 50                        | 0.99                | 2381.554            | 2405.262 75                        | 1.00                | 2396.481            | 2399.828 100                       | 1.00                | 2410.367            | 2412.456 1000                      | 1.11                | 2850.853            | 2564.303 2500                      | 2.04                | 5571.688            | 2731.545 5000                      | 6.05                | 18850.084           | 3114.692 7500                      | 11.96               | 39160.898           | 3273.688 10000                     | 19.04               | 71334.113           | 3745.955

I eyeballed the patch itself very briefly, and have a couple
quick comments:

* Is hash_msv a typo for hash_mcv?  If not, maybe spell out what
it's supposed to mean.


Yes, that was a typo — fixed.

* The patch would be easier to read if it didn't reindent a couple
large chunks of existing code.  Can we change the factorization
to avoid that?  If not, I'd recommend submitting without that
reindentation, and reminding the committer to reindent at the last
moment.


Fixed as well. I’ve removed all reindentation changes. I will keep that in mind for future submissions.


* The calculation loops in eqjoinsel_inner and eqjoinsel_semi
are not identical, which makes it look quite weird to be
writing just one function that conditionally replaces both.
I wonder if we should refactor to have just one copy (and
just eat the extra cycles of calculating matchprodfreq).


Agreed. I’ve dropped the attempt to merge them into a single function.



* In fact ... I wonder if we should try harder to not do essentially
identical calculations twice, remembering that eqjoinsel_semi is
always used alongside eqjoinsel_inner.  (Of course, we could only do
that if clamped_nvalues2 is the same as sslot2->nvalues, but that's
frequently true.)  I think the reason it's duplicative right now
is that we regarded this semijoin calculation as an experiment and
so didn't want to invest a lot of effort in it ... but this patch
is exactly a lot of effort, so maybe it's time to deal with that
unfinished business.

                        regards, tom lane


Good point. I addressed this in a separate patch: eqjoinsel_inner() now saves matchfreq1, matchfreq2, nmatches so that eqjoinsel_semi() can reuse them when (clamped_nvalues2 == sslot2->nvalues). If the MCV list on the RHS is clamped, we still recompute locally. If you have a cleaner idea for how to share these values between the two functions without passing them explicitly, I’d be happy to consider it.

I’m attaching two patches:
1. v4-0001-Avoid-duplicate-MCV-matching-in-eqjoinsel_semi-an.patch - removes redundant MCV matching for semi/anti joins; 2. v4-0002-Optimize-MCV-matching-in-eqjoinsel_inner-and-eqjo.patch - adds hash-based MCV matching with a configurable threshold and includes fixes based on your comments.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
From 71f59bd83e559df6b36720a4592bf3fdd689504a Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Mon, 10 Nov 2025 16:11:43 +0300
Subject: [PATCH v1] Avoid duplicate MCV matching in eqjoinsel_semi and
 eqjoinsel_inner.

Previously both eqjoinsel_inner() and eqjoinsel_semi() performed identical
O(N^2) loops over MCV lists, even though the semi join case always follows
the inner join case in eqjoinsel().  Now the MCV matching results from
eqjoinsel_inner() are reused in eqjoinsel_semi() when possible (i.e., when
the RHS MCV list is not clamped).  This saves redundant computation and
simplifies the code.

Author: Ilia Evdokimov <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: David Geier <[email protected]>
---
 src/backend/utils/adt/selfuncs.c | 36 ++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cb23ad52782..55cd0486bf9 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -154,7 +154,9 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
 							  bool isdefault1, bool isdefault2,
 							  AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 							  Form_pg_statistic stats1, Form_pg_statistic stats2,
-							  bool have_mcvs1, bool have_mcvs2);
+							  bool have_mcvs1, bool have_mcvs2,
+							  double *matchfreq_mcvs1, double *matchfreq_mcvs2,
+							  int *nmatches_mcvs);
 static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
 							 VariableStatData *vardata1, VariableStatData *vardata2,
 							 double nd1, double nd2,
@@ -162,6 +164,7 @@ static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
 							 AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 							 Form_pg_statistic stats1, Form_pg_statistic stats2,
 							 bool have_mcvs1, bool have_mcvs2,
+							 double matchfreq1, int nmatches,
 							 RelOptInfo *inner_rel);
 static bool estimate_multivariate_ndistinct(PlannerInfo *root,
 											RelOptInfo *rel, List **varinfos, double *ndistinct);
@@ -2313,6 +2316,9 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	bool		get_mcv_stats;
 	bool		join_is_reversed;
 	RelOptInfo *inner_rel;
+	int		nmatches_mcvs = 0;
+	double		matchfreq_mcvs1 = 0.0;
+	double		matchfreq_mcvs2 = 0.0;
 
 	get_join_variables(root, args, sjinfo,
 					   &vardata1, &vardata2, &join_is_reversed);
@@ -2367,7 +2373,9 @@ eqjoinsel(PG_FUNCTION_ARGS)
 								  isdefault1, isdefault2,
 								  &sslot1, &sslot2,
 								  stats1, stats2,
-								  have_mcvs1, have_mcvs2);
+								  have_mcvs1, have_mcvs2,
+								  &matchfreq_mcvs1, &matchfreq_mcvs2,
+								  &nmatches_mcvs);
 
 	switch (sjinfo->jointype)
 	{
@@ -2395,6 +2403,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 									   &sslot1, &sslot2,
 									   stats1, stats2,
 									   have_mcvs1, have_mcvs2,
+									   matchfreq_mcvs1, nmatches_mcvs,
 									   inner_rel);
 			else
 			{
@@ -2408,6 +2417,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 									   &sslot2, &sslot1,
 									   stats2, stats1,
 									   have_mcvs2, have_mcvs1,
+									   matchfreq_mcvs2, nmatches_mcvs,
 									   inner_rel);
 			}
 
@@ -2455,7 +2465,9 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 				bool isdefault1, bool isdefault2,
 				AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 				Form_pg_statistic stats1, Form_pg_statistic stats2,
-				bool have_mcvs1, bool have_mcvs2)
+				bool have_mcvs1, bool have_mcvs2,
+				double *matchfreq_mcvs1, double *matchfreq_mcvs2,
+				int *nmatches_mcvs)
 {
 	double		selec;
 
@@ -2595,6 +2607,11 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 			totalsel2 += otherfreq2 * (otherfreq1 + unmatchfreq1) /
 				(nd1 - nmatches);
 
+		/* Save MCV match statistics for possible reuse by eqjoinsel_semi() */
+		*matchfreq_mcvs1 = matchfreq1;
+		*matchfreq_mcvs2 = matchfreq2;
+		*nmatches_mcvs = nmatches;
+
 		/*
 		 * Use the smaller of the two estimates.  This can be justified in
 		 * essentially the same terms as given below for the no-stats case: to
@@ -2653,6 +2670,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
 			   AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 			   Form_pg_statistic stats1, Form_pg_statistic stats2,
 			   bool have_mcvs1, bool have_mcvs2,
+			   double matchfreq1, int nmatches,
 			   RelOptInfo *inner_rel)
 {
 	double		selec;
@@ -2705,11 +2723,9 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
 		bool	   *hasmatch1;
 		bool	   *hasmatch2;
 		double		nullfrac1 = stats1->stanullfrac;
-		double		matchfreq1,
-					uncertainfrac,
+		double		uncertainfrac,
 					uncertain;
 		int			i,
-					nmatches,
 					clamped_nvalues2;
 
 		/*
@@ -2721,6 +2737,13 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
 		 */
 		clamped_nvalues2 = Min(sslot2->nvalues, nd2);
 
+		/*
+		 * eqjoinsel_inner() normally already did the full MCV comparison,
+		 * so we reuse its results unless RHS MCVs were clamped, in which
+		 * case we must redo the loop for the reduced list.
+		 */
+		if (clamped_nvalues2 != sslot2->nvalues)
+		{
 		fmgr_info(opfuncoid, &eqproc);
 
 		/*
@@ -2777,6 +2800,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
 		CLAMP_PROBABILITY(matchfreq1);
 		pfree(hasmatch1);
 		pfree(hasmatch2);
+		}
 
 		/*
 		 * Now we need to estimate the fraction of relation 1 that has at
-- 
2.34.1

From 27c6f4b6e947af27ccc9f6ae881407010f141bbe Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Mon, 10 Nov 2025 17:06:34 +0300
Subject: [PATCH v4 2/2] Optimize MCV matching in eqjoinsel_inner() and
 eqjoinsel_semi()

Previously, MCV values from both sides of a join were compared using
an O(N^2) nested-loop algorithm.  When default_statistics_target was
set to large values, this could make query planning noticeably slower.

This patch introduces a hash-based O(N) algorithm for matching MCVs.
The planner now switches to the hash method when the MCV lists are
large enough to amortize hash setup costs, while keeping the old
nested-loop path for small lists.  The threshold is currently set
to 100 entries.

For data types that do not support hashing, the code still falls back
to the O(N^2) algorithm.

Author: David Geier <[email protected]>
Author: Ilia Evdokimov <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
---
 src/backend/utils/adt/selfuncs.c | 211 +++++++++++++++++++++++++++++--
 1 file changed, 202 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 55cd0486bf9..b22e2e5d920 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -143,12 +143,20 @@
 
 #define DEFAULT_PAGE_CPU_MULTIPLIER 50.0
 
+/*
+ * Switch to hash-based MCV matching when lists are large enough
+ * to amortize hash setup cost.
+ */
+#define EQJOINSEL_MCV_HASH_THRESHOLD 100
+
+struct McvHashTable_hash;
+
 /* Hooks for plugins to get control when we ask for stats */
 get_relation_stats_hook_type get_relation_stats_hook = NULL;
 get_index_stats_hook_type get_index_stats_hook = NULL;
 
 static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
-static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
+static double eqjoinsel_inner(Oid operator, Oid collation,
 							  VariableStatData *vardata1, VariableStatData *vardata2,
 							  double nd1, double nd2,
 							  bool isdefault1, bool isdefault2,
@@ -157,7 +165,7 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
 							  bool have_mcvs1, bool have_mcvs2,
 							  double *matchfreq_mcvs1, double *matchfreq_mcvs2,
 							  int *nmatches_mcvs);
-static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
+static double eqjoinsel_semi(Oid operator, Oid opfuncoid, Oid collation,
 							 VariableStatData *vardata1, VariableStatData *vardata2,
 							 double nd1, double nd2,
 							 bool isdefault1, bool isdefault2,
@@ -220,6 +228,55 @@ static bool get_actual_variable_endpoint(Relation heapRel,
 static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
 static double btcost_correlation(IndexOptInfo *index,
 								 VariableStatData *vardata);
+static uint32 hash_mcv(struct McvHashTable_hash *hashTable, Datum key);
+static bool are_mcvs_equal(struct McvHashTable_hash *hashTable, Datum value1, Datum value2);
+
+typedef struct McvHashEntry
+{
+	Datum  value;
+	uint32 index;
+	uint32 hash;
+	char   status;
+} McvHashEntry;
+
+typedef struct McvHashContext
+{
+	FmgrInfo equal_proc;
+	FmgrInfo hash_proc;
+	Oid      collation;
+} McvHashContext;
+
+#define SH_PREFIX                  McvHashTable
+#define SH_ELEMENT_TYPE            McvHashEntry
+#define SH_KEY_TYPE                Datum
+#define SH_KEY                     value
+#define SH_HASH_KEY(mcvs, key)     hash_mcv(mcvs, key)
+#define SH_EQUAL(mcvs, key0, key1) are_mcvs_equal(mcvs, key0, key1)
+#define SH_SCOPE                   static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(mcvs, key)     key->hash
+#define SH_DEFINE
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32
+hash_mcv(struct McvHashTable_hash *hashTable, Datum key)
+{
+	McvHashContext *context = (McvHashContext *)hashTable->private_data;
+	return DatumGetUInt32(FunctionCall1Coll(&context->hash_proc, context->collation, key));
+}
+
+static bool
+are_mcvs_equal(struct McvHashTable_hash *hashTable, Datum value1, Datum value2)
+{
+	/*
+	 * We can safely use FunctionCall2Coll() which requires the result to
+	 * never be NULL, because MCV arrays from 'pg_statistic' don't contain
+	 * NULL values
+	 */
+	McvHashContext *context = (McvHashContext *)hashTable->private_data;
+	return DatumGetBool(FunctionCall2Coll(&context->equal_proc, context->collation, value1, value2));
+}
 
 
 /*
@@ -2367,7 +2424,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	}
 
 	/* We need to compute the inner-join selectivity in all cases */
-	selec_inner = eqjoinsel_inner(opfuncoid, collation,
+	selec_inner = eqjoinsel_inner(operator, collation,
 								  &vardata1, &vardata2,
 								  nd1, nd2,
 								  isdefault1, isdefault2,
@@ -2396,7 +2453,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 			inner_rel = find_join_input_rel(root, sjinfo->min_righthand);
 
 			if (!join_is_reversed)
-				selec = eqjoinsel_semi(opfuncoid, collation,
+				selec = eqjoinsel_semi(operator, opfuncoid, collation,
 									   &vardata1, &vardata2,
 									   nd1, nd2,
 									   isdefault1, isdefault2,
@@ -2410,7 +2467,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
 				Oid			commop = get_commutator(operator);
 				Oid			commopfuncoid = OidIsValid(commop) ? get_opcode(commop) : InvalidOid;
 
-				selec = eqjoinsel_semi(commopfuncoid, collation,
+				selec = eqjoinsel_semi(operator, commopfuncoid, collation,
 									   &vardata2, &vardata1,
 									   nd2, nd1,
 									   isdefault2, isdefault1,
@@ -2459,7 +2516,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
  * that it's worth trying to distinguish them here.
  */
 static double
-eqjoinsel_inner(Oid opfuncoid, Oid collation,
+eqjoinsel_inner(Oid operator, Oid collation,
 				VariableStatData *vardata1, VariableStatData *vardata2,
 				double nd1, double nd2,
 				bool isdefault1, bool isdefault2,
@@ -2502,8 +2559,11 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 					totalsel2;
 		int			i,
 					nmatches;
+		Oid			hashLeft  = InvalidOid;
+		Oid			hashRight = InvalidOid;
 
-		fmgr_info(opfuncoid, &eqproc);
+		fmgr_info(get_opcode(operator), &eqproc);
+		get_op_hash_functions(operator, &hashLeft, &hashRight);
 
 		/*
 		 * Save a few cycles by setting up the fcinfo struct just once. Using
@@ -2527,6 +2587,70 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 		 */
 		matchprodfreq = 0.0;
 		nmatches = 0;
+
+		/*
+		 * If one MCV array contains less than 100 values, there's no gain in using a hash table.
+		 * The sweet spot of using hash table lookups instead of iterating is slightly higher
+		 * than 1 but we don't bother here because the gains are neglectable.
+		 */
+		if (OidIsValid(hashLeft) && hashLeft == hashRight &&
+			Min(sslot1->nvalues, sslot2->nvalues) > EQJOINSEL_MCV_HASH_THRESHOLD)
+		{
+			AttStatsSlot	*statsInner = sslot2;
+			AttStatsSlot	*statsOuter = sslot1;
+			bool			*hasMatchInner = hasmatch2;
+			bool			*hasMatchOuter = hasmatch1;
+			int				nvaluesInner = sslot2->nvalues;
+			int				nvaluesOuter = sslot1->nvalues;
+			McvHashContext	hashContext;
+			McvHashTable_hash *hashTable;
+
+			/* Make sure we build the hash table on the smaller array. */
+			if (sslot1->nvalues < sslot2->nvalues)
+			{
+				statsInner = sslot1;
+				statsOuter = sslot2;
+				hasMatchInner = hasmatch1;
+				hasMatchOuter = hasmatch2;
+				nvaluesInner = sslot1->nvalues;
+				nvaluesOuter = sslot2->nvalues;
+			}
+
+			/* 1. Create hash table of smaller 'pg_statistic' array. That's O(n). */
+			fmgr_info(get_opcode(operator), &hashContext.equal_proc);
+			fmgr_info(hashLeft, &hashContext.hash_proc); /* hashLeft == hashRight */
+			hashContext.collation = collation;
+
+			hashTable = McvHashTable_create(CurrentMemoryContext, nvaluesInner, &hashContext);
+
+			for (i = 0; i < nvaluesInner; i++)
+			{
+				bool found = false;
+				McvHashEntry *entry = McvHashTable_insert(hashTable, statsInner->values[i], &found);
+
+				Assert(!found);
+
+				entry->index = i;
+			}
+
+			/* 2. Look-up values from other 'pg_statistic' array against hash map to find matches. */
+			for (i = 0; i < nvaluesOuter; i++)
+			{
+				McvHashEntry *entry = McvHashTable_lookup(hashTable, statsOuter->values[i]);
+
+				if (entry != NULL)
+				{
+					hasMatchInner[entry->index] = hasMatchOuter[i] = true;
+					nmatches++;
+					matchprodfreq += statsInner->numbers[entry->index] * statsOuter->numbers[i];
+				}
+			}
+
+			McvHashTable_destroy(hashTable);
+		}
+		else
+		{
+		/* Fallback to O(N^2) algorithm if hash based variant didn't succeed. */
 		for (i = 0; i < sslot1->nvalues; i++)
 		{
 			int			j;
@@ -2551,6 +2675,7 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 				}
 			}
 		}
+		}
 		CLAMP_PROBABILITY(matchprodfreq);
 		/* Sum up frequencies of matched and unmatched MCVs */
 		matchfreq1 = unmatchfreq1 = 0.0;
@@ -2663,7 +2788,7 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
  * Unlike eqjoinsel_inner, we have to cope with opfuncoid being InvalidOid.
  */
 static double
-eqjoinsel_semi(Oid opfuncoid, Oid collation,
+eqjoinsel_semi(Oid operator, Oid opfuncoid, Oid collation,
 			   VariableStatData *vardata1, VariableStatData *vardata2,
 			   double nd1, double nd2,
 			   bool isdefault1, bool isdefault2,
@@ -2744,7 +2869,11 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
 		 */
 		if (clamped_nvalues2 != sslot2->nvalues)
 		{
-		fmgr_info(opfuncoid, &eqproc);
+			Oid 		hashLeft  = InvalidOid;
+			Oid 		hashRight = InvalidOid;
+
+			fmgr_info(get_opcode(operator), &eqproc);
+			get_op_hash_functions(operator, &hashLeft, &hashRight);
 
 		/*
 		 * Save a few cycles by setting up the fcinfo struct just once. Using
@@ -2767,6 +2896,69 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
 		 * and because the math wouldn't add up...
 		 */
 		nmatches = 0;
+
+		/*
+			* If one MCV array contains less than 100 values, there's no gain in using a hash table.
+			* The sweet spot of using hash table lookups instead of iterating is slightly higher
+			* than 1 but we don't bother here because the gains are neglectable.
+			*/
+		if (OidIsValid(hashLeft) && hashLeft == hashRight &&
+			Min(sslot1->nvalues, clamped_nvalues2) > EQJOINSEL_MCV_HASH_THRESHOLD)
+		{
+				AttStatsSlot	*statsInner = sslot2;
+				AttStatsSlot	*statsOuter = sslot1;
+				bool			*hasMatchInner = hasmatch2;
+				bool			*hasMatchOuter = hasmatch1;
+				int				nvaluesInner = clamped_nvalues2;
+				int				nvaluesOuter = sslot1->nvalues;
+				McvHashContext	hashContext;
+				McvHashTable_hash *hashTable;
+
+				/* Make sure we build the hash table on the smaller array. */
+				if (sslot1->nvalues < clamped_nvalues2)
+				{
+					statsInner = sslot1;
+					statsOuter = sslot2;
+					hasMatchInner = hasmatch1;
+					hasMatchOuter = hasmatch2;
+					nvaluesInner = sslot1->nvalues;
+					nvaluesOuter = clamped_nvalues2;
+				}
+
+				/* 1. Create hash table of smaller 'pg_statistic' array. That's O(n). */
+				fmgr_info(get_opcode(operator), &hashContext.equal_proc);
+				fmgr_info(hashLeft, &hashContext.hash_proc); /* hashLeft == hashRight */
+				hashContext.collation = collation;
+
+				hashTable = McvHashTable_create(CurrentMemoryContext, nvaluesInner, &hashContext);
+
+			for (i = 0; i < nvaluesInner; i++)
+			{
+				bool found = false;
+				McvHashEntry *entry = McvHashTable_insert(hashTable, statsInner->values[i], &found);
+
+					Assert(!found);
+
+					entry->index = i;
+				}
+
+				/* 2. Look-up values from other 'pg_statistic' array against hash map to find matches. */
+				for (i = 0; i < nvaluesOuter; i++)
+				{
+					McvHashEntry *entry = McvHashTable_lookup(hashTable, statsOuter->values[i]);
+
+					if (entry != NULL)
+				{
+					hasMatchInner[entry->index] = hasMatchOuter[i] = true;
+					nmatches++;
+				}
+			}
+
+			McvHashTable_destroy(hashTable);
+		}
+		else
+		{
+		/* Fallback to O(N^2) algorithm if hash based variant didn't succeed. */
 		for (i = 0; i < sslot1->nvalues; i++)
 		{
 			int			j;
@@ -2790,6 +2982,7 @@ eqjoinsel_semi(Oid opfuncoid, Oid collation,
 				}
 			}
 		}
+		}
 		/* Sum up frequencies of matched MCVs */
 		matchfreq1 = 0.0;
 		for (i = 0; i < sslot1->nvalues; i++)
-- 
2.34.1

Reply via email to