> Actually, there's a second large problem with this patch: blindly
> iterating through all combinations of MCV and histogram entries makes the
> runtime O(N^2) in the statistics target.  I made up some test data (by
> scanning my mail logs) and observed the following planning times, as
> reported by EXPLAIN ANALYZE:
>
> explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip;
> explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip;
>
> stats target            eqjoinsel       networkjoinsel
>
> 100                     0.27 ms         1.85 ms
> 1000                    2.56 ms         167.2 ms
> 10000                   56.6 ms         13987.1 ms
>
> I don't think it's necessary for network selectivity to be quite as
> fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning
> time for a query that might need just milliseconds to execute :-(
>
> It seemed to me that it might be possible to reduce the runtime by
> exploiting knowledge about the ordering of the histograms, but
> I don't have time to pursue that right now.

I make it break the loop when we passed the last possible match. Patch
attached.  It reduces the runtime almost 50% with large histograms.

We can also make it use only some elements of the MCV and histogram
for join estimation.  I have experimented with reducing the right and
the left hand side of the lists on the previous versions.  I remember
it was better to reduce only the left hand side.  I think it would be
enough to use log(n) elements of the right hand side MCV and histogram.
I can make the change, if you think selectivity estimation function
is the right place for this optimization.
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index f854847..16f39db 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -612,20 +612,23 @@ inet_hist_value_sel(Datum *values, int nvalues, Datum constvalue,
 		return 0.0;
 
 	query = DatumGetInetPP(constvalue);
 
 	/* "left" is the left boundary value of the current bucket ... */
 	left = DatumGetInetPP(values[0]);
 	left_order = inet_inclusion_cmp(left, query, opr_codenum);
 
 	for (i = 1; i < nvalues; i++)
 	{
+		if (left_order == 256)
+			break;
+
 		/* ... and "right" is the right boundary value */
 		right = DatumGetInetPP(values[i]);
 		right_order = inet_inclusion_cmp(right, query, opr_codenum);
 
 		if (left_order == 0 && right_order == 0)
 		{
 			/* The whole bucket matches, since both endpoints do. */
 			match += 1.0;
 		}
 		else if ((left_order <= 0 && right_order >= 0) ||
@@ -854,20 +857,23 @@ inet_opr_codenum(Oid operator)
 static int
 inet_inclusion_cmp(inet *left, inet *right, int opr_codenum)
 {
 	if (ip_family(left) == ip_family(right))
 	{
 		int			order;
 
 		order = bitncmp(ip_addr(left), ip_addr(right),
 						Min(ip_bits(left), ip_bits(right)));
 
+		if (order > 0)
+			return 256;
+
 		if (order != 0)
 			return order;
 
 		return inet_masklen_inclusion_cmp(left, right, opr_codenum);
 	}
 
 	return ip_family(left) - ip_family(right);
 }
 
 /*
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to