> Actually, there's a second large problem with this patch: blindly
> iterating through all combinations of MCV and histogram entries makes the
> runtime O(N^2) in the statistics target.  I made up some test data (by
> scanning my mail logs) and observed the following planning times, as
> reported by EXPLAIN ANALYZE:
>
> explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip;
> explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip;
>
> stats target	eqjoinsel	networkjoinsel
>
> 100		 0.27 ms	    1.85 ms
> 1000		 2.56 ms	  167.2  ms
> 10000		56.6  ms	13987.1  ms
>
> I don't think it's necessary for network selectivity to be quite as
> fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning
> time for a query that might need just milliseconds to execute :-(
>
> It seemed to me that it might be possible to reduce the runtime by
> exploiting knowledge about the ordering of the histograms, but
> I don't have time to pursue that right now.
I made it break out of the loop once we have passed the last possible match; patch attached.  It reduces the runtime by almost 50% with large histograms.

We could also make it use only some of the elements of the MCV and histogram for join estimation.  I experimented with reducing the right and the left hand sides of the lists in previous versions, and I remember it was better to reduce only the left hand side.  I think it would be enough to use log(n) elements of the right hand side MCV and histogram.  I can make that change, if you think the selectivity estimation function is the right place for this optimization.
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index f854847..16f39db 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -612,20 +612,23 @@ inet_hist_value_sel(Datum *values, int nvalues, Datum constvalue,
 		return 0.0;
 
 	query = DatumGetInetPP(constvalue);
 
 	/* "left" is the left boundary value of the current bucket ... */
 	left = DatumGetInetPP(values[0]);
 	left_order = inet_inclusion_cmp(left, query, opr_codenum);
 
 	for (i = 1; i < nvalues; i++)
 	{
+		if (left_order == 256)
+			break;
+
 		/* ... and "right" is the right boundary value */
 		right = DatumGetInetPP(values[i]);
 		right_order = inet_inclusion_cmp(right, query, opr_codenum);
 
 		if (left_order == 0 && right_order == 0)
 		{
 			/* The whole bucket matches, since both endpoints do. */
 			match += 1.0;
 		}
 		else if ((left_order <= 0 && right_order >= 0) ||
@@ -854,20 +857,23 @@ inet_opr_codenum(Oid operator)
 static int
 inet_inclusion_cmp(inet *left, inet *right, int opr_codenum)
 {
 	if (ip_family(left) == ip_family(right))
 	{
 		int			order;
 
 		order = bitncmp(ip_addr(left), ip_addr(right),
 						Min(ip_bits(left), ip_bits(right)));
 
+		if (order > 0)
+			return 256;
+
 		if (order != 0)
 			return order;
 
 		return inet_masklen_inclusion_cmp(left, right, opr_codenum);
 	}
 
 	return ip_family(left) - ip_family(right);
 }
 
 /*
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers