On Wed, Dec 3, 2014 at 6:14 AM, Emre Hasegeli <e...@hasegeli.com> wrote: >> Actually, there's a second large problem with this patch: blindly >> iterating through all combinations of MCV and histogram entries makes the >> runtime O(N^2) in the statistics target. I made up some test data (by >> scanning my mail logs) and observed the following planning times, as >> reported by EXPLAIN ANALYZE: >> >> explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip; >> explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip; >> >> stats target eqjoinsel networkjoinsel >> >> 100 0.27 ms 1.85 ms >> 1000 2.56 ms 167.2 ms >> 10000 56.6 ms 13987.1 ms >> >> I don't think it's necessary for network selectivity to be quite as >> fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning >> time for a query that might need just milliseconds to execute :-( >> >> It seemed to me that it might be possible to reduce the runtime by >> exploiting knowledge about the ordering of the histograms, but >> I don't have time to pursue that right now. > > I make it break the loop when we passed the last possible match. Patch > attached. It reduces the runtime almost 50% with large histograms. > > We can also make it use only some elements of the MCV and histogram > for join estimation. I have experimented with reducing the right and > the left hand side of the lists on the previous versions. I remember > it was better to reduce only the left hand side. I think it would be > enough to use log(n) elements of the right hand side MCV and histogram. > I can make the change, if you think selectivity estimation function > is the right place for this optimization. Marking as "Returned with feedback" as more work needs to be done. -- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers