"Robert Haas" <robertmh...@gmail.com> writes: > Ah, that makes sense. Here's a test case based on Greg's. This is > definitely more than linear once you get above about n = 80, but it's > not quadratic either. n = 1000 is only 43x n = 80, and while that's > surely more than 1000/80 = 12.5, it's also a lot less than (1000/80)^2 > = 156.25.
Yeah, that's not too surprising. There's a linear cost associated with fetching/deconstructing the stats array, and a quadratic cost in the comparison loop. What these results say is that the upper limit of 1000 keeps you from getting into the range where the linear component is completely negligible.

If you plot the results and try to fit a curve like c1*x^2 + c2*x to them, you get a very good fit for all the points above about 80. Below that, the curve has a flatter slope, indicating a smaller linear cost component. The values in this test are about 100 bytes each, so the breakpoint at 80 seems to correspond to the point where the stats array no longer fits in one page.

I replicated your test and got timings like these, using CVS HEAD with cassert off:

	10	1.587
	20	1.997
	30	2.208
	40	2.499
	50	2.734
	60	3.048
	70	3.368
	80	3.706
	90	4.686
	100	6.418
	150	10.016
	200	13.598
	250	17.905
	300	22.777
	400	33.471
	500	46.394
	600	61.185
	700	77.664
	800	96.304
	900	116.615
	1000	140.117

So this machine is a bit slower than yours, but otherwise it's pretty consistent with your numbers. I then modified the test to use

	array[random()::text, random()::text, random()::text,
	      random()::text, random()::text, random()::text]

i.e., the same data except stuffed into an array. I did this because I know that array_eq is pretty durn expensive, and indeed:

	10	1.662
	20	2.478
	30	3.119
	40	3.885
	50	4.636
	60	5.437
	70	6.403
	80	7.427
	90	8.473
	100	9.597
	150	16.542
	200	24.919
	250	35.225
	300	47.423
	400	76.509
	500	114.076
	600	157.535
	700	211.189
	800	269.731
	900	335.427
	1000	409.638

Looking at these numbers, one might think the threshold of pain is about 50, rather than about 100, which is where I'd put it for the text example. However, this is probably an extreme worst case.

On the whole, I think we have some evidence here to say that upping the default value of default_statistics_target to 100 wouldn't be out of line, but 1000 definitely is. Comments?

BTW, does anyone have an opinion about changing the upper limit for default_statistics_target to, say, 10000? These tests suggest that you wouldn't want such a value for a column used as a join key, but I can see a possible argument for high values in text search and similar applications.

			regards, tom lane
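For reference, one quick way to check the c1*x^2 + c2*x behavior described above is a least-squares fit over the measured points. This is only a minimal sketch, not part of the original test: it assumes numpy is available and uses the text-column timings quoted in the message, restricted to the points above n = 80 as suggested.

	import numpy as np

	# Stats-target values and the corresponding timings (seconds) reported
	# above for the text-column test, points above n = 80 only.
	n = np.array([90, 100, 150, 200, 250, 300, 400, 500,
	              600, 700, 800, 900, 1000], dtype=float)
	t = np.array([4.686, 6.418, 10.016, 13.598, 17.905, 22.777, 33.471,
	              46.394, 61.185, 77.664, 96.304, 116.615, 140.117])

	# Least-squares fit of t ~= c1*n^2 + c2*n (no constant term):
	# a quadratic comparison cost plus a linear fetch/deconstruct cost.
	A = np.column_stack([n**2, n])
	(c1, c2), _, _, _ = np.linalg.lstsq(A, t, rcond=None)
	print("c1 = %.3e, c2 = %.3e" % (c1, c2))

	# Relative error of the fitted curve at each measured point,
	# to see how well the two-term model matches the data.
	fit = c1 * n**2 + c2 * n
	print(np.abs(fit - t) / t)

The same fit can be repeated for the array[] timings to compare how much larger the quadratic coefficient becomes when array_eq is the comparison function.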