"Robert Haas" <robertmh...@gmail.com> writes:
> Ah, that makes sense.  Here's a test case based on Greg's.  This is
> definitely more than linear once you get above about n = 80, but it's
> not quadratic either.  n = 1000 is only 43x n = 80, and while that's
> surely more than 1000/80 = 12.5, it's also a lot less than (1000/80)^2
> = 156.25.

Yeah, that's not too surprising.  There's a linear cost associated with
fetching/deconstructing the stats array, and a quadratic cost in the
comparison loop.  What these results say is that the upper limit of 1000
keeps you from getting into the range where the linear component is
completely negligible.  If you plot the results and try to fit a curve
like c1*x^2 + c2*x to them, you get a very good fit for all the points
above about 80.  Below that, the curve has a flatter slope, indicating
a smaller linear cost component.  The values in this test are about 100
bytes each, so the breakpoint at 80 seems to correspond to the point where
the stats array no longer fits in one 8K page (80 * ~100 bytes ~= 8kB).

I replicated your test and got timings like these, using CVS HEAD with
cassert off:

target  seconds
10      1.587
20      1.997
30      2.208
40      2.499
50      2.734
60      3.048
70      3.368
80      3.706
90      4.686
100     6.418
150     10.016
200     13.598
250     17.905
300     22.777
400     33.471
500     46.394
600     61.185
700     77.664
800     96.304
900     116.615
1000    140.117
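
Doing the curve fit described above directly against these numbers is
straightforward; here is one way to do it as a plain SQL query, solving
the normal equations for the no-intercept model t = c1*n^2 + c2*n over
the points above the breakpoint (the n > 80 cutoff and all names here
are just illustrative choices, not part of the test itself):

WITH data(n, t) AS (
    -- n = stats target, t = seconds, taken from the table above
    VALUES (90, 4.686), (100, 6.418), (150, 10.016), (200, 13.598),
           (250, 17.905), (300, 22.777), (400, 33.471), (500, 46.394),
           (600, 61.185), (700, 77.664), (800, 96.304), (900, 116.615),
           (1000, 140.117)
), sums AS (
    SELECT sum(n^4) AS s4, sum(n^3) AS s3, sum(n^2) AS s2,
           sum(n^2 * t) AS s2t, sum(n * t) AS s1t
    FROM data
)
-- solve  c1*s4 + c2*s3 = s2t  and  c1*s3 + c2*s2 = s1t
SELECT (s2t*s2 - s1t*s3) / (s4*s2 - s3*s3) AS c1,
       (s4*s1t - s3*s2t) / (s4*s2 - s3*s3) AS c2
FROM sums;

Points at and below the breakpoint are left out since, per the above,
they carry a smaller linear cost component.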

So this machine is a bit slower than yours, but otherwise it's pretty
consistent with your numbers.  I then modified the test to use

array[random()::text,random()::text,random()::text,random()::text,random()::text,random()::text]

i.e., the same data except stuffed into an array.  I did this because
I know that array_eq is pretty durn expensive, and indeed:

target  seconds
10      1.662
20      2.478
30      3.119
40      3.885
50      4.636
60      5.437
70      6.403
80      7.427
90      8.473
100     9.597
150     16.542
200     24.919
250     35.225
300     47.423
400     76.509
500     114.076
600     157.535
700     211.189
800     269.731
900     335.427
1000    409.638

When looking at these numbers, one might think the threshold of pain is
about 50, rather than the 100 I'd put it at for the text example.
However, the array case is probably an extreme worst case.

On the whole I think we have some evidence here to say that upping the
default value of default_statistics_target to 100 wouldn't be out of line,
but 1000 definitely would be.  Comments?
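
For anyone who wants to experiment with this, the target can be changed
per session before running ANALYZE, without touching postgresql.conf;
"t" here is just a placeholder table name:

SET default_statistics_target = 100;
ANALYZE t;   -- uses the session value for columns with no per-column target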

BTW, does anyone have an opinion about changing the upper limit for
default_statistics_target to, say, 10000?  These tests suggest that you
wouldn't want such a value for a column used as a join key, but
I can see a possible argument for high values in text search and
similar applications.
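
If the limit were raised, presumably the way to use a value like that
would be per column rather than via the default, along these lines
(table and column names invented):

ALTER TABLE docs ALTER COLUMN body SET STATISTICS 10000;
ANALYZE docs;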

                        regards, tom lane
