Last month I wrote:
It seems clear that our qsort.c is doing a pretty awful job of picking
qsort pivots, while glibc is mostly managing not to make that mistake.
I re-ran Gary's test script using the just-committed improvements to
qsort.c, and got pretty nice numbers (attached; compare to the earlier
numbers in the "Strange Create Index behaviour" thread).
Ron [EMAIL PROTECTED] writes:
How are we choosing our pivots?
See qsort.c: it looks like median of nine equally spaced inputs (ie,
the 1/8th points of the initial input array, plus the end points),
implemented as two rounds of median-of-three choices.
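That scheme can be sketched in C. This is a simplified illustration, not the actual qsort.c code; `med3` and `choose_pivot` are hypothetical names, and the sketch assumes an int array with at least nine elements:

```c
#include <stddef.h>

/* Median of three values (illustrative, not qsort.c's macro). */
static int med3(int a, int b, int c)
{
    if (a < b)
        return (b < c) ? b : ((a < c) ? c : a);
    return (b > c) ? b : ((a > c) ? c : a);
}

/*
 * Median-of-nine pivot: sample the 1/8th points of the array plus the
 * endpoints, take medians of three groups of three, then the median of
 * those medians.  Assumes n >= 9.
 */
static int choose_pivot(const int *a, size_t n)
{
    size_t s = n / 8;
    int m1 = med3(a[0],     a[s],     a[2 * s]);
    int m2 = med3(a[3 * s], a[4 * s], a[5 * s]);
    int m3 = med3(a[6 * s], a[7 * s], a[n - 1]);
    return med3(m1, m2, m3);
}
```

On already-sorted input the nine samples straddle the whole array, so the pivot lands near the true median; when half the values are identical, most samples collide on that value and the pivot choice degenerates.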
Gary Doades [EMAIL PROTECTED] writes:
I think the reason I wasn't seeing performance issues with normal sort
operations is that they use work_mem, not maintenance_work_mem, which was
only set to 2048 anyway. Does that sound right?
Very probable. Do you want to test the theory by jacking that up?
Tom Lane wrote:
Interesting. I tried your test script and got fairly close times
for all the cases on two different machines:
old HPUX machine: shortest 5800 msec, longest 7960 msec
new Fedora 4 machine: shortest 461 msec, longest 608 msec
(the HPUX machine was doing other stuff ...)
Tom Lane wrote:
shared_buffers is unlikely to impact index build time noticeably in
recent PG releases. maintenance_work_mem would affect it a lot, though.
What setting were you using for that?
Also, I tried upping maintenance_work_mem to 65536 and it didn't make
much difference (maybe 10%).
On Wed, 2006-02-15 at 20:00 +, Gary Doades wrote:
I have put together a test case that demonstrates the problem.
Please enable trace_sort=on and then repeat tests and post the
accompanying log file.
I think this is simply the sort taking longer depending upon the data
distribution, but I'd like to know for sure.
I wrote:
Interesting. I tried your test script and got fairly close times
for all the cases on two different machines:
old HPUX machine: shortest 5800 msec, longest 7960 msec
new Fedora 4 machine: shortest 461 msec, longest 608 msec
So what this looks like to me is a corner case
Tom Lane wrote:
I tried forcing PG to use src/port/qsort.c on the Fedora machine,
and lo and behold:
new Fedora 4 machine: shortest 434 msec, longest 8530 msec
So it sure looks like this script does expose a problem on BSD-derived
qsorts. Curiously, the case that's much the worst for me is the third
in the script, while the shortest time is the first case, which was
slow for Gary.
Simon Riggs [EMAIL PROTECTED] writes:
Please enable trace_sort=on and then repeat tests and post the
accompanying log file.
I did this on my Fedora machine with port/qsort.c, and got the results
attached. Curiously, this run has the spikes in completely different
places than the prior one did.
Gary Doades [EMAIL PROTECTED] writes:
Interestingly, if I don't delete the table after a run, but just drop
and re-create the index repeatedly it stays a pretty consistent time,
either repeatedly good or repeatedly bad!
This is consistent with the theory of a data-dependent performance problem.
Tom Lane wrote:
So it sure looks like this script does expose a problem on BSD-derived
qsorts. Curiously, the case that's much the worst for me is the third
in the script, while the shortest time is the first case, which was slow
for Gary. So I'd venture that the *BSD code has been tweaked
Gary Doades [EMAIL PROTECTED] writes:
If I run the script again, it is not always the first case that is slow,
it varies from run to run, which is why I repeated it quite a few times
for the test.
For some reason I hadn't immediately twigged to the fact that your test
script is just N repetitions of the same case.
Gary Doades [EMAIL PROTECTED] writes:
Is this likely to hit me in a random fashion during normal operation,
joins, sorts, order by for example?
Yup, anytime you're passing data with that kind of distribution
through a sort.
So the options are:
1) Fix the included qsort.c code and use that
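To see why a duplicate-heavy distribution can cripple a sort, here is a minimal comparison-counting sketch. It uses textbook Lomuto partitioning, not PostgreSQL's qsort.c (the names `ncmp`, `swap_int`, and `quicksort_lomuto` are illustrative), and it degenerates to quadratic comparisons when every key is equal:

```c
static long ncmp;   /* global comparison counter, for illustration only */

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/*
 * Textbook Lomuto-partition quicksort: every element <= pivot goes to
 * the left partition, so an all-equal array sheds only one element per
 * pass and the total comparison count is n(n-1)/2.
 */
static void quicksort_lomuto(int *a, long lo, long hi)
{
    if (lo >= hi)
        return;
    int pivot = a[hi];
    long i = lo;
    for (long j = lo; j < hi; j++) {
        ncmp++;
        if (a[j] <= pivot)
            swap_int(&a[i++], &a[j]);
    }
    swap_int(&a[i], &a[hi]);
    quicksort_lomuto(a, lo, i - 1);
    quicksort_lomuto(a, i + 1, hi);
}
```

With 2,000 equal keys this makes 2,000 x 1,999 / 2 = 1,999,000 comparisons, versus roughly n log n for a sort with well-behaved pivots; scaled up to Gary's 200,000 rows, that ratio is what makes the index build crawl.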
I wrote:
Gary Doades [EMAIL PROTECTED] writes:
Ouch! That confirms my problem. I generated the random test case because
it was easier than including the dump of my tables, but you can
appreciate that tables 20 times the size are basically crippled when it
comes to creating an index on
Ron [EMAIL PROTECTED] writes:
How are we choosing our pivots?
See qsort.c: it looks like median of nine equally spaced inputs (ie,
the 1/8th points of the initial input array, plus the end points),
implemented as two rounds of median-of-three choices. With half of the
data inputs zero, it's not surprising if the chosen pivot comes out zero
much of the time.
On Wed, 2006-02-15 at 20:00 +, Gary Doades wrote:
I have put together a test case that demonstrates the problem (see
below). I create a simple table, as close in structure to one of my
problem tables, and populate an integer column with 100,000 zeros followed
by 100,000 random integers
Ouch! That confirms my problem. I generated the random test case because
it was easier than including the dump of my tables, but you can
appreciate that tables 20 times the size are basically crippled when it
comes to creating an index on them.
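Gary's distribution is easy to reproduce in miniature outside the database. This sketch (scaled down from 100,000 to an arbitrary n per half; `make_half_zero_array` and `cmp_int` are hypothetical names) builds the same half-zeros, half-random array:

```c
#include <stdlib.h>

/* Comparator for qsort over ints (avoids the overflow risk of a - b). */
static int cmp_int(const void *pa, const void *pb)
{
    int a = *(const int *) pa;
    int b = *(const int *) pb;
    return (a > b) - (a < b);
}

/*
 * Build n zeros followed by n pseudo-random integers, mirroring Gary's
 * test script (scaled down; names are illustrative).  Caller frees.
 */
static int *make_half_zero_array(size_t n, unsigned seed)
{
    int *a = malloc(2 * n * sizeof *a);
    if (a == NULL)
        return NULL;
    srand(seed);
    for (size_t i = 0; i < n; i++)
        a[i] = 0;
    for (size_t i = n; i < 2 * n; i++)
        a[i] = rand();
    return a;
}
```

Feeding the result through the platform qsort() and timing it should be enough to reproduce the spikes seen on BSD-derived implementations, while glibc's qsort (which is typically a mergesort under the hood when memory permits) stays flat.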
I have to say that I restored a few gigabyte