Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Last month I wrote: It seems clear that our qsort.c is doing a pretty awful job of picking qsort pivots, while glibc is mostly managing not to make that mistake. I re-ran Gary's test script using the just-committed improvements to qsort.c, and got pretty nice numbers (attached --- compare to http://archives.postgresql.org/pgsql-performance/2006-02/msg00227.php). So it was wrong to blame his problems on the pivot selection --- the culprit was that ill-considered switch to insertion sort. regards, tom lane 100 runtimes for latest port/qsort.c, sorted ascending: Time: 335.481 ms Time: 335.606 ms Time: 335.932 ms Time: 336.039 ms Time: 336.182 ms Time: 336.231 ms Time: 336.711 ms Time: 336.721 ms Time: 336.971 ms Time: 336.982 ms Time: 337.036 ms Time: 337.190 ms Time: 337.223 ms Time: 337.312 ms Time: 337.350 ms Time: 337.423 ms Time: 337.523 ms Time: 337.528 ms Time: 337.565 ms Time: 337.566 ms Time: 337.732 ms Time: 337.741 ms Time: 337.744 ms Time: 337.786 ms Time: 337.790 ms Time: 337.898 ms Time: 337.905 ms Time: 337.952 ms Time: 337.976 ms Time: 338.017 ms Time: 338.123 ms Time: 338.206 ms Time: 338.306 ms Time: 338.514 ms Time: 338.594 ms Time: 338.597 ms Time: 338.683 ms Time: 338.705 ms Time: 338.729 ms Time: 338.748 ms Time: 338.816 ms Time: 338.958 ms Time: 338.963 ms Time: 338.997 ms Time: 339.074 ms Time: 339.106 ms Time: 339.134 ms Time: 339.159 ms Time: 339.226 ms Time: 339.260 ms Time: 339.289 ms Time: 339.341 ms Time: 339.500 ms Time: 339.585 ms Time: 339.595 ms Time: 339.774 ms Time: 339.897 ms Time: 339.927 ms Time: 340.064 ms Time: 340.133 ms Time: 340.172 ms Time: 340.219 ms Time: 340.261 ms Time: 340.323 ms Time: 340.708 ms Time: 340.761 ms Time: 340.785 ms Time: 340.900 ms Time: 340.986 ms Time: 341.339 ms Time: 341.564 ms Time: 341.707 ms Time: 342.155 ms Time: 342.213 ms Time: 342.452 ms Time: 342.515 ms Time: 342.540 ms Time: 342.928 ms Time: 343.548 ms Time: 343.663 ms Time: 344.192 ms Time: 344.952 ms Time: 345.152 ms Time: 345.174 ms Time: 345.444 ms Time: 346.848 ms Time: 348.144 ms Time: 348.842 ms Time: 354.550 ms Time: 356.877 ms Time: 357.475 ms Time: 358.487 ms Time: 364.178 ms Time: 370.730 ms Time: 493.098 ms Time: 648.009 ms Time: 849.345 ms Time: 860.616 ms Time: 936.800 ms Time: 1727.085 ms ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Added to TODO: * Improve port/qsort() to handle sorts with 50% unique and 50% duplicate value [qsort] This involves choosing better pivot points for the quicksort. --- Dann Corbit wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Tom Lane Sent: Wednesday, February 15, 2006 5:22 PM To: Ron Cc: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour) Ron [EMAIL PROTECTED] writes: How are we choosing our pivots? See qsort.c: it looks like median of nine equally spaced inputs (ie, the 1/8th points of the initial input array, plus the end points), implemented as two rounds of median-of-three choices. With half of the data inputs zero, it's not too improbable for two out of the three samples to be zeroes in which case I think the med3 result will be zero --- so choosing a pivot of zero is much more probable than one would like, and doing so in many levels of recursion causes the problem. Adding some randomness to the selection of the pivot is a known technique to fix the oddball partitions problem. However, Bentley and Sedgewick proved that every quick sort algorithm has some input set that makes it go quadratic (hence the recent popularity of introspective sort, which switches to heapsort if quadratic behavior is detected. The C++ template I submitted was an example of introspective sort, but PostgreSQL does not use C++ so it was not helpful). I think. I'm not too sure if the code isn't just being sloppy about the case where many data values are equal to the pivot --- there's a special case there to switch to insertion sort, and maybe that's getting invoked too soon. Here are some cases known to make qsort go quadratic: 1. Data already sorted 2. Data reverse sorted 3. Data organ-pipe sorted or ramp 4. Almost all data of the same value There are probably other cases. Randomizing the pivot helps some, as does check for in-order or reverse order partitions. Imagine if 1/3 of the partitions fall into a category that causes quadratic behavior (have one of the above formats and have more than CUTOFF elements in them). It is doubtful that the switch to insertion sort is causing any sort of problems. It is only going to be invoked on tiny sets, for which it has a fixed cost that is probably less that qsort() function calls on sets of the same size. It'd be useful to get a line-level profile of the behavior of this code in the slow cases... I guess that my in-order or presorted tests [which often arise when there are very few distinct values] may solve the bad partition problems. Don't forget that the algorithm is called recursively. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Bruce Momjian http://candle.pha.pa.us SRA OSS, Inc. http://www.sraoss.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Gary Doades [EMAIL PROTECTED] writes: I think the reason I wasn't seeing performance issues with normal sort operations is because they use work_mem not maintenance_work_mem which was only set to 2048 anyway. Does that sound right? Very probable. Do you want to test the theory by jacking that up? ;-) regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Strange Create Index behaviour
Tom Lane wrote: Interesting. I tried your test script and got fairly close times for all the cases on two different machines: old HPUX machine: shortest 5800 msec, longest 7960 msec new Fedora 4 machine: shortest 461 msec, longest 608 msec (the HPUX machine was doing other stuff at the same time, so some of its variation is probably only noise). So what this looks like to me is a corner case that FreeBSD's qsort fails to handle well. You might try forcing Postgres to use our private copy of qsort, as we do on Solaris for similar reasons. (The easy way to do this by hand is to configure as normal, then alter the LIBOBJS setting in src/Makefile.global to add qsort.o, then proceed with normal build.) However, I think that our private copy is descended from *BSD sources, so it might have the same failure mode. It'd be worth finding out. The final interesting thing is that as I increase shared buffers to 2000 or 3000 the problem gets *worse* shared_buffers is unlikely to impact index build time noticeably in recent PG releases. maintenance_work_mem would affect it a lot, though. What setting were you using for that? Can anyone else try these test cases on other platforms? Thanks for that. I've since tried it on Windows (pg 8.1.2) and the times were all similar, around 1200ms so it might just be BSD. I'll have to wait until tomorrow to get back to my BSD box. FreeBSD ports makes it easy to install, so I'll have to figure out how to get in and change things manually. I guess the appropriate files are still left around after the ports make command finishes, so I just edit the file and make again? If it can't be fixed though I guess we may have a problem using BSD. I'm surprised this hasn't been brought up before, the case doesn't seem *that* rare. Maybe not that many using FreeBSD? I'd certainly be interested if anyone else can repro it on FreeBSD though. Regards, Gary. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Strange Create Index behaviour
Tom Lane wrote: shared_buffers is unlikely to impact index build time noticeably in recent PG releases. maintenance_work_mem would affect it a lot, though. What setting were you using for that? Also, i tried upping maintenance_work_mem to 65536 and it didn't make much difference (maybe 10% faster for the normal cases). Upping the shared_buffers *definitely* makes the bad cases worse though, but I agree I don't see why... Regards, Gary. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Strange Create Index behaviour
On Wed, 2006-02-15 at 20:00 +, Gary Doades wrote: I have put together a test case Please enable trace_sort=on and then repeat tests and post the accompanying log file. I think this is simply the sort taking longer depending upon the data distribution, but I'd like to know for sure. Thanks, Best Regards, Simon Riggs ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Strange Create Index behaviour
I wrote: Interesting. I tried your test script and got fairly close times for all the cases on two different machines: old HPUX machine: shortest 5800 msec, longest 7960 msec new Fedora 4 machine: shortest 461 msec, longest 608 msec So what this looks like to me is a corner case that FreeBSD's qsort fails to handle well. I tried forcing PG to use src/port/qsort.c on the Fedora machine, and lo and behold: new Fedora 4 machine: shortest 434 msec, longest 8530 msec So it sure looks like this script does expose a problem on BSD-derived qsorts. Curiously, the case that's much the worst for me is the third in the script, while the shortest time is the first case, which was slow for Gary. So I'd venture that the *BSD code has been tweaked somewhere along the way, in a manner that moves the problem around without really fixing it. (Anyone want to compare the actual FreeBSD source to what we have?) This is pretty relevant stuff, because there was a thread recently advocating that we stop using the platform qsort on all platforms: http://archives.postgresql.org/pgsql-hackers/2005-12/msg00610.php It's really interesting to see a case where port/qsort is radically worse than other qsorts ... unless we figure that out and fix it, I think the idea of using port/qsort everywhere has just taken a major hit. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Strange Create Index behaviour
Tom Lane wrote: I tried forcing PG to use src/port/qsort.c on the Fedora machine, and lo and behold: new Fedora 4 machine: shortest 434 msec, longest 8530 msec So it sure looks like this script does expose a problem on BSD-derived qsorts. Curiously, the case that's much the worst for me is the third in the script, while the shortest time is the first case, which was slow for Gary. So I'd venture that the *BSD code has been tweaked somewhere along the way, in a manner that moves the problem around without really fixing it. (Anyone want to compare the actual FreeBSD source to what we have?) If I run the script again, it is not always the first case that is slow, it varies from run to run, which is why I repeated it quite a few times for the test. Interestingly, if I don't delete the table after a run, but just drop and re-create the index repeatedly it stays a pretty consistent time, either repeatedly good or repeatedly bad! Regards, Gary. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Strange Create Index behaviour
Simon Riggs [EMAIL PROTECTED] writes: Please enable trace_sort=on and then repeat tests and post the accompanying log file. I did this on my Fedora machine with port/qsort.c, and got the results attached. Curiously, this run has the spikes in completely different places than the prior one did. So the random component of the test data is affecting the results quite a lot. There seems absolutely no doubt that we are looking at data-dependent qsort misbehavior, though. The CPU time eaten by performsort accounts for all but about 100 msec of the elapsed time reported on the psql side. regards, tom lane LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.00s/0.15u sec elapsed 0.15 sec LOG: performsort done: CPU 0.00s/12.43u sec elapsed 12.44 sec LOG: internal sort ended, 9861 KB used: CPU 0.01s/12.51u sec elapsed 12.52 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.00s/0.14u sec elapsed 0.15 sec LOG: performsort done: CPU 0.00s/0.78u sec elapsed 0.78 sec LOG: internal sort ended, 9861 KB used: CPU 0.02s/0.85u sec elapsed 0.87 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.01s/0.14u sec elapsed 0.15 sec LOG: performsort done: CPU 0.01s/0.96u sec elapsed 0.97 sec LOG: internal sort ended, 9861 KB used: CPU 0.02s/1.03u sec elapsed 1.06 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.00s/0.14u sec elapsed 0.15 sec LOG: performsort done: CPU 0.00s/0.31u sec elapsed 0.32 sec LOG: internal sort ended, 9861 KB used: CPU 0.02s/0.38u sec elapsed 0.40 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.00s/0.14u sec elapsed 0.15 sec LOG: performsort done: CPU 0.00s/7.91u sec elapsed 7.92 sec LOG: internal sort ended, 9861 KB used: CPU 0.02s/7.99u sec elapsed 8.01 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.01s/0.13u sec elapsed 0.15 sec LOG: performsort done: CPU 0.01s/0.61u sec elapsed 0.63 sec LOG: internal sort ended, 9861 KB used: CPU 0.04s/0.67u sec elapsed 0.71 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.01s/0.13u sec elapsed 0.15 sec LOG: performsort done: CPU 0.01s/11.52u sec elapsed 11.54 sec LOG: internal sort ended, 9861 KB used: CPU 0.03s/11.59u sec elapsed 11.62 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.00s/0.14u sec elapsed 0.15 sec LOG: performsort done: CPU 0.00s/0.45u sec elapsed 0.46 sec LOG: internal sort ended, 9861 KB used: CPU 0.02s/0.55u sec elapsed 0.57 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.00s/0.14u sec elapsed 0.15 sec LOG: performsort done: CPU 0.00s/0.45u sec elapsed 0.46 sec LOG: internal sort ended, 9861 KB used: CPU 0.04s/0.54u sec elapsed 0.57 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.02s/0.12u sec elapsed 0.15 sec LOG: performsort done: CPU 0.02s/0.44u sec elapsed 0.46 sec LOG: internal sort ended, 9861 KB used: CPU 0.03s/0.55u sec elapsed 0.58 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.02s/0.13u sec elapsed 0.15 sec LOG: performsort done: CPU 0.02s/0.44u sec elapsed 0.46 sec LOG: internal sort ended, 9861 KB used: CPU 0.03s/0.54u sec elapsed 0.58 sec LOG: begin index sort: unique = f, workMem = 16384, randomAccess = f LOG: performsort starting: CPU 0.02s/0.13u sec elapsed 0.15 sec LOG: performsort done: CPU 0.02s/0.44u sec elapsed 0.46 sec LOG: internal sort ended, 9861 KB used: CPU 0.04s/0.54u sec elapsed 0.59 sec ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Strange Create Index behaviour
Gary Doades [EMAIL PROTECTED] writes: Interestingly, if I don't delete the table after a run, but just drop and re-create the index repeatedly it stays a pretty consistent time, either repeatedly good or repeatedly bad! This is consistent with the theory of a data-dependent performance problem in qsort. If you don't generate a fresh set of random test data, then you get repeatable runtimes. With a new set of test data, you might or might not hit the not-so-sweet-spot that we seem to have detected. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Strange Create Index behaviour
Tom Lane wrote: So it sure looks like this script does expose a problem on BSD-derived qsorts. Curiously, the case that's much the worst for me is the third in the script, while the shortest time is the first case, which was slow for Gary. So I'd venture that the *BSD code has been tweaked somewhere along the way, in a manner that moves the problem around without really fixing it. (Anyone want to compare the actual FreeBSD source to what we have?) It's really interesting to see a case where port/qsort is radically worse than other qsorts ... unless we figure that out and fix it, I think the idea of using port/qsort everywhere has just taken a major hit. More specifically to BSD, is there any way I can use a non-BSD qsort for building Postresql server? Regards, Gary. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Gary Doades [EMAIL PROTECTED] writes: If I run the script again, it is not always the first case that is slow, it varies from run to run, which is why I repeated it quite a few times for the test. For some reason I hadn't immediately twigged to the fact that your test script is just N repetitions of the exact same structure with random data. So it's not so surprising that you get random variations in behavior with different test data sets. I did some experimentation comparing the qsort from Fedora Core 4 (glibc-2.3.5-10.3) with our src/port/qsort.c. For those who weren't following the pgsql-performance thread, the test case is just this repeated a lot of times: create table atest(i int4, r int4); insert into atest (i,r) select generate_series(1,10), 0; insert into atest (i,r) select generate_series(1,10), random()*10; \timing create index idx on atest(r); \timing drop table atest; I did this 100 times and sorted the reported runtimes. (Investigation with trace_sort = on confirms that the runtime is almost entirely spent in qsort() called from our performsort --- the Postgres overhead is about 100msec on this machine.) Results are below. It seems clear that our qsort.c is doing a pretty awful job of picking qsort pivots, while glibc is mostly managing not to make that mistake. I haven't looked at the glibc code yet to see what they are doing differently. I'd say this puts a considerable damper on my enthusiasm for using our qsort all the time, as was recently debated in this thread: http://archives.postgresql.org/pgsql-hackers/2005-12/msg00610.php We need to fix our qsort.c before pushing ahead with that idea. regards, tom lane 100 runtimes for glibc qsort, sorted ascending: Time: 459.860 ms Time: 460.209 ms Time: 460.704 ms Time: 461.317 ms Time: 461.538 ms Time: 461.652 ms Time: 461.988 ms Time: 462.573 ms Time: 462.638 ms Time: 462.716 ms Time: 462.917 ms Time: 463.219 ms Time: 463.455 ms Time: 463.650 ms Time: 463.723 ms Time: 463.737 ms Time: 463.750 ms Time: 463.852 ms Time: 463.964 ms Time: 463.988 ms Time: 464.003 ms Time: 464.135 ms Time: 464.372 ms Time: 464.458 ms Time: 464.496 ms Time: 464.551 ms Time: 464.599 ms Time: 464.655 ms Time: 464.656 ms Time: 464.722 ms Time: 464.814 ms Time: 464.827 ms Time: 464.878 ms Time: 464.899 ms Time: 464.905 ms Time: 464.987 ms Time: 465.055 ms Time: 465.138 ms Time: 465.159 ms Time: 465.194 ms Time: 465.310 ms Time: 465.316 ms Time: 465.375 ms Time: 465.450 ms Time: 465.535 ms Time: 465.595 ms Time: 465.680 ms Time: 465.769 ms Time: 465.865 ms Time: 465.892 ms Time: 465.903 ms Time: 466.003 ms Time: 466.154 ms Time: 466.164 ms Time: 466.203 ms Time: 466.305 ms Time: 466.344 ms Time: 466.364 ms Time: 466.388 ms Time: 466.502 ms Time: 466.593 ms Time: 466.725 ms Time: 466.794 ms Time: 466.798 ms Time: 466.904 ms Time: 466.971 ms Time: 466.997 ms Time: 467.122 ms Time: 467.146 ms Time: 467.221 ms Time: 467.224 ms Time: 467.244 ms Time: 467.277 ms Time: 467.587 ms Time: 468.142 ms Time: 468.207 ms Time: 468.237 ms Time: 468.471 ms Time: 468.663 ms Time: 468.700 ms Time: 469.235 ms Time: 469.840 ms Time: 470.472 ms Time: 471.140 ms Time: 472.811 ms Time: 472.959 ms Time: 474.858 ms Time: 477.210 ms Time: 479.571 ms Time: 479.671 ms Time: 482.797 ms Time: 488.852 ms Time: 514.639 ms Time: 529.287 ms Time: 612.185 ms Time: 660.748 ms Time: 742.227 ms Time: 866.814 ms Time: 1234.848 ms Time: 1267.398 ms 100 runtimes for port/qsort.c, sorted ascending: Time: 418.905 ms Time: 420.611 ms Time: 420.764 ms Time: 420.904 ms Time: 421.706 ms Time: 422.466 ms Time: 422.627 ms Time: 423.189 ms Time: 423.302 ms Time: 425.096 ms Time: 425.731 ms Time: 425.851 ms Time: 427.253 ms Time: 430.113 ms Time: 432.756 ms Time: 432.963 ms Time: 440.502 ms Time: 440.640 ms Time: 450.452 ms Time: 458.143 ms Time: 459.212 ms Time: 467.706 ms Time: 468.006 ms Time: 468.574 ms Time: 470.003 ms Time: 472.313 ms Time: 483.622 ms Time: 492.395 ms Time: 509.564 ms Time: 531.037 ms Time: 533.366 ms Time: 535.610 ms Time: 575.523 ms Time: 582.688 ms Time: 593.545 ms Time: 647.364 ms Time: 660.612 ms Time: 677.312 ms Time: 680.288 ms Time: 697.626 ms Time: 833.066 ms Time: 834.511 ms Time: 851.819 ms Time: 920.443 ms Time: 926.731 ms Time: 954.289 ms Time: 1045.214 ms Time: 1059.200 ms Time: 1062.328 ms Time: 1136.018 ms Time: 1260.091 ms Time: 1276.883 ms Time: 1319.351 ms Time: 1438.854 ms Time: 1475.457 ms Time: 1538.211 ms Time: 1549.004 ms Time: 1744.642 ms Time: 1771.258 ms Time: 1959.530 ms Time: 2300.140 ms Time: 2589.641 ms Time: 2612.780 ms Time: 3100.024 ms Time: 3284.125 ms Time: 3379.792 ms Time: 3750.278 ms Time: 4302.278 ms Time: 4780.624 ms Time: 5000.056 ms Time: 5092.604 ms Time: 5168.722 ms Time: 5292.941 ms Time: 5895.964 ms Time: 7003.164 ms Time: 7099.449 ms Time: 7115.083 ms Time: 7384.940 ms Time: 8214.010 ms Time: 8700.771 ms Time: 9331.225 ms Time: 10503.360 ms Time: 12496.026 ms Time: 12982.474 ms Time: 15192.390 ms
Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Gary Doades [EMAIL PROTECTED] writes: Is this likely to hit me in a random fashion during normal operation, joins, sorts, order by for example? Yup, anytime you're passing data with that kind of distribution through a sort. So the options are: 1) Fix the included qsort.c code and use that 2) Get FreeBSD to fix their qsort code 3) Both I guess that 1 is the real solution in case anyone else's qsort is broken in the same way. Then at least you *could* use it all the time :) It's reasonable to assume that most of the *BSDen have basically the same qsort code. Ours claims to have come from NetBSD sources, but I don't doubt that they all trace back to a common ancestor. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
I wrote: Gary Doades [EMAIL PROTECTED] writes: Ouch! That confirms my problem. I generated the random test case because it was easier than including the dump of my tables, but you can appreciate that tables 20 times the size are basically crippled when it comes to creating an index on them. Actually... we only use qsort when we have a sorting problem that fits within the allowed sort memory. The external-sort logic doesn't go through that code at all. So all the analysis we just did on your test case doesn't necessarily apply to sort problems that are too large for the sort_mem setting. I increased the size of the test case by 10x (basically s/10/100/) which is enough to push it into the external-sort regime. I get amazingly stable runtimes now --- I didn't have the patience to run 100 trials, but in 30 trials I have slowest 11538 msec and fastest 11144 msec. So this code path is definitely not very sensitive to this data distribution. While these numbers aren't glittering in comparison to the best-case qsort times (~450 msec to sort 10% as much data), they are sure a lot better than the worst-case times. So maybe a workaround for you is to decrease maintenance_work_mem, counterintuitive though that be. (Now, if you *weren't* using maintenance_work_mem of 100MB or more for your problem restore, then I'm not sure I know what's going on...) We still ought to try to fix qsort of course. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Ron [EMAIL PROTECTED] writes: How are we choosing our pivots? See qsort.c: it looks like median of nine equally spaced inputs (ie, the 1/8th points of the initial input array, plus the end points), implemented as two rounds of median-of-three choices. With half of the data inputs zero, it's not too improbable for two out of the three samples to be zeroes in which case I think the med3 result will be zero --- so choosing a pivot of zero is much more probable than one would like, and doing so in many levels of recursion causes the problem. I think. I'm not too sure if the code isn't just being sloppy about the case where many data values are equal to the pivot --- there's a special case there to switch to insertion sort, and maybe that's getting invoked too soon. It'd be useful to get a line-level profile of the behavior of this code in the slow cases... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Strange Create Index behaviour
On Wed, 2006-02-15 at 20:00 +, Gary Doades wrote: I have put together a test case that demonstrates the problem (see below). I create a simple table, as close in structure to one of my problem tables and populate an integer column with 100,000 zeros follow by 100,000 random integers between 0 and 100,000. Then create an index on this column. I then drop the table and repeat. The create index should take around 1-2 seconds. A fair proportion of the time it takes 50 seconds!!! If I fill the same row with all random data the create index always takes a second or two. If I fill the column with all zeros everything is still OK. Aside from the importance of investigating sort behaviour, have you tried to build a partial index WHERE col 0 ? That way you wouldn't even be indexing the zeros. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)
Ouch! That confirms my problem. I generated the random test case because it was easier than including the dump of my tables, but you can appreciate that tables 20 times the size are basically crippled when it comes to creating an index on them. I have to say that I restored a few gigabyte dump on freebsd the other day, and most of the restore time was in index creation - I didn't think too much of it though at the time. FreeBSD 4.x. Chris ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq