Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-03-21 Thread Tom Lane
Last month I wrote:
> It seems clear that our qsort.c is doing a pretty awful job of picking
> qsort pivots, while glibc is mostly managing not to make that mistake.

I re-ran Gary's test script using the just-committed improvements to
qsort.c, and got pretty nice numbers (attached --- compare to
http://archives.postgresql.org/pgsql-performance/2006-02/msg00227.php).
So it was wrong to blame his problems on the pivot selection --- the
culprit was that ill-considered switch to insertion sort.

regards, tom lane

100 runtimes for latest port/qsort.c, sorted ascending:

Time: 335.481 ms
Time: 335.606 ms
Time: 335.932 ms
Time: 336.039 ms
Time: 336.182 ms
Time: 336.231 ms
Time: 336.711 ms
Time: 336.721 ms
Time: 336.971 ms
Time: 336.982 ms
Time: 337.036 ms
Time: 337.190 ms
Time: 337.223 ms
Time: 337.312 ms
Time: 337.350 ms
Time: 337.423 ms
Time: 337.523 ms
Time: 337.528 ms
Time: 337.565 ms
Time: 337.566 ms
Time: 337.732 ms
Time: 337.741 ms
Time: 337.744 ms
Time: 337.786 ms
Time: 337.790 ms
Time: 337.898 ms
Time: 337.905 ms
Time: 337.952 ms
Time: 337.976 ms
Time: 338.017 ms
Time: 338.123 ms
Time: 338.206 ms
Time: 338.306 ms
Time: 338.514 ms
Time: 338.594 ms
Time: 338.597 ms
Time: 338.683 ms
Time: 338.705 ms
Time: 338.729 ms
Time: 338.748 ms
Time: 338.816 ms
Time: 338.958 ms
Time: 338.963 ms
Time: 338.997 ms
Time: 339.074 ms
Time: 339.106 ms
Time: 339.134 ms
Time: 339.159 ms
Time: 339.226 ms
Time: 339.260 ms
Time: 339.289 ms
Time: 339.341 ms
Time: 339.500 ms
Time: 339.585 ms
Time: 339.595 ms
Time: 339.774 ms
Time: 339.897 ms
Time: 339.927 ms
Time: 340.064 ms
Time: 340.133 ms
Time: 340.172 ms
Time: 340.219 ms
Time: 340.261 ms
Time: 340.323 ms
Time: 340.708 ms
Time: 340.761 ms
Time: 340.785 ms
Time: 340.900 ms
Time: 340.986 ms
Time: 341.339 ms
Time: 341.564 ms
Time: 341.707 ms
Time: 342.155 ms
Time: 342.213 ms
Time: 342.452 ms
Time: 342.515 ms
Time: 342.540 ms
Time: 342.928 ms
Time: 343.548 ms
Time: 343.663 ms
Time: 344.192 ms
Time: 344.952 ms
Time: 345.152 ms
Time: 345.174 ms
Time: 345.444 ms
Time: 346.848 ms
Time: 348.144 ms
Time: 348.842 ms
Time: 354.550 ms
Time: 356.877 ms
Time: 357.475 ms
Time: 358.487 ms
Time: 364.178 ms
Time: 370.730 ms
Time: 493.098 ms
Time: 648.009 ms
Time: 849.345 ms
Time: 860.616 ms
Time: 936.800 ms
Time: 1727.085 ms

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-03-02 Thread Bruce Momjian

Added to TODO:

* Improve port/qsort() to handle sorts with 50% unique and 50% duplicate
  value [qsort]

  This involves choosing better pivot points for the quicksort.


-- 
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-03-02 Thread Jonah H. Harris
My introsort is almost complete and it's the fastest variant of
quicksort I can find. I'll submit it to -patches in the next couple
of days as well.

On 3/2/06, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Added to TODO:
>
> * Improve port/qsort() to handle sorts with 50% unique and 50% duplicate
>   value [qsort]
>
>   This involves choosing better pivot points for the quicksort.
-- 
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324


[HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Gary Doades writes:
> If I run the script again, it is not always the first case that is slow,
> it varies from run to run, which is why I repeated it quite a few times
> for the test.

For some reason I hadn't immediately twigged to the fact that your test
script is just N repetitions of the exact same structure with random data.
So it's not so surprising that you get random variations in behavior
with different test data sets.

I did some experimentation comparing the qsort from Fedora Core 4
(glibc-2.3.5-10.3) with our src/port/qsort.c.  For those who weren't
following the pgsql-performance thread, the test case is just this
repeated a lot of times:

create table atest(i int4, r int4);
insert into atest (i,r) select generate_series(1,100000), 0;
insert into atest (i,r) select generate_series(1,100000), random()*100000;
\timing
create index idx on atest(r);
\timing
drop table atest;

I did this 100 times and sorted the reported runtimes.  (Investigation
with trace_sort = on confirms that the runtime is almost entirely spent
in qsort() called from our performsort --- the Postgres overhead is
about 100msec on this machine.)  Results are below.
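
For anyone wanting to repeat the comparison, the procedure above (collect the psql timing lines, sort them, look at the spread) might be sketched as follows. This is only an illustration in Python; the helper names parse_times and summarize are hypothetical, and the three sample lines are copied from the glibc list further down rather than being new measurements.

```python
import re
import statistics

def parse_times(text):
    """Extract millisecond values from psql timing lines such as 'Time: 335.481 ms'."""
    return [float(m) for m in re.findall(r"Time:\s*([0-9.]+)\s*ms", text)]

def summarize(times):
    """Return (fastest, median, slowest) for a list of runtimes in ms."""
    ordered = sorted(times)
    return ordered[0], statistics.median(ordered), ordered[-1]

# Three values taken from the glibc list in this message (first, middle, last).
sample = """
Time: 459.860 ms
Time: 466.003 ms
Time: 1267.398 ms
"""

fastest, median, slowest = summarize(parse_times(sample))
print(f"fastest={fastest} median={median} slowest={slowest} "
      f"spread={slowest / fastest:.2f}x")
```

The slowest-to-fastest ratio is a crude measure, but it is the one that makes the difference between the two lists below obvious at a glance.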

It seems clear that our qsort.c is doing a pretty awful job of picking
qsort pivots, while glibc is mostly managing not to make that mistake.
I haven't looked at the glibc code yet to see what they are doing
differently.

I'd say this puts a considerable damper on my enthusiasm for using our
qsort all the time, as was recently debated in this thread:
http://archives.postgresql.org/pgsql-hackers/2005-12/msg00610.php
We need to fix our qsort.c before pushing ahead with that idea.

regards, tom lane


100 runtimes for glibc qsort, sorted ascending:

Time: 459.860 ms
Time: 460.209 ms
Time: 460.704 ms
Time: 461.317 ms
Time: 461.538 ms
Time: 461.652 ms
Time: 461.988 ms
Time: 462.573 ms
Time: 462.638 ms
Time: 462.716 ms
Time: 462.917 ms
Time: 463.219 ms
Time: 463.455 ms
Time: 463.650 ms
Time: 463.723 ms
Time: 463.737 ms
Time: 463.750 ms
Time: 463.852 ms
Time: 463.964 ms
Time: 463.988 ms
Time: 464.003 ms
Time: 464.135 ms
Time: 464.372 ms
Time: 464.458 ms
Time: 464.496 ms
Time: 464.551 ms
Time: 464.599 ms
Time: 464.655 ms
Time: 464.656 ms
Time: 464.722 ms
Time: 464.814 ms
Time: 464.827 ms
Time: 464.878 ms
Time: 464.899 ms
Time: 464.905 ms
Time: 464.987 ms
Time: 465.055 ms
Time: 465.138 ms
Time: 465.159 ms
Time: 465.194 ms
Time: 465.310 ms
Time: 465.316 ms
Time: 465.375 ms
Time: 465.450 ms
Time: 465.535 ms
Time: 465.595 ms
Time: 465.680 ms
Time: 465.769 ms
Time: 465.865 ms
Time: 465.892 ms
Time: 465.903 ms
Time: 466.003 ms
Time: 466.154 ms
Time: 466.164 ms
Time: 466.203 ms
Time: 466.305 ms
Time: 466.344 ms
Time: 466.364 ms
Time: 466.388 ms
Time: 466.502 ms
Time: 466.593 ms
Time: 466.725 ms
Time: 466.794 ms
Time: 466.798 ms
Time: 466.904 ms
Time: 466.971 ms
Time: 466.997 ms
Time: 467.122 ms
Time: 467.146 ms
Time: 467.221 ms
Time: 467.224 ms
Time: 467.244 ms
Time: 467.277 ms
Time: 467.587 ms
Time: 468.142 ms
Time: 468.207 ms
Time: 468.237 ms
Time: 468.471 ms
Time: 468.663 ms
Time: 468.700 ms
Time: 469.235 ms
Time: 469.840 ms
Time: 470.472 ms
Time: 471.140 ms
Time: 472.811 ms
Time: 472.959 ms
Time: 474.858 ms
Time: 477.210 ms
Time: 479.571 ms
Time: 479.671 ms
Time: 482.797 ms
Time: 488.852 ms
Time: 514.639 ms
Time: 529.287 ms
Time: 612.185 ms
Time: 660.748 ms
Time: 742.227 ms
Time: 866.814 ms
Time: 1234.848 ms
Time: 1267.398 ms


100 runtimes for port/qsort.c, sorted ascending:

Time: 418.905 ms
Time: 420.611 ms
Time: 420.764 ms
Time: 420.904 ms
Time: 421.706 ms
Time: 422.466 ms
Time: 422.627 ms
Time: 423.189 ms
Time: 423.302 ms
Time: 425.096 ms
Time: 425.731 ms
Time: 425.851 ms
Time: 427.253 ms
Time: 430.113 ms
Time: 432.756 ms
Time: 432.963 ms
Time: 440.502 ms
Time: 440.640 ms
Time: 450.452 ms
Time: 458.143 ms
Time: 459.212 ms
Time: 467.706 ms
Time: 468.006 ms
Time: 468.574 ms
Time: 470.003 ms
Time: 472.313 ms
Time: 483.622 ms
Time: 492.395 ms
Time: 509.564 ms
Time: 531.037 ms
Time: 533.366 ms
Time: 535.610 ms
Time: 575.523 ms
Time: 582.688 ms
Time: 593.545 ms
Time: 647.364 ms
Time: 660.612 ms
Time: 677.312 ms
Time: 680.288 ms
Time: 697.626 ms
Time: 833.066 ms
Time: 834.511 ms
Time: 851.819 ms
Time: 920.443 ms
Time: 926.731 ms
Time: 954.289 ms
Time: 1045.214 ms
Time: 1059.200 ms
Time: 1062.328 ms
Time: 1136.018 ms
Time: 1260.091 ms
Time: 1276.883 ms
Time: 1319.351 ms
Time: 1438.854 ms
Time: 1475.457 ms
Time: 1538.211 ms
Time: 1549.004 ms
Time: 1744.642 ms
Time: 1771.258 ms
Time: 1959.530 ms
Time: 2300.140 ms
Time: 2589.641 ms
Time: 2612.780 ms
Time: 3100.024 ms
Time: 3284.125 ms
Time: 3379.792 ms
Time: 3750.278 ms
Time: 4302.278 ms
Time: 4780.624 ms
Time: 5000.056 ms
Time: 5092.604 ms
Time: 5168.722 ms
Time: 5292.941 ms
Time: 5895.964 ms
Time: 7003.164 ms
Time: 7099.449 ms
Time: 7115.083 ms
Time: 7384.940 ms
Time: 8214.010 ms
Time: 8700.771 ms
Time: 9331.225 ms
Time: 10503.360 ms
Time: 12496.026 ms
Time: 12982.474 ms
Time: 15192.390 ms

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Gary Doades

Tom Lane wrote:
> For some reason I hadn't immediately twigged to the fact that your test
> script is just N repetitions of the exact same structure with random data.
> So it's not so surprising that you get random variations in behavior
> with different test data sets.
>
> It seems clear that our qsort.c is doing a pretty awful job of picking
> qsort pivots, while glibc is mostly managing not to make that mistake.
> I haven't looked at the glibc code yet to see what they are doing
> differently.
>
> I'd say this puts a considerable damper on my enthusiasm for using our
> qsort all the time, as was recently debated in this thread:
> http://archives.postgresql.org/pgsql-hackers/2005-12/msg00610.php
> We need to fix our qsort.c before pushing ahead with that idea.

[snip]

> Time: 28314.182 ms
> Time: 29400.278 ms
> Time: 34142.534 ms


Ouch! That confirms my problem. I generated the random test case because 
it was easier than including the dump of my tables, but you can 
appreciate that tables 20 times the size are basically crippled when it 
comes to creating an index on them.


Examining the dump and the associated times during restore it looks like 
I have 7 tables with this approximate distribution, thus the 
ridiculously long restore time. Better not re-index soon!


Is this likely to hit me in a random fashion during normal operation, 
joins, sorts, order by for example?


So the options are:
1) Fix the included qsort.c code and use that
2) Get FreeBSD to fix their qsort code
3) Both

I guess that 1 is the real solution in case anyone else's qsort is 
broken in the same way. Then at least you *could* use it all the time :)


Regards,
Gary.






Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Gary Doades writes:
> Is this likely to hit me in a random fashion during normal operation,
> joins, sorts, order by for example?

Yup, anytime you're passing data with that kind of distribution
through a sort.

> So the options are:
> 1) Fix the included qsort.c code and use that
> 2) Get FreeBSD to fix their qsort code
> 3) Both
>
> I guess that 1 is the real solution in case anyone else's qsort is
> broken in the same way. Then at least you *could* use it all the time :)

It's reasonable to assume that most of the *BSDen have basically the
same qsort code.  Ours claims to have come from NetBSD sources, but
I don't doubt that they all trace back to a common ancestor.

regards, tom lane



Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Gary Doades writes:
> Ouch! That confirms my problem. I generated the random test case because
> it was easier than including the dump of my tables, but you can
> appreciate that tables 20 times the size are basically crippled when it
> comes to creating an index on them.

Actually... we only use qsort when we have a sorting problem that fits
within the allowed sort memory.  The external-sort logic doesn't go
through that code at all.  So all the analysis we just did on your test
case doesn't necessarily apply to sort problems that are too large for
the sort_mem setting.

The test case would be sorting 200000 index entries, which'd probably
occupy at least 24 bytes apiece of sort memory, so probably about 5 meg.
A problem 20 times that size would definitely not fit in the default
16MB maintenance_work_mem.  Were you using a large value of
maintenance_work_mem for your restore?
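
The arithmetic behind that estimate can be spelled out as a short Python sketch. It assumes 200000 index entries in the test case (a figure inferred from "24 bytes apiece" and "about 5 meg", so treat it as an assumption) and uses a plain size comparison as a stand-in for the real in-memory versus external-sort decision, which is more involved. The names sort_bytes and fits_in_memory are hypothetical.

```python
ENTRY_BYTES = 24                           # "at least 24 bytes apiece"
TEST_ENTRIES = 200_000                     # assumption: inferred from "about 5 meg"
MAINTENANCE_WORK_MEM = 16 * 1024 * 1024    # the default 16MB mentioned above

def sort_bytes(entries, entry_bytes=ENTRY_BYTES):
    """Rough lower bound on the sort memory needed for `entries` index entries."""
    return entries * entry_bytes

def fits_in_memory(entries, budget=MAINTENANCE_WORK_MEM):
    """Simplified check: does the estimate fit within the memory budget?"""
    return sort_bytes(entries) <= budget

print(sort_bytes(TEST_ENTRIES) / 1e6)        # 4.8 (MB), i.e. "about 5 meg"
print(fits_in_memory(TEST_ENTRIES))          # True: small enough for qsort
print(fits_in_memory(20 * TEST_ENTRIES))     # False: 96 MB exceeds 16 MB
```

Under these assumptions a table 20 times the size would only go through qsort if maintenance_work_mem had been raised well above the default, which is the point of the question.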

regards, tom lane



Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
I wrote:
> Gary Doades writes:
>> Ouch! That confirms my problem. I generated the random test case because
>> it was easier than including the dump of my tables, but you can
>> appreciate that tables 20 times the size are basically crippled when it
>> comes to creating an index on them.
>
> Actually... we only use qsort when we have a sorting problem that fits
> within the allowed sort memory.  The external-sort logic doesn't go
> through that code at all.  So all the analysis we just did on your test
> case doesn't necessarily apply to sort problems that are too large for
> the sort_mem setting.

I increased the size of the test case by 10x (basically s/100000/1000000/)
which is enough to push it into the external-sort regime.  I get
amazingly stable runtimes now --- I didn't have the patience to run 100
trials, but in 30 trials I have slowest 11538 msec and fastest 11144 msec.
So this code path is definitely not very sensitive to this data
distribution.

While these numbers aren't glittering in comparison to the best-case
qsort times (~450 msec to sort 10% as much data), they are sure a lot
better than the worst-case times.  So maybe a workaround for you is
to decrease maintenance_work_mem, counterintuitive though that be.
(Now, if you *weren't* using maintenance_work_mem of 100MB or more
for your problem restore, then I'm not sure I know what's going on...)

We still ought to try to fix qsort of course.

regards, tom lane



Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Ron writes:
> How are we choosing our pivots?

See qsort.c: it looks like median of nine equally spaced inputs (ie,
the 1/8th points of the initial input array, plus the end points),
implemented as two rounds of median-of-three choices.  With half of the
data inputs zero, it's not too improbable for two out of the three
samples to be zeroes in which case I think the med3 result will be zero
--- so choosing a pivot of zero is much more probable than one would
like, and doing so in many levels of recursion causes the problem.

I think.  I'm not too sure if the code isn't just being sloppy about the
case where many data values are equal to the pivot --- there's a special
case there to switch to insertion sort, and maybe that's getting invoked
too soon.  It'd be useful to get a line-level profile of the behavior of
this code in the slow cases...

regards, tom lane
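
The pivot choice described in this message can be sketched roughly as below. This is a simplified Python illustration of "median of nine via two rounds of median-of-three", not the code in src/port/qsort.c; the names med3, median_of_nine and zero_pivot_fraction, the exact sample positions, and the shuffled half-zero input are all assumptions made for the example.

```python
import random

def med3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]

def median_of_nine(data):
    """Pick a pivot from nine roughly equally spaced samples.

    Three groups of three samples are reduced with med3, then the three
    group medians are reduced with med3 again.
    """
    n = len(data)
    d = n // 8
    mid, hi = n // 2, n - 1
    left = med3(data[0], data[d], data[2 * d])
    middle = med3(data[mid - d], data[mid], data[mid + d])
    right = med3(data[hi - 2 * d], data[hi - d], data[hi])
    return med3(left, middle, right)

def zero_pivot_fraction(n, trials, seed=0):
    """Fraction of trials in which the chosen pivot is zero when half of
    a shuffled input is zero and the rest is random positive integers."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [0] * (n // 2) + [rng.randint(1, n) for _ in range(n - n // 2)]
        rng.shuffle(data)
        if median_of_nine(data) == 0:
            hits += 1
    return hits / trials

print(zero_pivot_fraction(n=1000, trials=2000))
```

With half of a shuffled input equal to zero, each sample is zero about half the time, so each med3 is zero about half the time and the final pivot is zero in roughly half of the trials. Whether that is actually harmful depends on how the partitioning step handles keys equal to the pivot, which is the open question raised in the second paragraph of the message.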



Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Dann Corbit


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-[EMAIL PROTECTED]] On Behalf Of Tom Lane
> Sent: Wednesday, February 15, 2006 5:22 PM
> To: Ron
> Cc: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
>
> Ron writes:
>> How are we choosing our pivots?
>
> See qsort.c: it looks like median of nine equally spaced inputs (ie,
> the 1/8th points of the initial input array, plus the end points),
> implemented as two rounds of median-of-three choices.  With half of the
> data inputs zero, it's not too improbable for two out of the three
> samples to be zeroes in which case I think the med3 result will be zero
> --- so choosing a pivot of zero is much more probable than one would
> like, and doing so in many levels of recursion causes the problem.

Adding some randomness to the selection of the pivot is a known
technique to fix the oddball partitions problem.  However, Bentley and
Sedgewick proved that every quick sort algorithm has some input set that
makes it go quadratic (hence the recent popularity of introspective
sort, which switches to heapsort if quadratic behavior is detected.  The
C++ template I submitted was an example of introspective sort, but
PostgreSQL does not use C++ so it was not helpful).
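
As a rough illustration of the introspective-sort idea mentioned here, the following Python sketch runs a deliberately naive quicksort and falls back to heapsort once the recursion depth passes a limit. It is not the C++ template referred to above and not a proposal for qsort.c; the function names are hypothetical, the depth limit of 2*floor(log2(n)) is a commonly used choice rather than anything stated in this thread, and the sort is written out-of-place for brevity.

```python
import heapq
import math

def _heapsort(items):
    """O(n log n) fallback built on the standard heapq module."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

def introsort(items, depth_limit=None):
    """Quicksort that switches to heapsort when recursion gets too deep."""
    items = list(items)
    if len(items) <= 1:
        return items
    if depth_limit is None:
        depth_limit = 2 * int(math.log2(len(items)))
    if depth_limit == 0:
        # Too many unbalanced partitions so far: stop recursing.
        return _heapsort(items)
    pivot = items[0]  # naive on purpose: already-sorted input is its worst case
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return (introsort(smaller, depth_limit - 1)
            + equal
            + introsort(larger, depth_limit - 1))

# Already-sorted input would drive a plain first-element quicksort to
# depth n; here the heapsort fallback takes over after a few levels.
print(introsort(list(range(5000)))[:5])
```

The design point is that the fallback bounds the worst case without needing to know which input pattern triggered it.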

> I think.  I'm not too sure if the code isn't just being sloppy about the
> case where many data values are equal to the pivot --- there's a special
> case there to switch to insertion sort, and maybe that's getting invoked
> too soon.

Here are some cases known to make qsort go quadratic:
1. Data already sorted
2. Data reverse sorted
3. Data organ-pipe sorted or ramp
4. Almost all data of the same value

There are probably other cases.  Randomizing the pivot helps some, as
does checking for in-order or reverse-order partitions.

Imagine if 1/3 of the partitions fall into a category that causes
quadratic behavior (have one of the above formats and have more than
CUTOFF elements in them).

It is doubtful that the switch to insertion sort is causing any sort of
problems.  It is only going to be invoked on tiny sets, for which it has
a fixed cost that is probably less than that of qsort() function calls on
sets of the same size.

> It'd be useful to get a line-level profile of the behavior of
> this code in the slow cases...

I guess that my in-order or presorted tests [which often arise when
there are very few distinct values] may solve the bad partition
problems.  Don't forget that the algorithm is called recursively.
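
A minimal Python sketch of the in-order / reverse-order test described above, applied at every level of recursion. The function names are hypothetical and the quicksort itself is a simple out-of-place version used only to show where the check sits; it is not the code from any submitted patch.

```python
def is_sorted(items):
    """True if items are in non-decreasing order (includes all-equal runs)."""
    return all(a <= b for a, b in zip(items, items[1:]))

def is_reverse_sorted(items):
    """True if items are in non-increasing order."""
    return all(a >= b for a, b in zip(items, items[1:]))

def quicksort_with_precheck(items):
    """Quicksort that scans each partition for presorted input first."""
    items = list(items)
    if len(items) <= 1 or is_sorted(items):
        return items                  # one linear scan, no partitioning needed
    if is_reverse_sorted(items):
        return items[::-1]            # reversing is enough
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return (quicksort_with_precheck(smaller)
            + equal
            + quicksort_with_precheck(larger))

# A partition made up almost entirely of one value is finished by the scan.
print(quicksort_with_precheck([0] * 10 + [3, 1, 2]))
```

Because the scan runs on every partition, a sub-array consisting of a single repeated value counts as sorted and is dispatched in linear time, which is why the test tends to help when there are very few distinct values.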

> regards, tom lane
 


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Christopher Kings-Lynne
> Ouch! That confirms my problem. I generated the random test case because
> it was easier than including the dump of my tables, but you can
> appreciate that tables 20 times the size are basically crippled when it
> comes to creating an index on them.



I have to say that I restored a dump of a few gigabytes on FreeBSD the
other day, and most of the restore time was in index creation. I didn't
think too much of it at the time, though.  FreeBSD 4.x.


Chris




Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Qingqing Zhou

Tom Lane wrote:
>
> I did this 100 times and sorted the reported runtimes.
>
> I'd say this puts a considerable damper on my enthusiasm for using our
> qsort all the time, as was recently debated in this thread:
> http://archives.postgresql.org/pgsql-hackers/2005-12/msg00610.php
>
> 100 runtimes for glibc qsort, sorted ascending:
>
> Time: 866.814 ms
> Time: 1234.848 ms
> Time: 1267.398 ms
>
> 100 runtimes for port/qsort.c, sorted ascending:
>
> Time: 28314.182 ms
> Time: 29400.278 ms
> Time: 34142.534 ms


By "did this 100 times" do you mean generate a sequence of at most
200000*100 numbers, and for every 200000 numbers, the first half are all
zeros and the other half are uniform random numbers? I tried to confirm it
by patching the program mentioned in the link, but BSD qsort still seems to
be slightly ahead.

Regards,
Qingqing

---
Result

sort#./sort
[3] [glibc qsort]: nelem(20000000), range(4294901760) distr(halfhalf) ccost(2) : 18887.285000 ms
[3] [BSD qsort]: nelem(20000000), range(4294901760) distr(halfhalf) ccost(2) : 18801.018000 ms
[3] [qsortG]: nelem(20000000), range(4294901760) distr(halfhalf) ccost(2) : 22997.004000 ms

---
Patch to sort.c

sort#diff -c sort.c sort1.c
*** sort.c  Thu Dec 15 12:18:59 2005
--- sort1.c Wed Feb 15 22:21:15 2006
***************
*** 35,43 ****
      {"BSD qsort", qsortB},
      {"qsortG", qsortG}
  };
! static const size_t d_nelem[] = {1000, 10000, 100000, 1000000, 5000000};
! static const size_t d_range[] = {2, 32, 1024, 0xFFFF0000L};
! static const char *d_distr[] = {"uniform", "gaussian", "95sorted", "95reversed"};
  static const size_t d_ccost[] = {2};

  /* factor index */
--- 35,43 ----
      {"BSD qsort", qsortB},
      {"qsortG", qsortG}
  };
! static const size_t d_nelem[] = {5000000, 10000000, 20000000};
! static const size_t d_range[] = {0xFFFF0000L};
! static const char *d_distr[] = {"halfhalf"};
  static const size_t d_ccost[] = {2};

  /* factor index */
***************
*** 180,185 ****
--- 180,192 ----
              swap(karray[i], karray[nelem-i-1]);
          }
      }
+     else if (!strcmp(distr, "halfhalf"))
+     {
+         int j;
+         for (i = 0; i < nelem/200000; i++)
+             for (j = 0; j < 100000; j++)
+                 karray[i*200000 + j] = 0;
+     }

      return array;
  }






Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Qingqing Zhou writes:
> By "did this 100 times" do you mean generate a sequence of at most
> 200000*100 numbers, and for every 200000 numbers, the first half are all
> zeros and the other half are uniform random numbers?

No, I mean I ran the bit of SQL script I gave 100 separate times.

regards, tom lane



Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Qingqing Zhou

Tom Lane wrote:
> Qingqing Zhou writes:
>> By "did this 100 times" do you mean generate a sequence of at most
>> 200000*100 numbers, and for every 200000 numbers, the first half are all
>> zeros and the other half are uniform random numbers?
>
> No, I mean I ran the bit of SQL script I gave 100 separate times.


I must be misunderstanding something here -- I can't figure out why the
cost of the same procedure keeps climbing.

Regards,
Qingqing





Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Qingqing Zhou

Qingqing Zhou wrote:
>
> I must misunderstand something here -- I can't figure out that why the
> cost of the same procedure keep climbing?


Oops, I misinterpreted the sentence -- you sorted the results ...

Regards,
Qingqing





Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Qingqing Zhou writes:
> Tom Lane wrote:
>> No, I mean I ran the bit of SQL script I gave 100 separate times.
>
> I must misunderstand something here -- I can't figure out that why the cost
> of the same procedure keep climbing?

No, the run cost varies randomly depending on the random data supplied
by the test script.  The reason the numbers are increasing is that I
sorted them for ease of inspection.

regards, tom lane
