Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

Ron Wed, 15 Feb 2006 23:03:04 -0800

At 08:21 PM 2/15/2006, Tom Lane wrote:

Ron <[EMAIL PROTECTED]> writes:
> How are we choosing our pivots?


See qsort.c: it looks like median of nine equally spaced inputs (ie,
the 1/8th points of the initial input array, plus the end points),
implemented as two rounds of median-of-three choices.
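For readers unfamiliar with the scheme Tom describes, here is a minimal C sketch of median-of-three and the "ninther" (median of three medians of three) on an int array. It illustrates the general technique only. It is not the code in PostgreSQL's qsort.c, and the names med3/ninther and the n >= 9 precondition are illustrative assumptions.

```c
#include <stddef.h>

/* Return the index (i, j or k) of the median of a[i], a[j], a[k]. */
static size_t med3(const int *a, size_t i, size_t j, size_t k)
{
    if (a[i] < a[j])
        return a[j] < a[k] ? j : (a[i] < a[k] ? k : i);
    else
        return a[k] < a[j] ? j : (a[k] < a[i] ? k : i);
}

/* Median of nine elements of a[0..n-1] (the end points, the 1/8th points
   and the middle), computed as two rounds of median-of-three.
   Assumes n >= 9 (hypothetical cutoff). */
static size_t ninther(const int *a, size_t n)
{
    size_t lo = 0, hi = n - 1, mid = n / 2, d = n / 8;
    size_t m1 = med3(a, lo, lo + d, lo + 2 * d);       /* left group   */
    size_t m2 = med3(a, mid - d, mid, mid + d);        /* middle group */
    size_t m3 = med3(a, hi - 2 * d, hi - d, hi);       /* right group  */
    return med3(a, m1, m2, m3);                        /* second round */
}
```

Note that the nine elements are scattered across the array, which is the access pattern Ron objects to below.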

OK, this is a bad way to do median-of-n partitioning for a few reasons. See Sedgewick's PhD thesis for details.

Basically, if one is using median-of-n partitioning to choose a pivot, one should do it in =one= pass, and n for that pass should be <= the number of registers in the CPU. Since the x86 ISA has 8 GPRs, n should be <= 8; 7, for instance (an odd n gives a well-defined median).

Special-purposing the median-of-n code so that it uses the minimal number of comparisons and moves to sort the sample, and then "partitioning in place", is the best way to do it. In addition, care must be taken to deal with the possibility that many of the keys may be equal.
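One common way to cope with many equal keys (not necessarily the scheme Ron has in mind) is a Dijkstra-style three-way partition, which splits a range into < pivot, == pivot and > pivot so that a run of equal keys is never recursed into again. A minimal sketch on int keys; partition3 and swap_int are hypothetical names.

```c
/* Exchange two ints. */
static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Three-way partition of a[lo..hi] (inclusive) around the value pivot.
   On return: a[lo..*lt-1] < pivot, a[*lt..*gt] == pivot, a[*gt+1..hi] > pivot.
   Signed indices avoid wrap-around when the "> pivot" region grows to lo. */
static void partition3(int *a, long lo, long hi, int pivot, long *lt, long *gt)
{
    long l = lo, i = lo, g = hi;
    while (i <= g) {
        if (a[i] < pivot)
            swap_int(&a[l++], &a[i++]);   /* grow the "<" region      */
        else if (a[i] > pivot)
            swap_int(&a[i], &a[g--]);     /* new a[i] is unexamined   */
        else
            i++;                          /* equal key: leave in place */
    }
    *lt = l;
    *gt = g;
}
```

A quicksort built on this recurses only on a[lo..lt-1] and a[gt+1..hi], so an input made entirely of equal keys is finished after one pass.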


The (pseudo) code looks something like this:

qs(a[],L,R){
if((R-L) > SAMPLE_SIZE){ // Not worth using qs for too few elements
   SortSample(SAMPLE_SIZE,a,L,R);
   // Sorts SAMPLE_SIZE = n elements and does median-of-n partitioning for small n,
   // using the minimal number of comparisons and moves.
   // In the process it ends up partitioning the first n/2 and last n/2 elements.
   // SAMPLE_SIZE is a constant chosen to work best for a given CPU.
   //  #GPRs - 1 is a good initial guess.
   // For the x86 ISA, #GPRs - 1 = 7. For native x86-64, it's 15.
   // For most RISC CPUs it's 31 or 63.  For Itanium, it's 127 (!)
   pivot= a[(L+R)>>1];
   // Start one element inside each already-partitioned end, so that the
   // first a[++i] and a[--j] examine L+n/2 and R-n/2.
   i= L+(SAMPLE_SIZE>>1)-1; j= R-(SAMPLE_SIZE>>1)+1;
   for(;;){
      while(a[++i] < pivot);
      while(a[--j] > pivot);
      if(i >= j) break;
      if(a[i] > a[j]) swap(a[i],a[j]);
      }
   // Now a[L..i-1] <= pivot <= a[i..R]: sort BOTH parts, smaller one first
   // (the second call can be turned into a loop to bound the stack depth).
   if((i-L) < (R-i+1)){qs(a,L,i-1); qs(a,i,R);}
   else{qs(a,i,R); qs(a,L,i-1);}
   }
else{OofN^2_Sort(a,L,R);}
// SelectSort may be better than InsertSort if KeySize in bits << RecordSize in bits
} // End of qs
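A runnable C sketch of the pseudocode above for int keys, assuming SAMPLE_SIZE = 7 and the clustered sample {L, L+1, L+2, mid, R-2, R-1, R} described further down. sort_sample simply gathers the seven values and insertion-sorts them; it stands in for the generated minimal-comparison code Ron has in mind and is not that code. The scans start at L+2 and R-2, so the first elements examined are L+3 and R-3, and the sorted sample ends act as sentinels. All names here are hypothetical.

```c
#define SAMPLE_SIZE 7                 /* assumed: #GPRs - 1 on x86 */
#define HALF (SAMPLE_SIZE / 2)        /* 3 sample elements at each end */

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Insertion sort of a[L..R]: the O(N^2) fallback for short ranges. */
static void insertion_sort(int *a, long L, long R)
{
    for (long i = L + 1; i <= R; i++) {
        int v = a[i];
        long j = i - 1;
        while (j >= L && a[j] > v) { a[j + 1] = a[j]; j--; }
        a[j + 1] = v;
    }
}

/* Sort the 7 sampled elements in place, so that afterwards
   a[L..L+2] <= a[mid] <= a[R-2..R].  The pos[] table is written for 7. */
static void sort_sample(int *a, long L, long R)
{
    long pos[SAMPLE_SIZE] = { L, L + 1, L + 2, L + (R - L) / 2, R - 2, R - 1, R };
    int v[SAMPLE_SIZE];
    for (int k = 0; k < SAMPLE_SIZE; k++) v[k] = a[pos[k]];
    insertion_sort(v, 0, SAMPLE_SIZE - 1);
    for (int k = 0; k < SAMPLE_SIZE; k++) a[pos[k]] = v[k];  /* pos[] is increasing */
}

/* Sort a[L..R] (inclusive) in ascending order. */
void qs(int *a, long L, long R)
{
    while (R - L > SAMPLE_SIZE) {
        long mid = L + (R - L) / 2;
        sort_sample(a, L, R);
        int pivot = a[mid];
        long i = L + HALF - 1, j = R - HALF + 1;
        for (;;) {
            while (a[++i] < pivot) ;
            while (a[--j] > pivot) ;
            if (i >= j) break;
            if (a[i] > a[j]) swap_int(&a[i], &a[j]);
        }
        /* a[L..i-1] <= pivot <= a[i..R].  Recurse on the smaller part and
           loop on the larger one, so stack depth stays O(log n). */
        if (i - L < R - i + 1) { qs(a, L, i - 1); L = i; }
        else                   { qs(a, i, R);     R = i - 1; }
    }
    insertion_sort(a, L, R);
}
```

Call it as qs(arr, 0, n - 1). Both parts always contain at least the three sorted sample elements at their outer end, so every step makes progress even when all keys are equal.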

Given that the most common CPU ISA in existence has 8 GPRs, SAMPLE_SIZE = 7 is probably optimal:

t= (L+R); d= (R-L);
the set would be {L; L+d/8; L+d/4; t/2; L+3*d/4; L+7*d/8; R;}
==> {L; L+(d>>3); L+(d>>2); t>>1; L+((3*d)>>2); L+((7*d)>>3); R} as the locations.
(For L = 0 these reduce to t/8, t/4, t/2, 3*t/4 and 7*t/8.)
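A small sketch computing those seven spread-out locations for an arbitrary subrange a[L..R]. It assumes the fractions are taken of d = R - L and offset from L, which agrees with the t-based formulas when L = 0 and keeps every location inside [L, R] otherwise; spread_sample is a hypothetical name.

```c
/* Fill pos[0..6] with the spread-out sample {L, 1/8, 1/4, 1/2, 3/4, 7/8, R}
   of the subrange a[L..R], measured from L. */
static void spread_sample(long L, long R, long pos[7])
{
    long d = R - L;
    pos[0] = L;
    pos[1] = L + (d >> 3);
    pos[2] = L + (d >> 2);
    pos[3] = L + (d >> 1);          /* equals (L+R)>>1 for L, R >= 0 */
    pos[4] = L + ((3 * d) >> 2);
    pos[5] = L + ((7 * d) >> 3);
    pos[6] = R;
}
```

For L = 100, R = 180 this yields 100, 110, 120, 140, 160, 170, 180: seven separate cache lines for any large range.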

Even better (and more easily scaled as the number of GPRs in the CPU changes) is to use the set {L; L+1; L+2; t>>1; R-2; R-1; R}

This means that instead of 7 random memory accesses, we have 3, two of which result in a burst access for three elements each.

That's much faster; _and_ using a sample of 9, 15, 31, 63, etc. elements (up to a maximum of ~#GPRs - 1) is more easily done.
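To show how the clustered set scales, here is a sketch that builds it for any odd sample size n = 2k+1 (k = 3 gives the 7-element set above; k = 7 gives 15). The function name and the requirement R - L >= 2k, which keeps the three groups from overlapping, are assumptions of this sketch.

```c
/* Fill pos[0..2k] with the clustered sample {L..L+k-1, mid, R-k+1..R}.
   Requires R - L >= 2k so that L+k-1 < mid < R-k+1. */
static void clustered_sample(long L, long R, int k, long *pos)
{
    int n = 0;
    for (int m = 0; m < k; m++)
        pos[n++] = L + m;               /* one burst at the left end  */
    pos[n++] = L + (R - L) / 2;         /* the middle element         */
    for (int m = k - 1; m >= 0; m--)
        pos[n++] = R - m;               /* one burst at the right end */
}
```

Only k changes with the register count; the number of separate memory regions touched stays at three.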

It also means that the work we do sorting the sample can be taken advantage of when starting the inner loop of quicksort: items L..L+2, t>>1, and R-2..R are already partitioned by SortSample().

Ensuring that the minimum number of comparisons and moves is done in SortSample can be done by using a code generator to create a comparison tree that identifies which of the n! permutations we are dealing with and then moves the elements into place with the minimal number of moves.
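For n = 3 such a comparison tree is small enough to write by hand; a generator would emit the same shape of code for 7, 15, and so on. This hand-written sort3 is a hypothetical illustration, not generator output. It identifies which of the 3! = 6 orderings it has with at most 3 comparisons (the minimum, since ceil(log2 6) = 3), holds the values in locals (registers), and writes back only the positions that change.

```c
/* Sort x[0..2] in place via an explicit decision tree. */
static void sort3(int *x)
{
    int a = x[0], b = x[1], c = x[2];
    if (a <= b) {
        if (b <= c) return;                           /* a b c: no moves */
        if (a <= c) { x[1] = c; x[2] = b; }           /* a c b */
        else        { x[0] = c; x[1] = a; x[2] = b; } /* c a b */
    } else {
        if (a <= c)      { x[0] = b; x[1] = a; }           /* b a c */
        else if (b <= c) { x[0] = b; x[1] = c; x[2] = a; } /* b c a */
        else             { x[0] = c; x[2] = a; }           /* c b a */
    }
}
```

Each leaf of the tree corresponds to one permutation, so the moves at that leaf can be specialized: an already sorted triple costs no writes, a swap costs two, and a rotation costs three.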

SIDE NOTE: IIRC glibc's qsort is actually merge sort. Merge sort performance is largely insensitive to the input (it is O(n log n) in every case), and there are ways to optimize it as well.
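To illustrate the side note, here is a minimal top-down merge sort on ints with a caller-supplied scratch buffer. It is a generic textbook version, not glibc's implementation. It always splits a range in half regardless of the data, so the O(n log n) bound holds for every input, unlike quicksort's O(n^2) worst case; the price is the extra n-element buffer.

```c
#include <string.h>

/* Sort a[0..n-1] ascending, using tmp[0..n-1] as scratch space. */
static void merge_sort(int *a, int *tmp, size_t n)
{
    if (n < 2) return;
    size_t h = n / 2;
    merge_sort(a, tmp, h);            /* sort the left half  */
    merge_sort(a + h, tmp, n - h);    /* sort the right half */
    size_t i = 0, j = h, k = 0;
    while (i < h && j < n)            /* take from the left on ties: stable */
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < h) tmp[k++] = a[i++];
    while (j < n) tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);
}
```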

I'll leave the actual coding to someone who knows the pg source better than I do.
Ron


