At 08:21 PM 2/15/2006, Tom Lane wrote:
Ron <[EMAIL PROTECTED]> writes:
> How are we choosing our pivots?

See qsort.c: it looks like median of nine equally spaced inputs (ie,
the 1/8th points of the initial input array, plus the end points),
implemented as two rounds of median-of-three choices.

OK, this is a bad way to do median-of-n partitioning for a few reasons. See Sedgewick's PhD thesis for details.

Basically, if one is using median-of-n partitioning to choose a pivot, one should do it in =one= pass, and n for that pass should be <= the numbers of registers in the CPU. Since the x86 ISA has 8 GPR's, n should be <= 8. 7 for instance.

Special purposing the median-of-n code so that the minimal number of comparisons and moves is used to sort the sample and then "partitioning in place" is the best way to do it. In addition, care must be taken to deal with the possibility that many of the keys may be equal.

The (pseudo) code looks something like this:

qs(a[],L,R){
if((R-L) > SAMPLE_SIZE){ // Not worth using qs for too few elements
   SortSample(SAMPLE_SIZE,a[],L,R);
// Sorts SAMPLE_SIZE= n elements and does median-of-n partitioning for small n
   // using the minimal number of comparisons and moves.
// In the process it ends up partitioning the first n/2 and last n/2 elements
   // SAMPLE_SIZE is a constant chosen to work best for a given CPU.
   //  #GPRs - 1 is a good initial guess.
   // For the x86 ISA, #GPRs - 1 = 7. For native x86-64, it's 15.
   // For most RISC CPUs it's 31 or 63.  For Itanium, it's 127 (!)
   pivot= a[(L+R)>>1]; i= L+(SAMPLE_SIZE>>1); j= R-(SAMPLE_SIZE>>1);
   for(;;){
      while(a[++i] < pivot);
      while(a[--j] > pivot);
      if(i >= j) break;
      if(a[i] > a[j]) swap(a[i],a[j]);
      }
   if((i-R) >= (j-L)){qs(a,L,i-1);}
   else{qs(a,i,R);}
else{OofN^2_Sort(a,L,R);}
// SelectSort may be better than InsertSort if KeySize in bits << RecordSize in bits
} // End of qs

Given that the most common CPU ISA in existence has 8 GPRs, SAMPLE_SIZE= 7 is probably optimal:
t= (L+R);
the set would be {L; t/8; t/4; t/2; 3*t/4; 7*t/8; R;}
==> {L; t>>3; t>>2; t>>1; (3*t)>>2; (7*t)>>3; R} as the locations.
Even better (and more easily scaled as the number of GPR's in the CPU changes) is to use
the set {L; L+1; L+2; t>>1; R-2; R-1; R}
This means that instead of 7 random memory accesses, we have 3; two of which result in a
burst access for three elements each.
That's much faster; _and_ using a sample of 9, 15, 31, 63, etc (to max of ~GPRs -1) elements is more easily done.

It also means that the work we do sorting the sample can be taken advantage of when starting inner loop of quicksort: items L..L+2, t, and R-2..R are already partitioned by SortSample().

Insuring that the minimum number of comparisons and moves is done in SortSample can be down by using a code generator to create a comparison tree that identifies which permutation(s) of n we are dealing with and then moving them into place with the minimal number of moves.

SIDE NOTE: IIRC glibc's qsort is actually merge sort. Merge sort performance is insensitive to all inputs, and there are way to optimize it as well.

I'll leave the actual coding to someone who knows the pg source better than I do. Ron


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

              http://archives.postgresql.org

Reply via email to