At 06:35 AM 2/16/2006, Steinar H. Gunderson wrote:
On Wed, Feb 15, 2006 at 11:30:54PM -0500, Ron wrote:
> Even better (and more easily scaled as the number of GPR's in the CPU
> changes) is to use
> the set {L; L+1; L+2; t>>1; R-2; R-1; R}
> This means that instead of 7 random memory accesses, we have 3; two
> of which result in a burst access for three elements each.

Isn't that improvement going to disappear completely if you choose a bad
pivot?

Only if you _consistently_ (read: "the vast majority of the time"; quicksort is actually darn robust) choose a _pessimal_, not merely "bad", pivot will quicksort degenerate to the O(N^2) behavior everyone worries about. See Cormen, Leiserson & Rivest for a proof of this.
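
For concreteness, here is a minimal sketch of gathering that quoted candidate set (my illustration only, not the exact pivot-choosing pass being described; the name median_of_7_pivot, the int element type, and the plain insertion sort over the seven locals are all assumptions):

  #include <stddef.h>

  /* Illustrative sketch only: gather the seven candidates
   * {L, L+1, L+2, (L+R)>>1, R-2, R-1, R} and return their median.
   * Two of the three accesses are 3-element sequential "bursts";
   * the seven values then fit in registers, so the compares below
   * are register-to-register.  Assumes R - L >= 6. */
  static int median_of_7_pivot(const int *a, size_t L, size_t R)
  {
      size_t mid = L + (R - L) / 2;      /* overflow-safe (L + R) >> 1 */
      int c[7] = {
          a[L], a[L + 1], a[L + 2],      /* burst 1: three sequential loads */
          a[mid],                        /* the one random access           */
          a[R - 2], a[R - 1], a[R]       /* burst 2: three sequential loads */
      };

      /* Tiny insertion sort over the seven candidates. */
      for (int i = 1; i < 7; i++) {
          int v = c[i], j = i - 1;
          while (j >= 0 && c[j] > v) {
              c[j + 1] = c[j];
              j--;
          }
          c[j + 1] = v;
      }
      return c[3];                       /* the median of the seven */
  }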

Even then, doing things as above has benefits:
1= The worst case is less bad, since the guaranteed O(lg(s!)) pivot-choosing pass puts s elements into their final positions.
Worst case becomes better than O(N^2/(s-1)).

2= With more traditional methods, the overhead of pivot choosing can overshadow the benefits for even moderate values of s. See the discussions of the quicksort variant known as "samplesort" and Sedgewick's PhD thesis for details. Using a pivot-choosing algorithm that actually does some of the partitioning (and does it more efficiently than the "usual" partitioning algorithm does), plus using partition-in-place rather than Lomuto's method (see the sketch after point 3), reduces overhead very effectively, at the "cost" of partitioning code that is more complicated and delicate to get right. The above reduces the number of moves used in a quicksort pass considerably, regardless of the number of compares used.

3= Especially in modern systems, where the gap between internal CPU bandwidth and memory bandwidth is so great, memory accesses for compares and moves make up the majority of the overhead of both the pivot-choosing and the partitioning algorithms within quicksort; random memory accesses most of all. The reason (#GPRs - 1) is a magic constant is that it's the largest number of elements you can compare and move using only register-to-register operations.
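
As promised under point 2, here is a sketch of the textbook Hoare partition-in-place scheme as a contrast to Lomuto's method (this is not the combined pivot-choosing/partitioning pass described above, and the names hoare_partition and quicksort are mine):

  #include <stddef.h>

  /* Textbook Hoare partition-in-place: two index scans walk inward from
   * the ends, swapping out-of-place pairs.  Both scans are sequential
   * memory accesses, and on average it does far fewer moves than
   * Lomuto's single-scan method.  Returns j such that
   * a[lo..j] <= pivot <= a[j+1..hi]. */
  static size_t hoare_partition(int *a, size_t lo, size_t hi)
  {
      int pivot = a[lo + (hi - lo) / 2];
      size_t i = lo, j = hi;

      for (;;) {
          while (a[i] < pivot) i++;
          while (a[j] > pivot) j--;
          if (i >= j)
              return j;
          int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
          i++;
          j--;
      }
  }

  /* Minimal driver; call as quicksort(a, 0, n - 1) for n >= 1. */
  static void quicksort(int *a, size_t lo, size_t hi)
  {
      if (lo >= hi)
          return;
      size_t p = hoare_partition(a, lo, hi);
      quicksort(a, lo, p);
      quicksort(a, p + 1, hi);
  }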

In addition, replacing as many as possible of the memory accesses you must do with sequential rather than random accesses is a big deal: sequential memory access is measured in tens of CPU cycles, while random memory access is measured in hundreds of CPU cycles. It's no accident that the advances in Jim Gray's sorting contest have involved algorithms that are both register and cache friendly, minimizing overall memory access and using sequential memory access as much as possible when said access cannot be avoided. As caches grow larger and memory accesses more expensive, it's often worth using a BucketSort+QuickSort hybrid rather than QuickSort alone.
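
One way to read the BucketSort+QuickSort suggestion (only a sketch; the 256-bucket split on the high byte, the uint32_t keys, and the per-bucket qsort() are all assumptions, and a real implementation would tune the bucket count to the cache size):

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  #define BUCKET_BITS 8                  /* 256 buckets; tune to cache size */
  #define NBUCKETS    (1u << BUCKET_BITS)

  static int cmp_u32(const void *x, const void *y)
  {
      uint32_t a = *(const uint32_t *)x, b = *(const uint32_t *)y;
      return (a > b) - (a < b);
  }

  /* Bucket+quicksort hybrid for uint32_t keys: one sequential counting
   * pass, one scatter pass by high byte, then an independent sort of
   * each (hopefully cache-sized) bucket.  Returns 0, or -1 on OOM. */
  static int bucket_then_quicksort(uint32_t *a, size_t n)
  {
      size_t counts[NBUCKETS] = {0}, starts[NBUCKETS], fill[NBUCKETS];
      uint32_t *tmp = malloc(n * sizeof *tmp);
      if (tmp == NULL)
          return -1;

      for (size_t i = 0; i < n; i++)             /* sequential read      */
          counts[a[i] >> (32 - BUCKET_BITS)]++;

      size_t pos = 0;                            /* prefix sums          */
      for (size_t b = 0; b < NBUCKETS; b++) {
          starts[b] = fill[b] = pos;
          pos += counts[b];
      }

      for (size_t i = 0; i < n; i++)             /* scatter by high byte */
          tmp[fill[a[i] >> (32 - BUCKET_BITS)]++] = a[i];

      for (size_t b = 0; b < NBUCKETS; b++)      /* sort each bucket     */
          qsort(tmp + starts[b], counts[b], sizeof *tmp, cmp_u32);

      memcpy(a, tmp, n * sizeof *a);
      free(tmp);
      return 0;
  }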

...and of course, if you know enough about the data to be sorted to constrain it appropriately, one should use a non-comparison-based O(N) sorting algorithm rather than any of the general comparison-based O(NlgN) methods.
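
For example, if the keys are known to be 16-bit integers, a stable counting sort gets there in O(N + K) with no comparisons at all (again just a sketch; the uint16_t key type and the name counting_sort_u16 are assumptions):

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  #define KEY_RANGE 65536u   /* keys are assumed to lie in [0, KEY_RANGE) */

  /* Stable counting sort: two sequential passes over the input plus one
   * over the counters, i.e. O(N + KEY_RANGE), with no compares.  Only
   * worthwhile when the key range is modest.  Returns 0, -1 on OOM. */
  static int counting_sort_u16(uint16_t *a, size_t n)
  {
      size_t *count = calloc(KEY_RANGE, sizeof *count);
      uint16_t *out = malloc(n * sizeof *out);
      if (count == NULL || out == NULL) {
          free(count);
          free(out);
          return -1;
      }

      for (size_t i = 0; i < n; i++)      /* histogram of key values  */
          count[a[i]]++;

      size_t pos = 0;                     /* turn counts into offsets */
      for (size_t k = 0; k < KEY_RANGE; k++) {
          size_t c = count[k];
          count[k] = pos;
          pos += c;
      }

      for (size_t i = 0; i < n; i++)      /* stable placement         */
          out[count[a[i]]++] = a[i];

      memcpy(a, out, n * sizeof *a);
      free(count);
      free(out);
      return 0;
  }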


> SIDE NOTE: IIRC glibc's qsort is actually merge sort.  Merge sort
> performance is insensitive to all inputs, and there are ways to
> optimize it as well.

glibc-2.3.5/stdlib/qsort.c:

  /* Order size using quicksort.  This implementation incorporates
     four optimizations discussed in Sedgewick:

I can't see any references to merge sort in there at all.

Well, then I'm not the only person on the lists whose memory is faulty ;-)

The upside of MergeSort is that its performance is always O(NlgN).
The downsides are that it is far more memory hungry than QuickSort (it needs an O(N) auxiliary buffer) and slower.
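
To make the memory point concrete, here is the textbook top-down MergeSort (names and the int element type are assumed); note the auxiliary buffer as large as the input, where QuickSort partitions in place and needs only O(lgN) stack:

  #include <stddef.h>
  #include <stdlib.h>
  #include <string.h>

  /* Top-down merge sort over the half-open range [lo, hi).
   * Always O(NlgN) compares, but every merge goes through tmp,
   * an auxiliary buffer as large as the whole input. */
  static void merge_sort_rec(int *a, int *tmp, size_t lo, size_t hi)
  {
      if (hi - lo < 2)
          return;
      size_t mid = lo + (hi - lo) / 2;
      merge_sort_rec(a, tmp, lo, mid);
      merge_sort_rec(a, tmp, mid, hi);

      size_t i = lo, j = mid, k = lo;    /* merge the two sorted halves */
      while (i < mid && j < hi)
          tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
      while (i < mid) tmp[k++] = a[i++];
      while (j < hi)  tmp[k++] = a[j++];
      memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
  }

  static int merge_sort(int *a, size_t n)
  {
      int *tmp = malloc(n * sizeof *tmp);    /* the O(N) extra memory */
      if (tmp == NULL)
          return -1;
      merge_sort_rec(a, tmp, 0, n);
      free(tmp);
      return 0;
  }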


Ron


