At 06:35 AM 2/16/2006, Steinar H. Gunderson wrote:
On Wed, Feb 15, 2006 at 11:30:54PM -0500, Ron wrote:
> Even better (and more easily scaled as the number of GPR's in the CPU
> changes) is to use
> the set {L; L+1; L+2; t>>1; R-2; R-1; R}
> This means that instead of 7 random memory accesses, we have 3; two
> of which result in a burst access for three elements each.

Isn't that improvement going to disappear completely if you choose a bad
pivot?

Only if you _consistently_ (read: "the vast majority of the time"; quicksort is actually darn robust) choose a _pessimal_, not merely "bad", pivot will quicksort degenerate to the O(N^2) behavior everyone worries about. See Cormen, Leiserson & Rivest for a proof of this.
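
For concreteness, here is a minimal sketch of gathering that quoted candidate set (my illustration only, not the exact pivot-choosing pass being described; the name median_of_7_pivot, the int element type, and the plain insertion sort over the seven locals are all assumptions):

  #include <stddef.h>

  /* Illustrative sketch only: gather the seven candidates
   * {L, L+1, L+2, (L+R)>>1, R-2, R-1, R} and return their median.
   * Two of the three accesses are 3-element sequential "bursts";
   * the seven values then fit in registers, so the compares below
   * are register-to-register.  Assumes R - L >= 6. */
  static int median_of_7_pivot(const int *a, size_t L, size_t R)
  {
      size_t mid = L + (R - L) / 2;      /* overflow-safe (L + R) >> 1 */
      int c[7] = {
          a[L], a[L + 1], a[L + 2],      /* burst 1: three sequential loads */
          a[mid],                        /* the one random access           */
          a[R - 2], a[R - 1], a[R]       /* burst 2: three sequential loads */
      };

      /* Tiny insertion sort over the seven candidates. */
      for (int i = 1; i < 7; i++) {
          int v = c[i], j = i - 1;
          while (j >= 0 && c[j] > v) {
              c[j + 1] = c[j];
              j--;
          }
          c[j + 1] = v;
      }
      return c[3];                       /* the median of the seven */
  }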

Even then, doing things as above has benefits:
1= The worst case is less bad, since the guaranteed O(lg(s!)) pivot-choosing pass puts s elements into their final positions.
Worst case becomes better than O(N^2/(s-1)).

2= With more traditional methods, the overhead of pivot choosing can overshadow the benefits for even moderate values of s. See the discussions of the quicksort variant known as "samplesort" and Sedgewick's PhD thesis for details. Using a pivot-choosing algorithm that actually does some of the partitioning (and does it more efficiently than the "usual" partitioning algorithm does), plus using partition-in-place rather than Lomuto's method (see the sketch after point 3), reduces overhead very effectively, at the "cost" of partitioning code that is more complicated and delicate to get right. The above reduces the number of moves used in a quicksort pass considerably, regardless of the number of compares used.

3= Especially in modern systems, where the gap between internal CPU bandwidth and memory bandwidth is so great, memory accesses for compares and moves make up the majority of the overhead of both the pivot-choosing and the partitioning algorithms within quicksort; random memory accesses most of all. The reason (#GPRs - 1) is a magic constant is that it's the largest number of elements you can compare and move using only register-to-register operations.
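
As promised under point 2, here is a sketch of the textbook Hoare partition-in-place scheme as a contrast to Lomuto's method (this is not the combined pivot-choosing/partitioning pass described above, and the names hoare_partition and quicksort are mine):

  #include <stddef.h>

  /* Textbook Hoare partition-in-place: two index scans walk inward from
   * the ends, swapping out-of-place pairs.  Both scans are sequential
   * memory accesses, and on average it does far fewer moves than
   * Lomuto's single-scan method.  Returns j such that
   * a[lo..j] <= pivot <= a[j+1..hi]. */
  static size_t hoare_partition(int *a, size_t lo, size_t hi)
  {
      int pivot = a[lo + (hi - lo) / 2];
      size_t i = lo, j = hi;

      for (;;) {
          while (a[i] < pivot) i++;
          while (a[j] > pivot) j--;
          if (i >= j)
              return j;
          int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
          i++;
          j--;
      }
  }

  /* Minimal driver; call as quicksort(a, 0, n - 1) for n >= 1. */
  static void quicksort(int *a, size_t lo, size_t hi)
  {
      if (lo >= hi)
          return;
      size_t p = hoare_partition(a, lo, hi);
      quicksort(a, lo, p);
      quicksort(a, p + 1, hi);
  }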

In addition, replacing as many as possible of the memory accesses you must do with sequential rather than random accesses is a big deal: sequential memory access is measured in tens of CPU cycles, while random memory access is measured in hundreds of CPU cycles. It's no accident that the advances in Jim Gray's sorting contest have involved algorithms that are both register and cache friendly, minimizing overall memory access and using sequential memory access as much as possible when said access cannot be avoided. As caches grow larger and memory accesses more expensive, it's often worth using a BucketSort+QuickSort hybrid rather than QuickSort alone.
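
One way to read the BucketSort+QuickSort suggestion (only a sketch; the 256-bucket split on the high byte, the uint32_t keys, and the per-bucket qsort() are all assumptions, and a real implementation would tune the bucket count to the cache size):

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  #define BUCKET_BITS 8                  /* 256 buckets; tune to cache size */
  #define NBUCKETS    (1u << BUCKET_BITS)

  static int cmp_u32(const void *x, const void *y)
  {
      uint32_t a = *(const uint32_t *)x, b = *(const uint32_t *)y;
      return (a > b) - (a < b);
  }

  /* Bucket+quicksort hybrid for uint32_t keys: one sequential counting
   * pass, one scatter pass by high byte, then an independent sort of
   * each (hopefully cache-sized) bucket.  Returns 0, or -1 on OOM. */
  static int bucket_then_quicksort(uint32_t *a, size_t n)
  {
      size_t counts[NBUCKETS] = {0}, starts[NBUCKETS], fill[NBUCKETS];
      uint32_t *tmp = malloc(n * sizeof *tmp);
      if (tmp == NULL)
          return -1;

      for (size_t i = 0; i < n; i++)             /* sequential read      */
          counts[a[i] >> (32 - BUCKET_BITS)]++;

      size_t pos = 0;                            /* prefix sums          */
      for (size_t b = 0; b < NBUCKETS; b++) {
          starts[b] = fill[b] = pos;
          pos += counts[b];
      }

      for (size_t i = 0; i < n; i++)             /* scatter by high byte */
          tmp[fill[a[i] >> (32 - BUCKET_BITS)]++] = a[i];

      for (size_t b = 0; b < NBUCKETS; b++)      /* sort each bucket     */
          qsort(tmp + starts[b], counts[b], sizeof *tmp, cmp_u32);

      memcpy(a, tmp, n * sizeof *a);
      free(tmp);
      return 0;
  }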

...and of course, if you know enough about the data to be sorted to constrain it appropriately, one should use a non-comparison-based O(N) sorting algorithm rather than any of the general comparison-based O(NlgN) methods.
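
For example, if the keys are known to be 16-bit integers, a stable counting sort gets there in O(N + K) with no comparisons at all (again just a sketch; the uint16_t key type and the name counting_sort_u16 are assumptions):

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  #define KEY_RANGE 65536u   /* keys are assumed to lie in [0, KEY_RANGE) */

  /* Stable counting sort: two sequential passes over the input plus one
   * over the counters, i.e. O(N + KEY_RANGE), with no compares.  Only
   * worthwhile when the key range is modest.  Returns 0, -1 on OOM. */
  static int counting_sort_u16(uint16_t *a, size_t n)
  {
      size_t *count = calloc(KEY_RANGE, sizeof *count);
      uint16_t *out = malloc(n * sizeof *out);
      if (count == NULL || out == NULL) {
          free(count);
          free(out);
          return -1;
      }

      for (size_t i = 0; i < n; i++)      /* histogram of key values  */
          count[a[i]]++;

      size_t pos = 0;                     /* turn counts into offsets */
      for (size_t k = 0; k < KEY_RANGE; k++) {
          size_t c = count[k];
          count[k] = pos;
          pos += c;
      }

      for (size_t i = 0; i < n; i++)      /* stable placement         */
          out[count[a[i]]++] = a[i];

      memcpy(a, out, n * sizeof *a);
      free(count);
      free(out);
      return 0;
  }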


> SIDE NOTE: IIRC glibc's qsort is actually merge sort.  Merge sort
> performance is insensitive to all inputs, and there are ways to
> optimize it as well.

glibc-2.3.5/stdlib/qsort.c:

  /* Order size using quicksort.  This implementation incorporates
     four optimizations discussed in Sedgewick:

I can't see any references to merge sort in there at all.

Well, then I'm not the only person on the lists whose memory is faulty ;-)

The upside of MergeSort is that its performance is always O(NlgN).
The downsides are that it is far more memory hungry than QuickSort (it needs an O(N) auxiliary buffer) and slower.
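
To make the memory point concrete, here is the textbook top-down MergeSort (names and the int element type are assumed); note the auxiliary buffer as large as the input, where QuickSort partitions in place and needs only O(lgN) stack:

  #include <stddef.h>
  #include <stdlib.h>
  #include <string.h>

  /* Top-down merge sort over the half-open range [lo, hi).
   * Always O(NlgN) compares, but every merge goes through tmp,
   * an auxiliary buffer as large as the whole input. */
  static void merge_sort_rec(int *a, int *tmp, size_t lo, size_t hi)
  {
      if (hi - lo < 2)
          return;
      size_t mid = lo + (hi - lo) / 2;
      merge_sort_rec(a, tmp, lo, mid);
      merge_sort_rec(a, tmp, mid, hi);

      size_t i = lo, j = mid, k = lo;    /* merge the two sorted halves */
      while (i < mid && j < hi)
          tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
      while (i < mid) tmp[k++] = a[i++];
      while (j < hi)  tmp[k++] = a[j++];
      memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
  }

  static int merge_sort(int *a, size_t n)
  {
      int *tmp = malloc(n * sizeof *tmp);    /* the O(N) extra memory */
      if (tmp == NULL)
          return -1;
      merge_sort_rec(a, tmp, 0, n);
      free(tmp);
      return 0;
  }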


Ron


