Re: Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort

Vladimir Iaroslavski Fri, 11 Sep 2009 07:29:26 -0700

Hi,

I've tried to use local variable int ak = a[k] in the loop
and not found saving of time. There are 3 occurrences of a[k]
in each loop, but only two can be changed because third
comes after line: swap(a, k, great--);


As summary: I'm changing only hardcoded constant by
private static final variables.

Thank you,
Vladimir

Ulf Zibis wrote:

Am 11.09.2009 15:32, Rémi Forax schrieb:

just my two cents :
In the loop tagged "sorting" and "equals element",
a[k] can be stored in a local variable to avoid to recompute itseveral time.


I add 2 cents more:
Doesn't HotSpot see this, and optimize accordingly?

IMO: It's time that javac should do such optimization, as there are lessoptimized VM's in the world.


-Ulf


The algorithm use two constants: 37 and 13,
I think they should be declared as private static final.

Rémi


Le 11/09/2009 12:35, Vladimir Yaroslavskiy a écrit :

Hello All,

I'd like to share with you new Dual-Pivot Quicksort which is
faster than the known implementations (theoretically and
experimental). I'd like to propose to replace the JDK's
Quicksort implementation by new one.

Description
-----------
The classical Quicksort algorithm uses the following scheme:

1. Pick an element P, called a pivot, from the array.
2. Reorder the array so that all elements, which are less than
   the pivot, come before the pivot and all elements greater than
   the pivot come after it (equal values can go either way). After
   this partitioning, the pivot element is in its final position.
3. Recursively sort the sub-array of lesser elements and the
   sub-array of greater elements.

The invariant of classical Quicksort is:

[ <= p | >= p ]

There are several modifications of the schema:

[ < p | = p | > p ]  or  [ = p | < p | > p | = p ]

But all of them use *one* pivot.

The new Dual-Pivot Quicksort uses *two* pivots elements in this manner:

1. Pick an elements P1, P2, called pivots from the array.
2. Assume that P1 <= P2, otherwise swap it.
3. Reorder the array into three parts: those less than the smaller
   pivot, those larger than the larger pivot, and in between are
   those elements between (or equal to) the two pivots.
4. Recursively sort the sub-arrays.

The invariant of the Dual-Pivot Quicksort is:

[ < P1 | P1 <= & <= P2 } > P2 ]

The new Quicksort is faster than current implementation of Quicksort
in JDK (L. Bentley and M. Douglas McIlroy) and classical Quicksort.

The full description of the Dual-Pivot Quicksort you can find
on my page: http://iaroslavski.narod.ru/quicksort

Performance tests
-----------------
Here is result of running on different types of input data:

Client VM                          all    85%  organ  0..1          0..4

random ascend descend equal equal pipes random 010101random

Dual-Pivot:  16.83  5.31    5.47   0.35  0.68  10.59  1.06   1.02   2.18

Bentley's: 19.77 9.08 10.13 0.63 1.12 13.22 1.63 1.082.49


Server VM                          all    85%  organ  0..1          0..4

random ascend descend equal equal pipes random 010101random

Dual-Pivot:  23.94  6.68    6.63   0.43  0.62  17.14  1.42   1.96   3.41

Bentley's: 25.20 10.18 10.32 2.07 1.33 16.72 2.95 1.823.39


The a lot of other tests have been run under client and server mode.
The most interesting is BentleyBasher test framework. It runs battery
of tests for all cases:

{ 100, 1000, 10000, 1000000 } x
{ sawtooth, rand, stagger, plateau, shuffle } x
{ ident, reverse, reverse_front, reverse_back, sort, dither}

where

100, ... , 1000000 - array length

sawtooth: x[i] =i%m
rand: x[i] = rand() % m
stagger: x[i] = (i*m + i) % n
plateau: x[i] = min(i, m)
shuffle: x[i] = rand()%m? (j+=2): (k+=2)

ident(x) - a copy of x
reverse(x, 0, n) - reversed copy
reverse_front(x, 0, n/2) - front half reversed
reverse_back(x, n/2, n) - back half reversed
sort(x) - an ordered copy
dither(x) - add i%5 to x[i]

Here is the result of execution:

Server VM:http://spreadsheets.google.com/pub?key=t_EAWUkQ4mD3BIbOv8Fa-AQ&output=htmlClient VM:http://spreadsheets.google.com/pub?key=tdiMo8xleTxd23nKUObcz0Q&single=true&gid=0&output=html


Mathematical investigations
---------------------------
It is proved that for the Dual-Pivot Quicksort the average number of
comparisons is 2*n*ln(n), the average number of swaps is 0.8*n*ln(n),
whereas classical Quicksort algorithm has 2*n*ln(n) and 1*n*ln(n)
respectively. Full mathematical proof see in attached proof.txt
and proof_add.txt files. Theoretical results are also confirmed
by experimental counting of the operations.

Diff between current and new implementation of Quicksort
--------------------------------------------------------

Here is the link to the diff for java.util.Arrays class:
http://cr.openjdk.java.net/~alanb/6880672/webrev.00

If you like to look and play with new algorithm,
please, take attached class DualPivotQuicksort.java

Feedback
--------

Also I'd like to share a feedback from Joshua Bloch and
Jon Bentley who spent a lot of time investigating this
algorithm, who gave me many advices and tips how to
make new Quicksort better.

-------- Original Message --------
Subject: Re: Integration of new Dual-Pivot Quicksort into JDK 7
Date: Thu, 10 Sep 2009 07:20:11 -0700
From: Joshua Bloch <j...@google.com>

Jon also says that Vladimir should make every reasonable improvement to
the basic method before checking in the code. In his words, "It would be
horrible to put the new code into the library, and then have someone
else come along and speed it up by another 20% by using standard
techniques." I believe it's not unlikely that this code may end up
getting ported to many languages and widely deployed in much the manner
of Bentley and McIlroy's fine sort (which is nearing 20 successful years
in the field). Jon will help Vladimir do this.

-------- Original Message --------
Subject: Dual-Pivot Quicksort: Next Steps
Date: Wed, 09 Sep 2009 15:02:25 -0400
From: Jon Bentley <jbent...@avaya.com>

Vladimir, Josh,
     I *finally* feel like I understand what is going on.  Now that I
(think that) I see it, it seems straightforward and obvious.
     Tony Hoare developed Quicksort in the early 1960s. I was very
proud to make minor contributions to a particularly clean (binary)
quicksort in the mid 1980s, to a relatively straightforward, industrial
strength Quicksort with McIlroy in the early 1990s, and then to

algorithms and data structures for strings with Sedgewick in the mid1990s.

     I think that Vladimir's contributions to Quicksort go way beyond
anything that I've ever done, and rank up there with Hoare's original
design and Sedgewick's analysis. I feel so privileged to play a very,
very minor role in helping Vladimir with the most excellent work!

-----------------------------------------------

Let me know, if you have any questions/comments.

Thank you,
Vladimir

Re: Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort

Reply via email to