While reviewing Thomas Munro's patchset for expanding the use of
specialized qsort [1], I wondered about some aspects of the current
qsort implementation. For background, ours is based on Bentley &
McIlroy "Engineering a Sort Function" [2], which is a classic paper
worth studying.

1) the check for pre-sorted input at the start of each and every recursion

This has been discussed before [3]. The check is great if the input is
sorted, and mostly harmless if the current partition is unsorted
enough that the check fails quickly, but it's not hard to imagine
that the check is mostly a waste of cycles. There have been proposals to
base the pre-sorted check on input size [4] or to do it only once at
the beginning, but that strikes me as looking for the keys under the
lamppost because that's where the light is. The logical criterion for
proceeding to check if the input is sorted is whether we *think* the
input could be sorted. That may sound like a tautology, but consider
the case where the partitioning phase didn't perform any swaps. Then,
it makes sense to check, but we can go further. What if we do the
check, but the check fails towards the end? If only a couple of
elements are out of place, does it really make sense to give up,
partition, and recurse? If just a few iterations of insertion sort are all that is
needed to finish, why not just do that? This way, we *dynamically*
decide to optimistically start an insertion sort, in the hopes that it
will occasionally prevent a recursion, and in the worst case the input
is slightly more sorted for the next recursion. All we need is a
lightweight way to detect that the partitioning phase didn't do any
swaps. More on this later.
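
To make that concrete, here is a minimal sketch of the kind of
opportunistic insertion sort I have in mind (over a plain int array
rather than our actual sort element type; the function name and the
give-up budget of 8 are made up for illustration). It finishes the
sort if the input turns out to be nearly sorted, and otherwise bails
out so the caller can partition and recurse as usual:

#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch: insertion sort that gives up once it has moved more than a
 * small budget of elements, on the theory that the input was not
 * nearly-sorted after all.  Returns true if the array is now fully
 * sorted, false if we bailed out early.
 */
static bool
partial_insertion_sort(int *a, size_t n)
{
    size_t      moves = 0;
    const size_t budget = 8;    /* arbitrary */

    for (size_t i = 1; i < n; i++)
    {
        int         val = a[i];
        size_t      j = i;

        while (j > 0 && a[j - 1] > val)
        {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = val;

        moves += i - j;
        if (moves > budget)
            return false;       /* too disordered -- give up */
    }
    return true;                /* finished without partitioning */
}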

2) low cardinality can cause abbreviated keys to be aborted, but our
qsort is not optimized for that case

Since in this case comparison is very expensive, it stands to reason
that qsort could profitably be optimized for a low number of unique
keys. The Bentley & McIlroy scheme does take great pains to prevent
quadratic behavior from lots of duplicates, but I've found something
that might have stronger guarantees: Pattern-defeating quicksort
(paper: [5], C++ implementation: [6]) claims worst-case runtime of
O(nk) for inputs with k distinct elements. This is achieved via an
asymmetric partitioning scheme. It's not complex, but it is subtle, so
I won't try to describe it here. I recommend reading section 3 of the
paper for details. Rust and Boost use PDQuicksort in some form, so it
has some production use.

The partitioning scheme just mentioned also provides an easy way to
detect when no swaps have been done, thus solving #1 above.
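
To illustrate what that detection could look like (this is not the
actual PDQuicksort partition, just a garden-variety two-way partition
over an int array, with a made-up name), the scan can report whether
it moved anything at essentially no cost, and that flag is what would
trigger the opportunistic insertion sort from #1:

#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch: classic two-way (Hoare/Sedgewick-style) partition over an int
 * array, extended to report whether it had to swap anything.  Assumes
 * lo < hi and that the caller has already placed the pivot in a[lo],
 * e.g. via median-of-three selection.  Returns the pivot's final index;
 * *already_partitioned is the cheap "no swaps happened" signal.
 */
static size_t
partition_with_flag(int *a, size_t lo, size_t hi, bool *already_partitioned)
{
    int         pivot = a[lo];
    size_t      i = lo;
    size_t      j = hi + 1;
    bool        swapped = false;

    for (;;)
    {
        /* scan right for an element >= pivot */
        do
        {
            i++;
        } while (i <= hi && a[i] < pivot);

        /* scan left for an element <= pivot; a[lo] == pivot is a sentinel */
        do
        {
            j--;
        } while (a[j] > pivot);

        if (i >= j)
            break;

        /* found an out-of-place pair: swap it and remember that we did */
        int         tmp = a[i];

        a[i] = a[j];
        a[j] = tmp;
        swapped = true;
    }

    /* move the pivot into its final position */
    a[lo] = a[j];
    a[j] = pivot;

    *already_partitioned = !swapped;
    return j;
}

As I understand the C++ implementation, its partition returns the same
kind of flag, and the optimistic insertion sort is attempted only when
the flag is set.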

There is one requirement that is a double-edged sword: recursion into
the partitions must happen in order. This has the additional advantage
that, in every case but one, insertion sort doesn't need to guard
against running off the beginning of the array. The downside for us is
that we currently recurse to a partition based on its size as a
stack-guarding measure, so that guard would have to be replaced. The
C++ implementation of PDQuicksort falls back to heap sort as a last
resort to bound runtime, but it would be just as effective at guarding
the stack. That's a bit of a snag for making it production-ready, but
not enough to prevent testing the actual qsort part.
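
For example (a sketch only, with made-up names, a naive first-element
pivot, and none of the other machinery, just to show the guard), the
recursion can carry a depth budget and switch to heap sort when the
budget runs out, which bounds both runtime and C stack usage:

#include <stddef.h>

static void
swap_int(int *a, int *b)
{
    int         tmp = *a;

    *a = *b;
    *b = tmp;
}

/* standard sift-down for a max-heap rooted at 'root', with 'end' inclusive */
static void
sift_down(int *a, size_t root, size_t end)
{
    while (2 * root + 1 <= end)
    {
        size_t      child = 2 * root + 1;

        if (child < end && a[child] < a[child + 1])
            child++;
        if (a[root] >= a[child])
            return;
        swap_int(&a[root], &a[child]);
        root = child;
    }
}

/* heap sort: O(n log n) worst case, constant auxiliary space and stack */
static void
heap_sort(int *a, size_t n)
{
    if (n < 2)
        return;
    for (size_t start = n / 2; start-- > 0;)
        sift_down(a, start, n - 1);
    for (size_t end = n - 1; end > 0; end--)
    {
        swap_int(&a[0], &a[end]);
        sift_down(a, 0, end - 1);
    }
}

/*
 * Depth-guarded quicksort that recurses into the partitions in order
 * (left, then right) and falls back to heap sort once the depth budget
 * is exhausted, bounding both runtime and stack usage.  The initial
 * budget would be something like 2 * log2(n), as in introsort.
 */
static void
qsort_depth_guarded(int *a, size_t n, int depth_budget)
{
    if (n < 2)
        return;

    if (depth_budget <= 0)
    {
        heap_sort(a, n);
        return;
    }

    /* naive first-element pivot, only to keep the sketch short */
    int         pivot = a[0];
    size_t      i = 0;
    size_t      j = n;

    for (;;)
    {
        do { i++; } while (i < n && a[i] < pivot);
        do { j--; } while (a[j] > pivot);
        if (i >= j)
            break;
        swap_int(&a[i], &a[j]);
    }
    swap_int(&a[0], &a[j]);

    /* recurse on the partitions in order */
    qsort_depth_guarded(a, j, depth_budget - 1);
    qsort_depth_guarded(a + j + 1, n - j - 1, depth_budget - 1);
}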

Does anyone see a reason not to put in the necessary work to try it out?

[1] https://www.postgresql.org/message-id/CA%2BhUKGKztHEWm676csTFjYzortziWmOcf8HDss2Zr0muZ2xfEg%40mail.gmail.com
[2] https://cs.fit.edu/~pkc/classes/writing/samples/bentley93engineering.pdf
[3] https://www.postgresql.org/message-id/flat/87F42982BF2B434F831FCEF4C45FC33E5BD369DF%40EXCHANGE.corporate.connx.com#e69718293c45d89555020bd0922ad055
[4] https://www.postgresql.org/message-id/CA%2BTgmobKFcb6ajVEN8eSnBa78sB3FSwuEWTHXd2x9JC5DOh5OA%40mail.gmail.com
[5] https://arxiv.org/pdf/2106.05123.pdf
[6] https://github.com/orlp/pdqsort

-- 
John Naylor
EDB: http://www.enterprisedb.com

