On 3/13/12 1:31 AM, Xinok wrote:
I've been playing with sorting algorithms a lot in recent months, so I
want to implement a *working* stable sort for Phobos, whose stable sort
is broken at the moment. I have a working library and I'm still adding to it. It's
much more complex than a simple merge sort, being over 300 lines of code
at the moment.

Working is better than broken.

- It's a natural merge sort, which is faster on partially sorted lists,
and adds little overhead for mostly random lists.
- It uses O(log n log n) additional space for merging.

That's 1024 when n is 4 billion. I think you can safely approximate it with alloca or a fixed-size stack-allocated array.
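To make the "natural" idea above concrete, here is a minimal C++ sketch (illustrative only, not the Phobos or library code under discussion): scan for already-ascending runs, then repeatedly merge pairs of adjacent runs. An already-sorted input is detected in a single O(n) pass, and a mostly random input degrades gracefully to an ordinary bottom-up merge sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Find the ascending run starting at `first`; return one past its end.
std::size_t find_run(const std::vector<int>& a, std::size_t first) {
    std::size_t i = first + 1;
    while (i < a.size() && a[i - 1] <= a[i]) ++i;
    return i;
}

// Natural merge sort: merge pairs of adjacent runs until one run remains.
void natural_merge_sort(std::vector<int>& a) {
    if (a.size() < 2) return;
    for (;;) {
        std::size_t lo = 0;
        bool merged = false;
        while (lo < a.size()) {
            std::size_t mid = find_run(a, lo);
            if (mid == a.size()) break;  // trailing lone run, or fully sorted
            std::size_t hi = find_run(a, mid);
            // std::inplace_merge is stable and handles its own buffering.
            std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
            lo = hi;
            merged = true;
        }
        if (!merged) return;             // one run left: the array is sorted
    }
}
```

Each pass halves the number of runs, so a list with r natural runs costs O(n log r) comparisons rather than O(n log n).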

- I wrote it to sort random-access ranges *without* slicing, but I think
the exclusion of slicing makes it slower. I'm writing a separate
implementation which uses slicing and I'll keep it if it's much faster.

Having random access implies having slicing.

- To avoid multiple allocations, the user can allocate their own
temporary memory and pass it to the sort function.

If you need different allocation strategies, I suggest you make it a policy (like stability is).

- I decided against using pointers. While I could use them to write
highly optimized code for arrays, pointers can't be used in safe code
and don't work very well in CTFE at the moment.

Perhaps it's best to have two distinct implementations guarded by if (__ctfe). The runtime implementation can be @trusted.

Is it worth keeping the implementation *without* slicing? Many functions
in Phobos do require slicing, including the unstable sort, and I think
most random-access ranges do or could support slicing.

No need.

What would you prefer in terms of memory usage vs. performance? n/2
extra space is optimal for performance, O(1) (in-place) requires zero
allocations but is slower, and O(log n log n) provides a good balance.

The latter rounded up to a constant sounds best.
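The arithmetic behind "rounded up to a constant" checks out; a small sketch (illustrative names) of the bound, reading O(log n log n) as (log2 n) squared elements of scratch:

```cpp
#include <cstdint>

// floor(log2 n) by counting shifts.
int log2_floor(std::uint64_t n) {
    int b = 0;
    while (n >>= 1) ++b;
    return b;
}

// Scratch needed under the (log2 n)^2 bound, in elements.
int merge_scratch_size(std::uint64_t n) {
    int lg = log2_floor(n);
    return lg * lg;   // e.g. n = 2^32  ->  32 * 32 = 1024 elements
}
```

Even for n = 2^63, this is 63 * 63 = 3969 elements, so a fixed-size stack array of a few thousand entries covers any input a 64-bit machine can address.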

Should I implement concurrency? Merge sort is very easy to parallelize,
and being in-place or natural doesn't change this fact.

Let's save that for std.parallel_algorithm.
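The split is indeed trivial to parallelize; a hedged C++ sketch (with `std::async` standing in for whatever std.parallel_algorithm would eventually provide): sort the two halves concurrently, join, then do a stable merge. `depth` bounds how many tasks get spawned.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

void parallel_merge_sort(std::vector<int>& a, std::size_t lo, std::size_t hi,
                         int depth) {
    if (hi - lo < 2) return;
    if (depth == 0) {                       // cutoff: sequential and stable
        std::stable_sort(a.begin() + lo, a.begin() + hi);
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    auto left = std::async(std::launch::async,
                           [&] { parallel_merge_sort(a, lo, mid, depth - 1); });
    parallel_merge_sort(a, mid, hi, depth - 1);
    left.get();                             // join before merging
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
}
```

With depth = 2 this runs up to four leaf sorts concurrently; being natural or in-place only changes the leaf and merge steps, not this structure.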

Should we take a different path and go for a basic merge sort or even
Timsort? I've considered writing a Timsort, though that seems like a
daunting task to me, so here's an implementation written in C:
https://github.com/swenson/sort

I don't know how your function's performance profile compares with timsort's.


Andrei
