Hi!

I'd like to propose adding an /x/exp package with in-place radix sort.
Maybe someday something like it gets into stdlib and backs sort.Ints and
company (great!), maybe not (works for me, just the review's worth it). I'm
posting here rather than opening a GitHub issue because earlier this year
other folks opened a related issue (github.com/golang/go/issues/14653) and
people there suggested taking discussion here.

As background, last year I put out github.com/twotwotwo/sorts. To use its
in-place radix sorts, you implement sort.Interface + a Key(int) method
returning a (u)int64, string, or []byte sort key at an index. In timings I
ran at the time, the ns/op delta with one goroutine was -38% for sorting
11M English strings (Wikipedia titles) and -71% for Benchmark1e6. It can do
parallel sorts too, which got a roughly -80% wall-time delta for both
strings (with 8 procs) and ints (with 2). More data from back then is at
http://bit.ly/2biQpDz . Haven't rerun all that but the order of magnitude
of the differences has been the same with current compilers and the
shellsort change.

Those sorts are in-place, there's no reflect/unsafe, and it can sort
[]MyThing not just []int/[]string (or non-slices, anything satisfying the
iface). It's got full test coverage from tests from the stdlib and new
ones. I think those things distinguish it from many of the past ideas re:
sorting. I'm not proposing that package for Go--busy API, trying to do too
much at once, some weird internals--but maybe it helps show the idea's
viable.

I think, in general, a great thing about Go is most stdlib packages scale
well--net/http's still a good pick when you get real traffic, say. Adding
stdlib sorts that are faster for large collections seems in that spirit.
They'd be no more niche than, say, index/suffixarray or big.Float. (I love
that Go has those, 'cause they're cool fundamental capabilities; I'm just
saying maybe faster sorts for larger datasets are, too.) I know arguments
against--don't add APIs just for constant-factor speedups, most data's
small, etc.--so maybe there's no real chance stdlib ever gets sort/radix or
such, but I figure I might as well make a pitch for it.

Experimenting, I pulled out *just* an in-place serial uint64 sort and it's
about as much code as stdlib's hybrid sort. I wrote helpers to adapt it to
sorting other types. (Even strings, via abbreviated-key sorting which
bit.ly/2biZ737 explains better than I can. That's faster in tests *and*
simpler than my "real" string sort, but allocates a big []uint64 so eh.)
Parallel sorts, a proper in-place string sort, etc. could each be tackled
independently. My scratchpad's at godoc.org/github.com/twotwotwo/sort/radix
but don't take my futzing around there too seriously (and *no one* use it
for real; breaking the API is the point of it and I didn't bring over the
test suite!).

Stepping back, I'm flexible on where to start, if there's anything to start
on. I should probably hear about the threshold question of whether radix or
parallel sorts are non-starters for the Go project (meaning, even /x/)
before getting deeper in the weeds. I may seem to answer stuff at 1/8 speed
because of life and work, but interested in what folks have to say.

Thanks for reading!

Randall

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to