On 9 Dec 2015 02:44, "Peter Geoghegan" <p...@heroku.com> wrote: > > I guess you mean insertion sort. What's the theoretical justification > for the change?
Er, right. Insertion sort. The sort networks I used here are optimal both in number of comparisons and depth. I suspect modern CPUs actually manage to do some of the comparisons in parallel even. I was experimenting with using SIMD registers and did a non SIMD implementation like this first and noticed it was doing 15% fewer comparisons than insertion sort and ran faster. That was for sets of 8, I'm not sure there's as much saving on smaller sets.