On Mon, Dec 8, 2025 at 9:52 AM Chao Li <[email protected]> wrote:
> First, I changed my direction and implemented the in-place switching in the 
> other way, where I did a way like chained-switching. Say starting from item0, 
> for example, switching item0 to item5, then check where item5 should be 
> switched to, and makes the switch, till an item is switch to position 0. See 
> my implementation in other-implemation.diff if you are interested in it. This 
> time, I eyeball checked the sort result and confirmed the correction. But my 
> implementation is slightly slower than your implementation, based on the same 
> test procedure I described in my previous email, my implementation is roughly 
> ~3% slower your implementation. So I think that at least proves your current 
> implementation in v5 has been perfectly fine tuned.

That shouldn't be surprising, since the way you describe is basically
"American flag sort", which is much older, and the innovation of
ska_byte_sort was to recognize that this is bad for CPU pipelining.
That was explained in detail in the blog post I linked to in my first
email.

Also notice that by attaching a .diff, the CF bot tries and fails to
apply that to master, and has been complaining that my patch needs a
rebase. Please don't do that again.

--
John Naylor
Amazon Web Services


Reply via email to