Re: Idempotent partition around median of 5?

Ola Fosheim Grøstad via Digitalmars-d Sat, 06 Feb 2016 02:45:52 -0800

On Saturday, 6 February 2016 at 07:06:27 UTC, Ivan Kazmenko wrote:

1. Primitive types (cheap swap, cheap comparison).
2. Heavy structs A (expensive swap, cheap comparison - e.g., compare one field of primitive type).
3. Heavy structs B (expensive swap, expensive comparison - e.g., call a heavy external function).
4. Heavy classes (cheap swap - pointers only, expensive comparison).
So there's perhaps no single best solution.

That's right, but another factor is more important: preventing pipeline stalls. If you are collecting from 5 different cache lines in an array, you are likely to get several delays of 40-120 cycles unless you prefetch, and if you do prefetch, you need other instructions to fill the latency gaps.
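To make the prefetching point concrete, here is a minimal sketch in C. It assumes GCC or Clang, since `__builtin_prefetch` is a compiler builtin and not standard C, and it assumes n >= 1. The helper name `gather5` and the evenly spaced sample positions are made up for illustration; they are not from the code discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical helper: copy five evenly spaced samples of a[0..n) into out.
   Assumes n >= 1. All five prefetches are issued before the first load, so
   the cache misses can overlap instead of being paid one after another. */
static void gather5(const int *a, size_t n, int out[5])
{
    size_t idx[5];
    for (int i = 0; i < 5; ++i)
        idx[i] = (n - 1) * (size_t)i / 4;      /* 0, ~n/4, ~n/2, ~3n/4, n-1 */

#if defined(__GNUC__) || defined(__clang__)
    for (int i = 0; i < 5; ++i)
        __builtin_prefetch(&a[idx[i]], 0, 3);  /* 0 = read, 3 = keep in cache */
#endif

    /* Independent work would go here to fill the latency gap. */

    for (int i = 0; i < 5; ++i)
        out[i] = a[idx[i]];
}
```

Whether this helps at all depends on how much independent work is available between the prefetches and the loads, so it would have to be measured.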

But instructions also have latency and concurrency issues, which is why your version did reasonably well: it made the compares/swaps independent so that they could be scheduled concurrently.

Yet Haswell has SIMD instructions that can do 8-16x 32-bit max/min operations per cycle, with a latency of only 1 cycle, and 4-8x 64-bit compares with a latency of 1 cycle.

So if you use a small fixed N, like 5, it makes very little sense to count compares/swaps.

What makes sense is to focus on how you can avoid branching and build an algorithm with no pipeline stalls.
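As a rough illustration of branch avoidance, here is a sketch in C that selects the median of five ints using only min/max operations. It is not the partition routine discussed in this thread: it returns the median value and moves nothing. The names `min_i`, `max_i`, `median3` and `median5` are made up for this example. Compilers commonly lower the ternaries to conditional moves or min/max instructions, but that is not guaranteed, so check the generated code.

```c
/* Min/max written as ternaries with no side effects, which compilers can
   usually turn into branch-free instructions. */
static int min_i(int x, int y) { return x < y ? x : y; }
static int max_i(int x, int y) { return x < y ? y : x; }

/* Median of three values. */
static int median3(int x, int y, int z)
{
    return max_i(min_i(x, y), min_i(max_i(x, y), z));
}

/* Median of five values. f and g are the two middle values of {a, b, c, d}:
   f drops the smallest of the four, g drops the largest. The median of all
   five is then the median of e, f and g. */
static int median5(int a, int b, int c, int d, int e)
{
    int f = max_i(min_i(a, b), min_i(c, d));
    int g = min_i(max_i(a, b), max_i(c, d));
    return median3(e, f, g);
}
```

The four min/max calls that feed f and g do not depend on each other, so they can be scheduled concurrently, which is the same property that helped the compare/swap version.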

If you are sorting large arrays, you might also want to look at multi-threaded parallel sort functions.
