Btw, on my Core i5, in a debug build:
reduce (using double): 11 ms
non_parallel: 37 ms
parallel with atomicOp: 123 ms

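So that is the reason for using parallel reduce, assuming the ulong range thing gets fixed: it is roughly 3x faster than the non-parallel version and roughly 11x faster than the parallel version with atomicOp.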
