Re: Floating Point + Threads?

Robert Jacques Fri, 15 Apr 2011 23:11:22 -0700

On Fri, 15 Apr 2011 23:22:04 -0400, dsimcha <dsim...@yahoo.com> wrote:

I'm trying to debug an extremely strange bug whose symptoms appear in a std.parallelism example, though I'm not at all sure the root cause is in std.parallelism. The bug report is at https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717.
Basically, the example in question sums up all the elements of a lazy range (actually, std.algorithm.map) in parallel. It uses taskPool.reduce, which divides the summation into work units to be executed in parallel. When executed in parallel, the results of the summation are non-deterministic after about the 12th decimal place, even though all of the following properties are true:
1. The work is divided into work units in a deterministic fashion.
2. Within each work unit, the summation happens in a deterministic order.
3. The final summation of the results of all the work units is done in a deterministic order.
4. The smallest term in the summation is about 5e-10. This means the difference across runs is about two orders of magnitude smaller than the smallest term. It can't be a concurrency bug where some terms sometimes get skipped.
5. The results for the individual tasks, not just the final summation, differ in the low-order bits. Each task is executed in a single thread.
6. The rounding mode is apparently the same in all of the threads.
7. The bug appears even on machines with only one core, as long as the number of task pool threads is manually set to >0. Since it's a single core machine, it can't be a low level memory model issue.
What could possibly cause such small, non-deterministic differences in floating point results, given everything above? I'm just looking for suggestions here, as I don't even know where to start hunting for a bug like this.

Well, on one hand, floating point addition is not associative, and running sums have many known issues (I'd recommend looking up Kahan summation). On the other hand, it should be repeatably different.

As for suggestions? First and foremost, you should always add small to large, so try using iota(n-1,-1,-1) instead of iota(n). Not only should the answer be better, but if your error rate goes down, you have a good idea of where the problem is. I'd also try isolating your implementation's numerics from the underlying concurrency, i.e. use a task pool of 1 and don't let the host thread join it, so the entire job is done by one worker. The other thing to try is isolating/removing map and iota from the equation.
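
To make the ordering point concrete, here is a rough sketch in Python (not D, and not the code from the bug report). It shows how grouping and summation order change a floating point result, and how compensated (Kahan) summation reduces the effect. The names naive_sum, kahan_sum, large_first and small_first, as well as the data, are made up for illustration.

```python
import math

def naive_sum(values):
    """Plain left-to-right running sum."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Compensated (Kahan) summation: carries the rounding error forward."""
    total = 0.0
    comp = 0.0  # rounding error of the previous addition (added minus intended)
    for v in values:
        y = v - comp            # correct the next term by the previous error
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # amount actually added minus amount intended
        total = t
    return total

# Hypothetical data: one large term followed by ten tiny ones.
# 1e-16 is below half an ulp of 1.0, so 1.0 + 1e-16 rounds back to 1.0.
large_first = [1.0] + [1e-16] * 10
small_first = list(reversed(large_first))

# Grouping changes the result: addition is not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

print(naive_sum(large_first))  # 1.0: every tiny term is rounded away
print(naive_sum(small_first))  # slightly above 1.0: tiny terms accumulate first
print(kahan_sum(large_first))  # slightly above 1.0 despite the bad order
print(math.fsum(large_first))  # correctly rounded reference
```

Note that each of these results is identical from run to run for a fixed order. Ordering effects alone explain why a parallel sum can differ from a serial one, but not why the same partitioning would give different answers across runs.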
