On 01/14/2013 04:06 PM, Paul Sandoz wrote:
On Jan 14, 2013, at 3:38 PM, Peter Levart wrote:
I think these classes are targeted at use cases such as gathering real-time
statistics from profiling or business data, where data comes in from various
sources in real time and the statistics are sampled in real time too...
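A minimal sketch of that use case, assuming the classes in question are concurrent accumulators along the lines of java.util.concurrent.atomic.DoubleAdder (as later shipped in Java 8). LatencyStats, record, meanMillis and demo are hypothetical names: many threads call record while a sampler reads meanMillis, and neither side depends on the order in which the additions happened.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical statistics holder: many threads record, a sampler reads.
final class LatencyStats {
    private final DoubleAdder totalMillis = new DoubleAdder();
    private final LongAdder count = new LongAdder();

    // Called concurrently from many producer threads.
    void record(double millis) {
        totalMillis.add(millis);
        count.increment();
    }

    // Called periodically by a sampler. sum() is not an atomic snapshot and the
    // two adders are read separately, so under concurrent updates this mean is
    // only approximate.
    double meanMillis() {
        long n = count.sum();
        return n == 0 ? 0.0 : totalMillis.sum() / n;
    }

    // Demo: 4 threads each record 1000 samples of 2.0 ms.
    static LatencyStats demo() {
        LatencyStats stats = new LatencyStats();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            pool.execute(() -> {
                for (int i = 0; i < 1000; i++) {
                    stats.record(2.0);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stats;
    }
}
```

Because the recorded values here are small integers, the total is exact whatever the accumulation order; with arbitrary doubles it would not be.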
For bulk processing, the new streams API seems more appropriate. I think the
user might be able to control the order in which operations are applied
(the j.u.stream.Spliterator API indicates that the splitting of work among FJP
threads could be controlled, and we can hope that the order of reduction of
intermediate results would also be controllable by the user, or at least
defined).
Can the streams API developers shed some light on that?
DoubleStream (when added) will have a sum method that will defer to a reduce,
so elements will be processed in order, but the grouping of elements depends on
how the input is split and to what depth, and the user will have no control
over that.
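Why the grouping matters: floating-point addition is not associative, so the same three values can sum differently depending on where a split falls. A small illustration (Grouping and its method names are hypothetical):

```java
// Hypothetical helper: the same three addends, grouped two different ways.
final class Grouping {
    // The grouping a sequential left-to-right reduce would use.
    static double leftToRight(double a, double b, double c) {
        return (a + b) + c;
    }

    // The grouping a split such as [a] | [b, c] would produce.
    static double rightFirst(double a, double b, double c) {
        return a + (b + c);
    }
}
```

With a = 0.1, b = 0.2, c = 0.3, rightFirst returns exactly the double 0.6, while leftToRight returns the next double above it (one ulp higher).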
Unless the user implements their own Spliterator, right?
It is similar in concept to the IntStream.sum method, but I expect that for
DoubleStream the collectors API will be used with a double-sum collector
implementation that compensates for errors and supports merging (in order) of
intermediate sum values.
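A minimal sketch of the kind of error-compensating, mergeable sum state described here, assuming classic Kahan summation (KahanSum is a hypothetical name, not a JDK class). It has the supplier / accumulator / combiner shape accepted by DoubleStream.collect(Supplier, ObjDoubleConsumer, BiConsumer) as that method eventually appeared in Java 8; combine folds the right-hand partial result into the left one, so encounter order is preserved.

```java
import java.util.stream.DoubleStream;

// Hypothetical mergeable Kahan (compensated) sum.
final class KahanSum {
    private double sum;   // running sum
    private double comp;  // rounding error of the last step (true value ~= sum - comp)

    void add(double x) {
        double y = x - comp;    // re-inject the error lost on the previous step
        double t = sum + y;     // low-order bits of y may be lost here
        comp = (t - sum) - y;   // what was actually added minus what was intended
        sum = t;
    }

    // Merge a partial result covering elements that come *after* this one's.
    void combine(KahanSum right) {
        add(right.sum);
        add(-right.comp);
    }

    double value() {
        return sum - comp;
    }

    static double sum(DoubleStream values) {
        return values.collect(KahanSum::new, KahanSum::add, KahanSum::combine).value();
    }
}
```

Summing 1e16, 1.0, 1.0 in that order with this state yields 1e16 + 2, whereas a plain left-to-right loop yields 1e16 because each 1.0 is rounded away.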
Paul.
Regards, Peter
On 01/14/2013 07:18 AM, Howard Lovatt wrote:
If you sum the values pairwise, as a binary tree, the rounding errors aren't
that bad (the worst-case error bound grows with log n rather than with n) and
this algorithm is easy to parallelise.
Higham, Nicholas J. (1993). The accuracy of floating point summation. SIAM J.
Sci. Comput. 14(4), 783-799.
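A minimal sequential sketch of pairwise (binary-tree) summation, for comparison with a plain loop. PairwiseSum and the leaf size of 8 are illustrative assumptions; the leaf loop only avoids recursing all the way down to single elements.

```java
// Hypothetical helper: pairwise summation vs. a plain left-to-right loop.
final class PairwiseSum {
    private static final int LEAF = 8; // assumed cut-off; below it, just loop

    static double sum(double[] a) {
        return sum(a, 0, a.length);
    }

    // Sums a[from, to) by splitting the range in half and adding the halves.
    static double sum(double[] a, int from, int to) {
        int n = to - from;
        if (n <= LEAF) {
            double s = 0.0;
            for (int i = from; i < to; i++) {
                s += a[i];
            }
            return s;
        }
        int mid = from + n / 2;
        return sum(a, from, mid) + sum(a, mid, to);
    }

    // Reference: recursive (left-to-right) summation.
    static double naive(double[] a) {
        double s = 0.0;
        for (double x : a) {
            s += x;
        }
        return s;
    }
}
```

Summing 2^20 copies of 0.1, the pairwise result lands much closer to 104857.6 than the naive loop does.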
Also see Wikipedia for a description of Kahan summation and a general
discussion of this topic.
Why not commit to binary tree reductions? That would allow everyone to
understand what is going on and design lambdas accordingly.
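One way to commit to binary tree reductions in a fork/join setting, sketched under the assumption that the data is an array: split at fixed midpoints down to a fixed leaf size, and always combine left + right. The split points depend only on the array length, so the grouping of additions, and therefore the rounded result, is the same no matter how many worker threads run the tasks. TreeSumTask and LEAF are hypothetical names.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical fixed-shape binary-tree reduction over a[from, to).
final class TreeSumTask extends RecursiveTask<Double> {
    private static final int LEAF = 1024; // assumed fixed leaf size
    private final double[] a;
    private final int from;
    private final int to;

    TreeSumTask(double[] a, int from, int to) {
        this.a = a;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Double compute() {
        int n = to - from;
        if (n <= LEAF) {
            double s = 0.0;
            for (int i = from; i < to; i++) {
                s += a[i];
            }
            return s;
        }
        int mid = from + n / 2;                   // split point depends only on indices
        TreeSumTask left = new TreeSumTask(a, from, mid);
        TreeSumTask right = new TreeSumTask(a, mid, to);
        left.fork();                              // left half may run on another worker
        double r = right.compute();               // right half runs in this thread
        double l = left.join();
        return l + r;                             // always left + right: fixed grouping
    }

    static double sum(double[] a, ForkJoinPool pool) {
        return pool.invoke(new TreeSumTask(a, 0, a.length));
    }
}
```

The design choice is to trade adaptive splitting for a reproducible tree shape; a pool with one thread and a pool with many produce bit-identical sums.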
-- Howard.
On 13/01/2013, at 2:04 AM, Doug Lea wrote:
On 01/11/13 21:37, Joe Darcy wrote:
I would prefer a cautionary note along the lines of "if you want numerical
accuracy, take care to use a summation algorithm with defined worst-case
behavior."
(Varying magnitude is not so much of a problem if you add things up in the right
order.)
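A small sketch of what adding things up in the right order can mean for same-sign data: sort by increasing magnitude, then add. OrderedSum is a hypothetical helper. Note that it needs the whole data set up front, which is exactly the kind of ordering control a concurrent accumulator does not have.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper comparing "as given" order with smallest-magnitude-first.
final class OrderedSum {
    static double asGiven(double[] values) {
        double s = 0.0;
        for (double x : values) {
            s += x;
        }
        return s;
    }

    // Small terms accumulate before they meet large ones. Helps for same-sign
    // data; it is not a worst-case guarantee when signs are mixed.
    static double ascendingMagnitude(double[] values) {
        Double[] boxed = new Double[values.length];
        for (int i = 0; i < values.length; i++) {
            boxed[i] = values[i];
        }
        Arrays.sort(boxed, Comparator.comparingDouble((Double x) -> Math.abs(x)));
        double s = 0.0;
        for (double x : boxed) {
            s += x;
        }
        return s;
    }
}
```

For the inputs 1e16, 1.0, 1.0, the given order yields 1e16 (each 1.0 is rounded away), while smallest-first yields 1e16 + 2.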
Thanks. I do not believe such an algorithm exists, because
no ordering control is possible, and all other known accuracy
improvements (like Kahan summation) require multiword atomicity, which we
explicitly do not provide.
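To illustrate the multiword-atomicity point: in Kahan summation the running sum and its compensation term must change together. A minimal sketch, assuming one emulates that by keeping both in an immutable pair swapped with AtomicReference.compareAndSet (AtomicKahan is a hypothetical name, not a proposal). Every update allocates, and all threads contend on a single reference, which is the kind of cost a striped adder is designed to avoid.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical thread-safe Kahan accumulator using a CAS'd immutable pair.
final class AtomicKahan {
    // Both words of the state, updated together by replacing the whole object.
    private static final class State {
        final double sum;
        final double comp;

        State(double sum, double comp) {
            this.sum = sum;
            this.comp = comp;
        }
    }

    private final AtomicReference<State> state =
            new AtomicReference<>(new State(0.0, 0.0));

    void add(double x) {
        while (true) {
            State cur = state.get();
            double y = x - cur.comp;               // re-inject previous error
            double t = cur.sum + y;                // low-order bits of y may be lost
            State next = new State(t, (t - cur.sum) - y);
            if (state.compareAndSet(cur, next)) {  // succeeds only if no one else updated
                return;
            }                                      // otherwise retry with fresh state
        }
    }

    double value() {
        State s = state.get();
        return s.sum - s.comp;
    }
}
```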
Which leaves me thinking that the current disclaimer (below)
is the best we can do.
-Doug
"The order of accumulation within or across threads is not guaranteed.
Thus, this class may not be applicable if numerical stability is
required, especially when combining values of substantially different
orders of magnitude."