Re: Initial request for review of JDK-8006572 DoubleStream.sum()/average() implementations that reduce numerical errors

Joe Darcy Wed, 20 Nov 2013 00:22:44 -0800

Hi Paul,

On 11/18/2013 07:38 AM, Paul Sandoz wrote:

Hi Joe,


You can use the three arg form of collect for DoublePipeline.sum/average impls, 
which is already used for average:

     public final OptionalDouble average() {
         double[] avg = collect(() -> new double[2],
                                (ll, i) -> {
                                    ll[0]++;
                                    ll[1] += i;
                                },
                                (ll, rr) -> {
                                    ll[0] += rr[0];
                                    ll[1] += rr[1];
                                });
         return avg[0] > 0
                ? OptionalDouble.of(avg[1] / avg[0])
                : OptionalDouble.empty();
     }

That would be the most expedient way.

Thanks for the tip. For the moment, I'm feeling a bit expedient and usedthe array-based approach in an iteration of the change:


    http://cr.openjdk.java.net/~darcy/8006572.3/

This allows one of the kernels of the compensated summation logic to beshared in two of the use-sites.

For the test, all 6 scenarios fail on a JDK *without* the compensatedsummation code and all 6 pass with it; all the other java/util/streamsregression tests pass with the new code too.

Off-list, I've asked a numerically inclined colleague to look over thesummation logic and how the compensated summation states are combined.


Thanks,

-Joe

  However, for sum() it is somewhat unfortunate to have to create a double[1] 
box for each leaf. If you are willing to write a little more more code you can 
create your own double sum ReducingSink implementation as Brian suggests.


Testing wise you don't need to implement an Iterator, you can do the following:

   DoubleStream.iterate(base, e -> increment).limit(count)

It might be more convenient to test as follows:

Stream<Boolean> s = Stream.iterate(false, e -> true).limit(count); // [*]

   DoubleSummaryStatistics stats = s.collect(Collectors.summarizingDouble(e -> 
e ? increment : base)); // Cross fingers that compiles!

   Stream<Boolean> s = Stream.iterate(false, e -> true).limit(count);
   Double d = s.iterate(false, e -> 
true).limit(count)..collect(Collectors.summingDouble(e ? increment : base));

Thereby covering both Collector implementations.

I guess it might be harder to test the combining step using parallel streams 
since combining will be platform dependent (depends on #cores) unless you track 
how things are combined. Perhaps the Collector instance could be tested 
directly with combination?

Paul.

[*] Another way is to use stream concatenation:

   Stream.concat(Stream.of(false), IntStream.range(1, count).mapToObj(e -> 
true))
   Stream.concat(Stream.of(false), Stream.generate(() -> true)).limit(count)


On Nov 14, 2013, at 9:08 AM, Joe Darcy <[email protected]> wrote:

Hello,

Please take an initial look over a fix for

    JDK-8006572 DoubleStream.sum() & DoubleSummaryStats implementations that 
reduce numerical errors
    http://cr.openjdk.java.net/~darcy/8006572.0/

The basic approach is to use compensated summation

    http://en.wikipedia.org/wiki/Kahan_summation_algorithm

to computed streams-related sum and average statistics in the various locations 
that this can be done.

All existing streams tests pass and new newly-written test passes too.

I believe the DoubleSummaryStatistics.java portion, including the test, is 
fully review-worthy. In the test, for the sample computation in question, the 
naive summation implementation had a error of 500,000 ulps compared to 2 ups 
with the new implementation.

Two other locations I've found where this summation technique should be used 
are in

    java.util.stream.Collectors.{summingDouble, averagingDouble}

and

    java.util.stream.DoublePipeline.{sum, average}

DoublePipeline is the primary implementation class of DoubleStream.

For Collectors, the proposed code is a fairly clear adaptation of how the 
current code passes state around; there is not currently a dedicated test for 
the new summation technique in this location.

I'm new to the streams API so for DoublePipeline I don't know the idiomatic way 
to phrase the collect I want to perform over the code. (Based on my current 
understanding, I believe I want to perform a collect rather than a reduce since 
for the compensated summation I need to maintain some additional state.) 
Guidance here welcome.

Thanks,

-Joe

Re: Initial request for review of JDK-8006572 DoubleStream.sum()/average() implementations that reduce numerical errors

Reply via email to