aherbert commented on code in PR #52: URL: https://github.com/apache/commons-statistics/pull/52#discussion_r1301693640
########## commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/SumOfSquaredDeviations.java: ########## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +/** + * Computes the sum of squared deviations from the sample mean. This + * statistic is related to the second moment. + * + * <p> + * The following recursive updating formula is used: + * <p> + * Let <ul> + * <li> dev = (current obs - previous mean) </li> + * <li> n = number of observations (including current obs) </li> + * </ul> + * Then + * <p> + * new value = old value + dev^2 * (n - 1) / n. + * <p> + * + * Returns the sum of squared deviations of all values seen so far. + * + * <p><strong>Note that this implementation is not synchronized.</strong> If + * multiple threads access an instance of this class concurrently, and at least + * one of the threads invokes the <code>accept()</code> or + * <code>combine()</code> method, it must be synchronized externally. + * + * <p>However, it is safe to use <code>accept()</code> and <code>combine()</code> + * as <code>accumulator</code> and <code>combiner</code> functions of + * {@link java.util.stream.Collector Collector} on a parallel stream, + * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()} + * provides the necessary partitioning, isolation, and merging of results for + * safe and efficient parallel execution. + */ +class SumOfSquaredDeviations implements DoubleStatistic, DoubleStatisticAccumulator<SumOfSquaredDeviations> { + /** Sum of squared deviations of the values that have been added. */ + private double squaredDevSum; + + /** First moment instance which holds the first moment of values that have been added. + * This is required because {@link FirstMoment#getDev()} and {@link FirstMoment#getDevNormalizedByN()} are + * used in the updating the sum of squared deviations. + */ + private final FirstMoment firstMoment; + + /** + * Create a FirstMoment instance. + */ + SumOfSquaredDeviations() { + firstMoment = new FirstMoment(); + } + + /** + * Create a SumOfSquaredDeviations instance with the given sum of + * squared deviations and a FirstMoment instance. + * + * @param squaredDevSum Sum of squared deviations. + * @param mean Mean of values. + * @param n Number of values. + * @param nonFiniteValue Sum of values. + */ + SumOfSquaredDeviations(double squaredDevSum, double mean, long n, double nonFiniteValue) { + this.squaredDevSum = squaredDevSum; + firstMoment = new FirstMoment(mean, n, nonFiniteValue); + } + + /** + * Updates the state of the statistic to reflect the addition of {@code value}. + * @param value Value. + */ + @Override + public void accept(double value) { + firstMoment.accept(value); + if (Double.isInfinite(value)) { + squaredDevSum = Double.NaN; + return; + } + final double n = firstMoment.getN(); + squaredDevSum += (n - 1) * firstMoment.getDev() * firstMoment.getDevNormalizedByN(); + } + + /** + * Gets the sum of squared deviations of all input values. + * + * @return {@code SumOfSquaredDeviations} of all values seen so far. + */ + @Override + public double getAsDouble() { + return squaredDevSum; + } + + /** {@inheritDoc} */ + @Override + public SumOfSquaredDeviations combine(SumOfSquaredDeviations other) { + final long oldN = firstMoment.getN(); + final long otherN = other.getN(); + if (oldN == 0) { + squaredDevSum = other.squaredDevSum; + } else if (otherN != 0) { + final double sqDiffOfMean = + Math.pow((other.firstMoment.getAsDouble() * 0.5 - firstMoment.getAsDouble() * 0.5) * 2, 2); Review Comment: Using Math.pow here is inefficient. A single multiply will return the same result much faster. Math.pow is an advantage for higher order powers where it will be more exact than repeat multiplication. There is no need for scaling by 0.5 here. If the difference overflows then the end result is the same since you immediately square the difference. ```java double dx = other.firstMoment.getAsDouble() - firstMoment.getAsDouble(); final double sqDiffOfMean = dx * dx; ``` Note that this highlights a bug in the FirstMoment. I uncommented the scaling in the accept method there and all the unit test pass. So we are not really testing the scaling is needed. This case is missing: ```java Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE, -Double.MAX_VALUE}), ``` This is a situation where multiple values have been encountered and then the difference from the current mean is infinite. However half the difference is finite. Thus the difference must be computed using half of its value and then rescaled at the end. This can be fixed in FirstMoment using: ```java public void accept(double value) { n++; nonFiniteValue += value; // To prevent overflow, dev is computed by scaling down and then scaling up. // We choose to scale down and scale up by a factor of two to ensure that the scaling is lossless. dev = (value * 0.5 - m1 * 0.5); // Here nDev cannot overflow as dev is <= MAX_VALUE when n > 1; or <= MAX_VALUE / 2 when n = 1 nDev = (dev / n) * 2; m1 += nDev; // Rescale the deviation dev *= 2; } public FirstMoment combine(FirstMoment other) { if (n == 0) { n = other.n; dev = other.dev; nDev = other.nDev; m1 = other.m1; nonFiniteValue = other.nonFiniteValue; } else if (other.n != 0) { n += other.n; nonFiniteValue += other.nonFiniteValue; // To prevent overflow, dev is computed by scaling down and then scaling up. dev = (other.m1 * 0.5 - m1 * 0.5); // In contrast to the accept method, here nDev can be close to MAX_VALUE // if the weight (other.n / n) approaches 1. So we cannot yet rescale nDev and // instead combine with m1 scaled down. nDev = dev * ((double) other.n / n); m1 = m1 * 0.5 + nDev; // Rescale m1 *= 2; dev *= 2; nDev *= 2; } return this; } ``` The `Mean.of(double[])` method also fails on this data because the correction is an infinite sum. So we have to fix this too: ```java // Correction may be infinite correction = Double.isFinite(correction) ? correction : 0; ``` Note: We could be smarter here and compute the next power of 2 from values.length. Then scale down the values and the initial mean estimate, compute the correction and rescale the correction. This is a bit overkill so it is simpler to ignore correcting very large means. ########## commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/FirstMoment.java: ########## @@ -35,8 +35,8 @@ * * <p><strong>Note that this implementation is not synchronized.</strong> If * multiple threads access an instance of this class concurrently, and at least - * one of the threads invokes the <code>increment()</code> or - * <code>clear()</code> method, it must be synchronized externally. + * one of the threads invokes the <code>accept()</code> or Review Comment: To prevent these types of error we should change this to a javadoc link. The links will be verified by the javadoc tool. E.g ```java /* * ... * one of the threads invokes the {@link java.util.function.DoubleConsumer#accept(double) accept} * or {@link DoubleStatisticAccumulator#combine(DoubleStatistic) combine} method, it must * be synchronized externally. * ... */ ``` ########## commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/SumOfSquaredDeviations.java: ########## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +/** + * Computes the sum of squared deviations from the sample mean. This + * statistic is related to the second moment. + * + * <p> + * The following recursive updating formula is used: + * <p> + * Let <ul> + * <li> dev = (current obs - previous mean) </li> + * <li> n = number of observations (including current obs) </li> + * </ul> + * Then + * <p> + * new value = old value + dev^2 * (n - 1) / n. + * <p> + * + * Returns the sum of squared deviations of all values seen so far. + * + * <p><strong>Note that this implementation is not synchronized.</strong> If + * multiple threads access an instance of this class concurrently, and at least + * one of the threads invokes the <code>accept()</code> or + * <code>combine()</code> method, it must be synchronized externally. + * + * <p>However, it is safe to use <code>accept()</code> and <code>combine()</code> + * as <code>accumulator</code> and <code>combiner</code> functions of + * {@link java.util.stream.Collector Collector} on a parallel stream, + * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()} + * provides the necessary partitioning, isolation, and merging of results for + * safe and efficient parallel execution. + */ +class SumOfSquaredDeviations implements DoubleStatistic, DoubleStatisticAccumulator<SumOfSquaredDeviations> { + /** Sum of squared deviations of the values that have been added. */ + private double squaredDevSum; + + /** First moment instance which holds the first moment of values that have been added. + * This is required because {@link FirstMoment#getDev()} and {@link FirstMoment#getDevNormalizedByN()} are + * used in the updating the sum of squared deviations. + */ + private final FirstMoment firstMoment; + + /** + * Create a FirstMoment instance. + */ + SumOfSquaredDeviations() { + firstMoment = new FirstMoment(); + } + + /** + * Create a SumOfSquaredDeviations instance with the given sum of + * squared deviations and a FirstMoment instance. + * + * @param squaredDevSum Sum of squared deviations. + * @param mean Mean of values. + * @param n Number of values. + * @param nonFiniteValue Sum of values. + */ + SumOfSquaredDeviations(double squaredDevSum, double mean, long n, double nonFiniteValue) { + this.squaredDevSum = squaredDevSum; + firstMoment = new FirstMoment(mean, n, nonFiniteValue); + } + + /** + * Updates the state of the statistic to reflect the addition of {@code value}. + * @param value Value. + */ + @Override + public void accept(double value) { + firstMoment.accept(value); + if (Double.isInfinite(value)) { Review Comment: We should find a way to avoid if statements in the main accept method. This should be a branchless excecution of code for speed. What matters is that the getAsDouble knows that a non-finite value has been encountered. This information is maintained by the FirstMoment. So I removed this if statement. The FirstMoment will never overflow (by design) when all inputs are finite. So if the final first moment is finite then you know no infinite values were encountered. The tests work if you use the first moment when returning the sum of the squared deviations: ```java return Double.isFinite(firstMoment.getAsDouble()) ? squaredDevSum : Double.NaN; ``` ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java: ########## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.util.Arrays; +import java.util.stream.Stream; +import org.apache.commons.rng.UniformRandomProvider; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +/** + * Test for {@link Variance}. + */ +final class VarianceTest { + + @Test + void testEmpty() { + Variance var = Variance.create(); + Assertions.assertEquals(Double.NaN, var.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVariance(double[] values) { + double expected = computeExpectedVariance(values); + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + TestHelper.assertEquals(expected, var.getAsDouble(), 7, () -> "variance"); + TestHelper.assertEquals(expected, Variance.of(values).getAsDouble(), 5, () -> "of (values)"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testParallelStream(double[] values) { + double expected = computeExpectedVariance(values); + double actual = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + TestHelper.assertEquals(expected, actual, 7, () -> "parallel stream"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVarianceRandomOrder(double[] values) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVariance(TestHelper.shuffle(rng, values)); + testParallelStream(TestHelper.shuffle(rng, values)); + } + } + + static Stream<Arguments> testVariance() { + return Stream.of( + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}), + Arguments.of(new double[] {8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68}), + Arguments.of(new double[] {9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74, 7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89}), + Arguments.of(new double[] {0, 0, 0.0}), + Arguments.of(new double[] {1, -7, 6}), + Arguments.of(new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}), + Arguments.of(new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9, 12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0, +0.0}), + Arguments.of(new double[] {0.0, -0.0}), + Arguments.of(new double[] {0.0, +0.0}), + Arguments.of(new double[] {0.001, 0.0002, 0.00003, 10000.11, 0.000004}), + Arguments.of(new double[] {10E-50, 5E-100, 25E-200, 35.345E-50}), + // Overflow of the sum + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, -1, 1}), + Arguments.of(new double[] {Double.MAX_VALUE, -1}), + Arguments.of(new double[] {Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE / 2}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceNonFinite(double[] values, double expected) { + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + Assertions.assertEquals(expected, var.getAsDouble(), "variance non-finite"); + Assertions.assertEquals(expected, Variance.of(values).getAsDouble(), "of (values) non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testParallelStreamNonFinite(double[] values, double expected) { + double ans = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + Assertions.assertEquals(expected, ans, "parallel stream non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceRandomOrderNonFinite(double[] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVarianceNonFinite(TestHelper.shuffle(rng, values), expected); + testParallelStreamNonFinite(TestHelper.shuffle(rng, values), expected); + } + } + static Stream<Arguments> testVarianceNonFinite() { + return Stream.of( + Arguments.of(new double[] {}, Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, -Double.MIN_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, +0.0d, -0.0d, Double.POSITIVE_INFINITY, + Double.NEGATIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 34.56, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, Double.NaN, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, 89.74, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 3.14, Double.NaN, Double.NaN}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, Double.NaN, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, + Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {-Double.MAX_VALUE, Double.POSITIVE_INFINITY}, + Double.NaN) + ); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombine(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + Variance var1 = Variance.create(); + Variance var2 = Variance.create(); + Arrays.stream(array1).forEach(var1); + Arrays.stream(array2).forEach(var2); + final double var2BeforeCombine = var2.getAsDouble(); + var1.combine(var2); + TestHelper.assertEquals(expected, var1.getAsDouble(), 7, () -> "combine"); + Assertions.assertEquals(var2BeforeCombine, var2.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombineRandomOrder(double[] array1, double[] array2) { + UniformRandomProvider rng = TestHelper.createRNG(); + double[] data = TestHelper.concatenate(array1, array2); + int n = array1.length; + for (int i = 1; i <= 10; i++) { + for (int j = 1; j <= 10; j++) { + TestHelper.shuffle(rng, array1); + TestHelper.shuffle(rng, array2); + testCombine(array1, array2); + } + TestHelper.shuffle(rng, data); + System.arraycopy(data, 0, array1, 0, n); + System.arraycopy(data, n, array2, 0, array2.length); + testCombine(array1, array2); + } + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testArrayOfArrays(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + final double[][] values = {array1, array2}; + double actual = Arrays.stream(values) + .map(Variance::of) + .reduce(Variance::combine) + .map(Variance::getAsDouble) + .orElseThrow(RuntimeException::new); + TestHelper.assertEquals(expected, actual, 2, () -> "array of arrays combined variance"); + } + + static Stream<Arguments> testCombine() { + return Stream.of( + Arguments.of(new double[] {}, new double[] {1}), + Arguments.of(new double[] {1}, new double[] {}), + Arguments.of(new double[] {}, new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {0}, new double[] {0, 0.0}), + Arguments.of(new double[] {4, 8, -6, 3, 18}, new double[] {1, -7, 6}), + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {6.0, -1.32, -5.78, 8.967, 13.32, -9.67, 0.14, 7.321, 11.456, -3.111}, new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}, new double[] {-42, 10, -88, 5, -17}), + Arguments.of(new double[] {-20, 34.983, -12.745, 28.12, -8.34, 42, -4, 16}, new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9}, new double[] {12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0}, new double[] {+0.0}), + Arguments.of(new double[] {0.0}, new double[] {-0.0}), + Arguments.of(new double[] {0.0}, new double[] {+0.0}), + Arguments.of(new double[] {10E-50, 5E-100}, new double[] {25E-200, 35.345E-50}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1}, new double[] {1}), + Arguments.of(new double[] {Double.MAX_VALUE, 3.1415E153}, new double[] {}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {1}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {1, 1E300}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testCombineNonFinite(double[][] values, double expected) { + Variance var1 = Variance.create(); + Variance var2 = Variance.create(); + Arrays.stream(values[0]).forEach(var1); + Arrays.stream(values[1]).forEach(var2); + double mean2BeforeCombine = var2.getAsDouble(); + var1.combine(var2); + Assertions.assertEquals(expected, var1.getAsDouble(), "combine non-finite"); + Assertions.assertEquals(mean2BeforeCombine, var2.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testCombineRandomOrderNonFinite(double[][] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + double[] data = TestHelper.concatenate(values[0], values[1]); + int n = values[0].length; + for (int i = 1; i <= 10; i++) { + for (int j = 1; j <= 10; j++) { + TestHelper.shuffle(rng, values[0]); + TestHelper.shuffle(rng, values[1]); + testCombineNonFinite(values, expected); + } + TestHelper.shuffle(rng, data); + System.arraycopy(data, 0, values[0], 0, n); + System.arraycopy(data, n, values[1], 0, values[1].length); + testCombineNonFinite(values, expected); + } + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testArrayOfArraysNonFinite(double[][] values, double expected) { + double actual = Arrays.stream(values) + .map(Variance::of) + .reduce(Variance::combine) + .map(Variance::getAsDouble) + .orElseThrow(RuntimeException::new); + Assertions.assertEquals(expected, actual, "array of arrays combined variance non-finite"); + } + + static Stream<Arguments> testCombineNonFinite() { + return Stream.of( + Arguments.of(new double[][] {{}, {}}, Double.NaN), + Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.NEGATIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.POSITIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY}, {Double.NEGATIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.MAX_VALUE}}, Double.NaN), + Arguments.of(new double[][] {{-Double.MAX_VALUE}, {Double.POSITIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY}, {-Double.MIN_VALUE}}, Double.NaN), + Arguments.of(new double[][] {{Double.NaN, 34.56, 89.74}, {Double.NaN}}, Double.NaN), + Arguments.of(new double[][] {{34.56}, {Double.NaN, 89.74}}, Double.NaN), + Arguments.of(new double[][] {{34.56, 89.74}, {Double.NaN, Double.NaN}}, Double.NaN), + Arguments.of(new double[][] {{Double.NaN, 3.14, Double.NaN, Double.NaN}, {}}, Double.NaN), + Arguments.of(new double[][] {{Double.NaN, Double.NaN, Double.NaN}, {Double.NaN, Double.NaN, Double.NaN}}, Double.NaN), + Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY, -Double.MAX_VALUE, -Double.MIN_VALUE}, {Double.MAX_VALUE, Double.MIN_VALUE}}, Double.NaN) + ); + } + + // Helper function to compute the expected value of Variance using BigDecimal. + static double computeExpectedVariance(double[] values) { + long n = values.length; + if (n == 1) { + return BigDecimal.ZERO.doubleValue(); + } + double mean = TestHelper.computeExpectedMean(values); Review Comment: This helper method should return the raw BigDecimal. Then you can convert it by rounding. ```java BigDecimal mean = TestHelper.computeExpectedMean(values).round(MathContext.DECIMAL128); ``` ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java: ########## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.util.Arrays; +import java.util.stream.Stream; +import org.apache.commons.rng.UniformRandomProvider; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +/** + * Test for {@link Variance}. + */ +final class VarianceTest { + + @Test + void testEmpty() { + Variance var = Variance.create(); + Assertions.assertEquals(Double.NaN, var.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVariance(double[] values) { + double expected = computeExpectedVariance(values); + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + TestHelper.assertEquals(expected, var.getAsDouble(), 7, () -> "variance"); + TestHelper.assertEquals(expected, Variance.of(values).getAsDouble(), 5, () -> "of (values)"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testParallelStream(double[] values) { + double expected = computeExpectedVariance(values); + double actual = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + TestHelper.assertEquals(expected, actual, 7, () -> "parallel stream"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVarianceRandomOrder(double[] values) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVariance(TestHelper.shuffle(rng, values)); + testParallelStream(TestHelper.shuffle(rng, values)); + } + } + + static Stream<Arguments> testVariance() { + return Stream.of( + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}), + Arguments.of(new double[] {8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68}), + Arguments.of(new double[] {9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74, 7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89}), + Arguments.of(new double[] {0, 0, 0.0}), + Arguments.of(new double[] {1, -7, 6}), + Arguments.of(new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}), + Arguments.of(new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9, 12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0, +0.0}), + Arguments.of(new double[] {0.0, -0.0}), + Arguments.of(new double[] {0.0, +0.0}), + Arguments.of(new double[] {0.001, 0.0002, 0.00003, 10000.11, 0.000004}), + Arguments.of(new double[] {10E-50, 5E-100, 25E-200, 35.345E-50}), + // Overflow of the sum + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, -1, 1}), + Arguments.of(new double[] {Double.MAX_VALUE, -1}), + Arguments.of(new double[] {Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE / 2}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceNonFinite(double[] values, double expected) { + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + Assertions.assertEquals(expected, var.getAsDouble(), "variance non-finite"); + Assertions.assertEquals(expected, Variance.of(values).getAsDouble(), "of (values) non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testParallelStreamNonFinite(double[] values, double expected) { + double ans = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + Assertions.assertEquals(expected, ans, "parallel stream non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceRandomOrderNonFinite(double[] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVarianceNonFinite(TestHelper.shuffle(rng, values), expected); + testParallelStreamNonFinite(TestHelper.shuffle(rng, values), expected); + } + } + static Stream<Arguments> testVarianceNonFinite() { Review Comment: Add a newline above this method. Add a comment that the data should be the same as MeanTest.testMeanNonFinite. Given that all the expected values are NaN you could reuse MeanTest: ```java static Stream<Arguments> testVarianceNonFinite() { // Reuse the Mean data; we ignore the expected mean as the variance is always NaN return MeanTest.testMeanNonFinite(); } ``` These could also be moved to a TestData class. The expected value for the mean can be computed using DoubleStream since that will return the correct non-finite value from the sum divided by the length. ########## commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/SumOfSquaredDeviations.java: ########## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +/** + * Computes the sum of squared deviations from the sample mean. This + * statistic is related to the second moment. + * + * <p> + * The following recursive updating formula is used: + * <p> + * Let <ul> + * <li> dev = (current obs - previous mean) </li> + * <li> n = number of observations (including current obs) </li> + * </ul> + * Then + * <p> + * new value = old value + dev^2 * (n - 1) / n. + * <p> + * + * Returns the sum of squared deviations of all values seen so far. + * + * <p><strong>Note that this implementation is not synchronized.</strong> If + * multiple threads access an instance of this class concurrently, and at least + * one of the threads invokes the <code>accept()</code> or + * <code>combine()</code> method, it must be synchronized externally. + * + * <p>However, it is safe to use <code>accept()</code> and <code>combine()</code> + * as <code>accumulator</code> and <code>combiner</code> functions of + * {@link java.util.stream.Collector Collector} on a parallel stream, + * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()} + * provides the necessary partitioning, isolation, and merging of results for + * safe and efficient parallel execution. + */ +class SumOfSquaredDeviations implements DoubleStatistic, DoubleStatisticAccumulator<SumOfSquaredDeviations> { Review Comment: I understand that this has been created by composition as the DoubleStatisticAccumulator interface cannot be implemented with different type arguments. However this is an internal class. Thus we could drop the implementation of DoubleStatisticAccumulator (keeping the actual implementation) and change this class to extend FirstMoment. This should allow simpler inheritance structure for the higher order moments. What we are working towards is computing statistics using the [moments (Wikipedia)](https://en.wikipedia.org/wiki/Moment_(mathematics)): ``` RawFirstMoment // SumOfDeviations is a precursor to central moment SumOfDeviations2 -> Variance SumOfDeviations3 -> Skewness SumOfDeviations4 -> Kurtosis // Optional SumOfDeviations5 -> Hyperskewness SumOfDeviations6 -> Hypertailedness ``` In this case inheritance for the sum of deviations makes sense. The actual public statistic implementations that use the sum of deviations will avoid the inheritance. Once we get into implementing an aggregator class then different statistics should be able to share the same underlying implementation of the sum of deviations. ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java: ########## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.util.Arrays; +import java.util.stream.Stream; +import org.apache.commons.rng.UniformRandomProvider; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +/** + * Test for {@link Variance}. + */ +final class VarianceTest { + + @Test + void testEmpty() { + Variance var = Variance.create(); + Assertions.assertEquals(Double.NaN, var.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVariance(double[] values) { + double expected = computeExpectedVariance(values); + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + TestHelper.assertEquals(expected, var.getAsDouble(), 7, () -> "variance"); + TestHelper.assertEquals(expected, Variance.of(values).getAsDouble(), 5, () -> "of (values)"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testParallelStream(double[] values) { + double expected = computeExpectedVariance(values); + double actual = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + TestHelper.assertEquals(expected, actual, 7, () -> "parallel stream"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVarianceRandomOrder(double[] values) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVariance(TestHelper.shuffle(rng, values)); + testParallelStream(TestHelper.shuffle(rng, values)); + } + } + + static Stream<Arguments> testVariance() { + return Stream.of( + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}), + Arguments.of(new double[] {8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68}), + Arguments.of(new double[] {9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74, 7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89}), + Arguments.of(new double[] {0, 0, 0.0}), + Arguments.of(new double[] {1, -7, 6}), + Arguments.of(new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}), + Arguments.of(new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9, 12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0, +0.0}), + Arguments.of(new double[] {0.0, -0.0}), + Arguments.of(new double[] {0.0, +0.0}), + Arguments.of(new double[] {0.001, 0.0002, 0.00003, 10000.11, 0.000004}), + Arguments.of(new double[] {10E-50, 5E-100, 25E-200, 35.345E-50}), + // Overflow of the sum + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, -1, 1}), + Arguments.of(new double[] {Double.MAX_VALUE, -1}), + Arguments.of(new double[] {Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE / 2}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceNonFinite(double[] values, double expected) { + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + Assertions.assertEquals(expected, var.getAsDouble(), "variance non-finite"); + Assertions.assertEquals(expected, Variance.of(values).getAsDouble(), "of (values) non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testParallelStreamNonFinite(double[] values, double expected) { + double ans = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + Assertions.assertEquals(expected, ans, "parallel stream non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceRandomOrderNonFinite(double[] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVarianceNonFinite(TestHelper.shuffle(rng, values), expected); + testParallelStreamNonFinite(TestHelper.shuffle(rng, values), expected); + } + } + static Stream<Arguments> testVarianceNonFinite() { + return Stream.of( + Arguments.of(new double[] {}, Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, -Double.MIN_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, +0.0d, -0.0d, Double.POSITIVE_INFINITY, + Double.NEGATIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 34.56, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, Double.NaN, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, 89.74, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 3.14, Double.NaN, Double.NaN}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, Double.NaN, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, + Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {-Double.MAX_VALUE, Double.POSITIVE_INFINITY}, + Double.NaN) + ); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombine(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + Variance var1 = Variance.create(); + Variance var2 = Variance.create(); + Arrays.stream(array1).forEach(var1); + Arrays.stream(array2).forEach(var2); + final double var2BeforeCombine = var2.getAsDouble(); + var1.combine(var2); + TestHelper.assertEquals(expected, var1.getAsDouble(), 7, () -> "combine"); + Assertions.assertEquals(var2BeforeCombine, var2.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombineRandomOrder(double[] array1, double[] array2) { + UniformRandomProvider rng = TestHelper.createRNG(); + double[] data = TestHelper.concatenate(array1, array2); + int n = array1.length; + for (int i = 1; i <= 10; i++) { + for (int j = 1; j <= 10; j++) { + TestHelper.shuffle(rng, array1); + TestHelper.shuffle(rng, array2); + testCombine(array1, array2); + } + TestHelper.shuffle(rng, data); + System.arraycopy(data, 0, array1, 0, n); + System.arraycopy(data, n, array2, 0, array2.length); + testCombine(array1, array2); + } + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testArrayOfArrays(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + final double[][] values = {array1, array2}; + double actual = Arrays.stream(values) + .map(Variance::of) + .reduce(Variance::combine) + .map(Variance::getAsDouble) + .orElseThrow(RuntimeException::new); + TestHelper.assertEquals(expected, actual, 2, () -> "array of arrays combined variance"); + } + + static Stream<Arguments> testCombine() { + return Stream.of( + Arguments.of(new double[] {}, new double[] {1}), + Arguments.of(new double[] {1}, new double[] {}), + Arguments.of(new double[] {}, new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {0}, new double[] {0, 0.0}), + Arguments.of(new double[] {4, 8, -6, 3, 18}, new double[] {1, -7, 6}), + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {6.0, -1.32, -5.78, 8.967, 13.32, -9.67, 0.14, 7.321, 11.456, -3.111}, new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}, new double[] {-42, 10, -88, 5, -17}), + Arguments.of(new double[] {-20, 34.983, -12.745, 28.12, -8.34, 42, -4, 16}, new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9}, new double[] {12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0}, new double[] {+0.0}), + Arguments.of(new double[] {0.0}, new double[] {-0.0}), + Arguments.of(new double[] {0.0}, new double[] {+0.0}), + Arguments.of(new double[] {10E-50, 5E-100}, new double[] {25E-200, 35.345E-50}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1}, new double[] {1}), + Arguments.of(new double[] {Double.MAX_VALUE, 3.1415E153}, new double[] {}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {1}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {1, 1E300}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testCombineNonFinite(double[][] values, double expected) { + Variance var1 = Variance.create(); + Variance var2 = Variance.create(); + Arrays.stream(values[0]).forEach(var1); + Arrays.stream(values[1]).forEach(var2); + double mean2BeforeCombine = var2.getAsDouble(); + var1.combine(var2); + Assertions.assertEquals(expected, var1.getAsDouble(), "combine non-finite"); + Assertions.assertEquals(mean2BeforeCombine, var2.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testCombineRandomOrderNonFinite(double[][] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + double[] data = TestHelper.concatenate(values[0], values[1]); + int n = values[0].length; + for (int i = 1; i <= 10; i++) { + for (int j = 1; j <= 10; j++) { + TestHelper.shuffle(rng, values[0]); + TestHelper.shuffle(rng, values[1]); + testCombineNonFinite(values, expected); + } + TestHelper.shuffle(rng, data); + System.arraycopy(data, 0, values[0], 0, n); + System.arraycopy(data, n, values[1], 0, values[1].length); + testCombineNonFinite(values, expected); + } + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testArrayOfArraysNonFinite(double[][] values, double expected) { + double actual = Arrays.stream(values) + .map(Variance::of) + .reduce(Variance::combine) + .map(Variance::getAsDouble) + .orElseThrow(RuntimeException::new); + Assertions.assertEquals(expected, actual, "array of arrays combined variance non-finite"); + } + + static Stream<Arguments> testCombineNonFinite() { Review Comment: Once again, this is a duplicate of the data in MeanTest. The expected value is always NaN. So we can move the data to a TestData class. ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java: ########## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.util.Arrays; +import java.util.stream.Stream; +import org.apache.commons.rng.UniformRandomProvider; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +/** + * Test for {@link Variance}. + */ +final class VarianceTest { + + @Test + void testEmpty() { + Variance var = Variance.create(); + Assertions.assertEquals(Double.NaN, var.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVariance(double[] values) { + double expected = computeExpectedVariance(values); + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + TestHelper.assertEquals(expected, var.getAsDouble(), 7, () -> "variance"); + TestHelper.assertEquals(expected, Variance.of(values).getAsDouble(), 5, () -> "of (values)"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testParallelStream(double[] values) { + double expected = computeExpectedVariance(values); + double actual = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + TestHelper.assertEquals(expected, actual, 7, () -> "parallel stream"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVarianceRandomOrder(double[] values) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVariance(TestHelper.shuffle(rng, values)); + testParallelStream(TestHelper.shuffle(rng, values)); + } + } + + static Stream<Arguments> testVariance() { + return Stream.of( + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}), + Arguments.of(new double[] {8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68}), + Arguments.of(new double[] {9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74, 7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89}), + Arguments.of(new double[] {0, 0, 0.0}), + Arguments.of(new double[] {1, -7, 6}), + Arguments.of(new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}), + Arguments.of(new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9, 12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0, +0.0}), + Arguments.of(new double[] {0.0, -0.0}), + Arguments.of(new double[] {0.0, +0.0}), + Arguments.of(new double[] {0.001, 0.0002, 0.00003, 10000.11, 0.000004}), + Arguments.of(new double[] {10E-50, 5E-100, 25E-200, 35.345E-50}), + // Overflow of the sum + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, -1, 1}), + Arguments.of(new double[] {Double.MAX_VALUE, -1}), + Arguments.of(new double[] {Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE / 2}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceNonFinite(double[] values, double expected) { + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + Assertions.assertEquals(expected, var.getAsDouble(), "variance non-finite"); + Assertions.assertEquals(expected, Variance.of(values).getAsDouble(), "of (values) non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testParallelStreamNonFinite(double[] values, double expected) { + double ans = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + Assertions.assertEquals(expected, ans, "parallel stream non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceRandomOrderNonFinite(double[] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVarianceNonFinite(TestHelper.shuffle(rng, values), expected); + testParallelStreamNonFinite(TestHelper.shuffle(rng, values), expected); + } + } + static Stream<Arguments> testVarianceNonFinite() { + return Stream.of( + Arguments.of(new double[] {}, Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, -Double.MIN_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, +0.0d, -0.0d, Double.POSITIVE_INFINITY, + Double.NEGATIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 34.56, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, Double.NaN, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, 89.74, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 3.14, Double.NaN, Double.NaN}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, Double.NaN, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, + Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {-Double.MAX_VALUE, Double.POSITIVE_INFINITY}, + Double.NaN) + ); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombine(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + Variance var1 = Variance.create(); + Variance var2 = Variance.create(); + Arrays.stream(array1).forEach(var1); + Arrays.stream(array2).forEach(var2); + final double var2BeforeCombine = var2.getAsDouble(); + var1.combine(var2); + TestHelper.assertEquals(expected, var1.getAsDouble(), 7, () -> "combine"); + Assertions.assertEquals(var2BeforeCombine, var2.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombineRandomOrder(double[] array1, double[] array2) { + UniformRandomProvider rng = TestHelper.createRNG(); + double[] data = TestHelper.concatenate(array1, array2); + int n = array1.length; + for (int i = 1; i <= 10; i++) { + for (int j = 1; j <= 10; j++) { + TestHelper.shuffle(rng, array1); + TestHelper.shuffle(rng, array2); + testCombine(array1, array2); + } + TestHelper.shuffle(rng, data); + System.arraycopy(data, 0, array1, 0, n); + System.arraycopy(data, n, array2, 0, array2.length); + testCombine(array1, array2); + } + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testArrayOfArrays(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + final double[][] values = {array1, array2}; + double actual = Arrays.stream(values) + .map(Variance::of) + .reduce(Variance::combine) + .map(Variance::getAsDouble) + .orElseThrow(RuntimeException::new); + TestHelper.assertEquals(expected, actual, 2, () -> "array of arrays combined variance"); + } + + static Stream<Arguments> testCombine() { Review Comment: Move this data to `TestData.testCombineValues`. This will have to use a Stream of Arguments. ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/MeanTest.java: ########## @@ -310,13 +308,4 @@ static Stream<Arguments> testCombineNonFinite() { Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY, -Double.MAX_VALUE, -Double.MIN_VALUE}, {Double.MAX_VALUE, Double.MIN_VALUE}}, Double.NEGATIVE_INFINITY) ); } - - // Helper function to compute the expected value of Mean using BigDecimal. - private static double computeExpected(double[] values) { Review Comment: You could leave this function here and have the ported function return the BigDecimal result. This then becomes: ```java private static double computeExpected(double[] values) { return TestHelper.computedExpectedMean(values).doubleValue(); } ``` ########## commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/Variance.java: ########## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +/** + * Computes the variance of a set of values. By default, the + * "sample variance" is computed. The definitional formula for sample + * variance is: + * <p> + * sum((x_i - mean)^2) / (n - 1) + * <p>This formula does not have good numerical properties, so this + * implementation does not use it to compute the statistic. + * <ul> + * <li> The {@link #accept(double)} method computes the variance using + * updating formulae based on West's algorithm, as described in + * <a href="http://doi.acm.org/10.1145/359146.359152"> Chan, T. F. and + * J. G. Lewis 1979, <i>Communications of the ACM</i>, + * vol. 22 no. 9, pp. 526-531.</a></li> + * + * <li> The {@link #of(double...)} method leverages the fact that it has the + * full array of values in memory to execute a two-pass algorithm. + * Specifically, this method uses the "corrected two-pass algorithm" from + * Chan, Golub, Levesque, <i>Algorithms for Computing the Sample Variance</i>, + * American Statistician, vol. 37, no. 3 (1983) pp. 242-247.</li></ul> + * + * Note that adding values using {@code accept} and then executing {@code getAsDouble} will + * sometimes give a different, less accurate, result than executing + * {@code of} with the full array of values. The former approach + * should only be used when the full array of values is not available. + * + * <p> + * Returns <code>Double.NaN</code> if no data values have been added and + * returns <code>0</code> if there is just one finite value in the data set. + * Note that <code>Double.NaN</code> may also be returned if the input includes + * <code>Double.NaN</code> and / or infinite values. + * + * <p>This class is designed to work with (though does not require) + * {@linkplain java.util.stream streams}. + * + * <p><strong>Note that this implementation is not synchronized.</strong> If + * multiple threads access an instance of this class concurrently, and at least + * one of the threads invokes the <code>accept()</code> or + * <code>combine()</code> method, it must be synchronized externally. + * + * <p>However, it is safe to use <code>accept()</code> and <code>combine()</code> + * as <code>accumulator</code> and <code>combiner</code> functions of + * {@link java.util.stream.Collector Collector} on a parallel stream, + * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()} + * provides the necessary partitioning, isolation, and merging of results for + * safe and efficient parallel execution. + * + * @since 1.1 + */ +public abstract class Variance implements DoubleStatistic, DoubleStatisticAccumulator<Variance> { + + /** + * Create a Variance instance. + */ + Variance() { + // No-op + } + + /** + * Creates a {@code Variance} implementation which does not store the input value(s) it consumes. + * + * <p>The result is <code>NaN</code> if: + * <ul> + * <li>no values have been added,</li> + * <li>any of the values is <code>NaN</code>, or</li> + * <li>an infinite value of either sign is encountered</li> + * </ul> + * + * @return {@code Variance} implementation. + */ + public static Variance create() { + return new StorelessSampleVariance(); + } + + /** + * Returns a {@code Variance} instance that has the variance of all input values, or <code>NaN</code> + * if: + * <ul> + * <li>the input array is empty,</li> + * <li>any of the values is <code>NaN</code>, or</li> + * <li>an infinite value of either sign is encountered</li> + * </ul> + * + * <p>Note: {@code Variance} computed using {@link Variance#accept Variance.accept()} may be different + * from this variance. + * + * <p>See {@link Variance} for details on the computing algorithm. + * + * @param values Values. + * @return {@code Variance} instance. + */ + public static Variance of(double... values) { + final double mean = Mean.of(values).getAsDouble(); + double accum = 0.0; + double dev; + double accum2 = 0.0; + double squaredDevSum; + for (final double value : values) { + dev = value - mean; + accum += dev * dev; + accum2 += dev; + } + final double accum2Squared = accum2 * accum2; + final long n = values.length; + // The sum of squared deviations is accum - (accum2 * accum2 / n). + // To prevent squaredDevSum from spuriously attaining a NaN value + // when accum2Squared (which implies accum is also infinite) is infinite, assign it + // an infinite value which is its intended value. + if (accum2Squared == Double.POSITIVE_INFINITY) { + squaredDevSum = Double.POSITIVE_INFINITY; + } else { + squaredDevSum = accum - (accum2 * accum2 / n); + } + return StorelessSampleVariance.create(squaredDevSum, mean, n, accum2 + (mean * n)); + } + + /** + * Updates the state of the statistic to reflect the addition of {@code value}. + * @param value Value. + */ + @Override + public abstract void accept(double value); + + /** + * Gets the variance of all input values. + * + * <p>The result is <code>NaN</code> if : + * <ul> + * <li>the input array is empty,</li> + * <li>any of the values is <code>NaN</code>, or</li> + * <li>an infinite value of either sign is encountered</li> + * </ul> + * + * <p>The result is <code>0</code> if there is just one finite value in the data set. + * + * @return {@code Variance} of all values seen so far. + */ + @Override + public abstract double getAsDouble(); + + /** {@inheritDoc} */ + @Override + public abstract Variance combine(Variance other); + + /** + * {@code Variance} implementation that does not store the input value(s) processed so far. + */ + private static class StorelessSampleVariance extends Variance { + + /** + * An instance of {@link SumOfSquaredDeviations}, which is used to + * compute the variance. + */ + private final SumOfSquaredDeviations squaredDeviationSum; + + /** + * Creates a StorelessVariance instance with the sum of squared + * deviations from the mean. + * + * @param squaredDevSum Sum of squared deviations. + * @param mean Mean of values. + * @param n Number of values. + * @param sumOfValues Sum of values. + */ + private StorelessSampleVariance(double squaredDevSum, double mean, long n, double sumOfValues) { + squaredDeviationSum = new SumOfSquaredDeviations(squaredDevSum, mean, n, sumOfValues); + } + + /** + * Create a SumOfSquaredDeviations instance. + */ + StorelessSampleVariance() { + squaredDeviationSum = new SumOfSquaredDeviations(); + } + + /** + * Creates a StorelessVariance instance with the sum of squared + * deviations from the mean. + * + * @param squaredDevSum Sum of squared deviations. + * @param mean Mean of values. + * @param n Number of values. + * @param sumOfValues Sum of values. + * @return A StorelessVariance instance. + */ + static StorelessSampleVariance create(double squaredDevSum, double mean, long n, double sumOfValues) { + return new StorelessSampleVariance(squaredDevSum, mean, n, sumOfValues); + } + + @Override + public void accept(double value) { + squaredDeviationSum.accept(value); + } + + @Override + public double getAsDouble() { + final double sumOfSquaredDev = squaredDeviationSum.getAsDouble(); + final double n = squaredDeviationSum.getN(); + if (n == 0) { + return Double.NaN; + } else if (n == 1 && Double.isFinite(sumOfSquaredDev)) { + return 0d; Review Comment: No reason to use 0d here. This is only relevant when you wish for the zero to be used as double for double arithmetic, e.g. `1 / 0 != 1 / 0d` (the LHS will throw an arithmetic exception). ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java: ########## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.util.Arrays; +import java.util.stream.Stream; +import org.apache.commons.rng.UniformRandomProvider; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +/** + * Test for {@link Variance}. + */ +final class VarianceTest { + + @Test + void testEmpty() { + Variance var = Variance.create(); + Assertions.assertEquals(Double.NaN, var.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVariance(double[] values) { + double expected = computeExpectedVariance(values); + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + TestHelper.assertEquals(expected, var.getAsDouble(), 7, () -> "variance"); + TestHelper.assertEquals(expected, Variance.of(values).getAsDouble(), 5, () -> "of (values)"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testParallelStream(double[] values) { + double expected = computeExpectedVariance(values); + double actual = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + TestHelper.assertEquals(expected, actual, 7, () -> "parallel stream"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVarianceRandomOrder(double[] values) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVariance(TestHelper.shuffle(rng, values)); + testParallelStream(TestHelper.shuffle(rng, values)); + } + } + + static Stream<Arguments> testVariance() { Review Comment: It may be better here to use MeanTest (since the data is the same): ```java static Stream<Arguments> testVariance() { return MeanTest.testMean(); } ``` My preferred option would be to move the stream method to a new class: `TestData` with `static Stream<double[]> testValues()`. Note that you do not have to use JUnit `Arguments` when the test method only has one argument. You can simply stream the objects. JUnit will map the stream elements to the test argument. Note: When I made the above change I also had the extra test case for Mean included. That fails: ``` Double.MAX_VALUE, Double.MAX_VALUE, -Double.MAX_VALUE expected: Infinity actual: NaN ``` This highlights that we really should reuse the same data on all the statistics, and it should be in one place. ########## commons-statistics-descriptive/src/main/java/org/apache/commons/statistics/descriptive/Variance.java: ########## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +/** + * Computes the variance of a set of values. By default, the + * "sample variance" is computed. The definitional formula for sample + * variance is: + * <p> + * sum((x_i - mean)^2) / (n - 1) + * <p>This formula does not have good numerical properties, so this + * implementation does not use it to compute the statistic. + * <ul> + * <li> The {@link #accept(double)} method computes the variance using + * updating formulae based on West's algorithm, as described in + * <a href="http://doi.acm.org/10.1145/359146.359152"> Chan, T. F. and + * J. G. Lewis 1979, <i>Communications of the ACM</i>, + * vol. 22 no. 9, pp. 526-531.</a></li> + * + * <li> The {@link #of(double...)} method leverages the fact that it has the + * full array of values in memory to execute a two-pass algorithm. + * Specifically, this method uses the "corrected two-pass algorithm" from + * Chan, Golub, Levesque, <i>Algorithms for Computing the Sample Variance</i>, + * American Statistician, vol. 37, no. 3 (1983) pp. 242-247.</li></ul> + * + * Note that adding values using {@code accept} and then executing {@code getAsDouble} will + * sometimes give a different, less accurate, result than executing + * {@code of} with the full array of values. The former approach + * should only be used when the full array of values is not available. + * + * <p> + * Returns <code>Double.NaN</code> if no data values have been added and + * returns <code>0</code> if there is just one finite value in the data set. + * Note that <code>Double.NaN</code> may also be returned if the input includes + * <code>Double.NaN</code> and / or infinite values. + * + * <p>This class is designed to work with (though does not require) + * {@linkplain java.util.stream streams}. + * + * <p><strong>Note that this implementation is not synchronized.</strong> If + * multiple threads access an instance of this class concurrently, and at least + * one of the threads invokes the <code>accept()</code> or + * <code>combine()</code> method, it must be synchronized externally. + * + * <p>However, it is safe to use <code>accept()</code> and <code>combine()</code> + * as <code>accumulator</code> and <code>combiner</code> functions of + * {@link java.util.stream.Collector Collector} on a parallel stream, + * because the parallel implementation of {@link java.util.stream.Stream#collect Stream.collect()} + * provides the necessary partitioning, isolation, and merging of results for + * safe and efficient parallel execution. + * + * @since 1.1 + */ +public abstract class Variance implements DoubleStatistic, DoubleStatisticAccumulator<Variance> { + + /** + * Create a Variance instance. + */ + Variance() { + // No-op + } + + /** + * Creates a {@code Variance} implementation which does not store the input value(s) it consumes. + * + * <p>The result is <code>NaN</code> if: + * <ul> + * <li>no values have been added,</li> + * <li>any of the values is <code>NaN</code>, or</li> + * <li>an infinite value of either sign is encountered</li> + * </ul> + * + * @return {@code Variance} implementation. + */ + public static Variance create() { + return new StorelessSampleVariance(); + } + + /** + * Returns a {@code Variance} instance that has the variance of all input values, or <code>NaN</code> + * if: + * <ul> + * <li>the input array is empty,</li> + * <li>any of the values is <code>NaN</code>, or</li> + * <li>an infinite value of either sign is encountered</li> + * </ul> + * + * <p>Note: {@code Variance} computed using {@link Variance#accept Variance.accept()} may be different + * from this variance. + * + * <p>See {@link Variance} for details on the computing algorithm. + * + * @param values Values. + * @return {@code Variance} instance. + */ + public static Variance of(double... values) { + final double mean = Mean.of(values).getAsDouble(); + double accum = 0.0; + double dev; + double accum2 = 0.0; + double squaredDevSum; + for (final double value : values) { + dev = value - mean; + accum += dev * dev; + accum2 += dev; + } + final double accum2Squared = accum2 * accum2; + final long n = values.length; + // The sum of squared deviations is accum - (accum2 * accum2 / n). + // To prevent squaredDevSum from spuriously attaining a NaN value + // when accum2Squared (which implies accum is also infinite) is infinite, assign it + // an infinite value which is its intended value. + if (accum2Squared == Double.POSITIVE_INFINITY) { + squaredDevSum = Double.POSITIVE_INFINITY; + } else { + squaredDevSum = accum - (accum2 * accum2 / n); + } + return StorelessSampleVariance.create(squaredDevSum, mean, n, accum2 + (mean * n)); + } + + /** + * Updates the state of the statistic to reflect the addition of {@code value}. + * @param value Value. + */ + @Override + public abstract void accept(double value); + + /** + * Gets the variance of all input values. + * + * <p>The result is <code>NaN</code> if : + * <ul> + * <li>the input array is empty,</li> + * <li>any of the values is <code>NaN</code>, or</li> + * <li>an infinite value of either sign is encountered</li> + * </ul> + * + * <p>The result is <code>0</code> if there is just one finite value in the data set. + * + * @return {@code Variance} of all values seen so far. + */ + @Override + public abstract double getAsDouble(); + + /** {@inheritDoc} */ + @Override + public abstract Variance combine(Variance other); + + /** + * {@code Variance} implementation that does not store the input value(s) processed so far. + */ + private static class StorelessSampleVariance extends Variance { + + /** + * An instance of {@link SumOfSquaredDeviations}, which is used to + * compute the variance. + */ + private final SumOfSquaredDeviations squaredDeviationSum; + + /** + * Creates a StorelessVariance instance with the sum of squared + * deviations from the mean. + * + * @param squaredDevSum Sum of squared deviations. + * @param mean Mean of values. + * @param n Number of values. + * @param sumOfValues Sum of values. + */ + private StorelessSampleVariance(double squaredDevSum, double mean, long n, double sumOfValues) { + squaredDeviationSum = new SumOfSquaredDeviations(squaredDevSum, mean, n, sumOfValues); + } + + /** + * Create a SumOfSquaredDeviations instance. + */ + StorelessSampleVariance() { + squaredDeviationSum = new SumOfSquaredDeviations(); + } + + /** + * Creates a StorelessVariance instance with the sum of squared + * deviations from the mean. + * + * @param squaredDevSum Sum of squared deviations. + * @param mean Mean of values. + * @param n Number of values. + * @param sumOfValues Sum of values. + * @return A StorelessVariance instance. + */ + static StorelessSampleVariance create(double squaredDevSum, double mean, long n, double sumOfValues) { + return new StorelessSampleVariance(squaredDevSum, mean, n, sumOfValues); + } + + @Override + public void accept(double value) { + squaredDeviationSum.accept(value); + } + + @Override + public double getAsDouble() { + final double sumOfSquaredDev = squaredDeviationSum.getAsDouble(); + final double n = squaredDeviationSum.getN(); + if (n == 0) { + return Double.NaN; + } else if (n == 1 && Double.isFinite(sumOfSquaredDev)) { + return 0d; + } + return sumOfSquaredDev / (n - 1); + } + + @Override + public Variance combine(Variance other) { + final StorelessSampleVariance that = (StorelessSampleVariance) other; + squaredDeviationSum.combine(that.squaredDeviationSum); + return this; + } + Review Comment: Remove empty line ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java: ########## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.util.Arrays; +import java.util.stream.Stream; +import org.apache.commons.rng.UniformRandomProvider; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +/** + * Test for {@link Variance}. + */ +final class VarianceTest { + + @Test + void testEmpty() { + Variance var = Variance.create(); + Assertions.assertEquals(Double.NaN, var.getAsDouble()); + } + Review Comment: In contrast to all the other tests there is no `testNan()` method here. I would add it for completeness. ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java: ########## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.util.Arrays; +import java.util.stream.Stream; +import org.apache.commons.rng.UniformRandomProvider; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +/** + * Test for {@link Variance}. + */ +final class VarianceTest { + + @Test + void testEmpty() { + Variance var = Variance.create(); + Assertions.assertEquals(Double.NaN, var.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVariance(double[] values) { + double expected = computeExpectedVariance(values); + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + TestHelper.assertEquals(expected, var.getAsDouble(), 7, () -> "variance"); + TestHelper.assertEquals(expected, Variance.of(values).getAsDouble(), 5, () -> "of (values)"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testParallelStream(double[] values) { + double expected = computeExpectedVariance(values); + double actual = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + TestHelper.assertEquals(expected, actual, 7, () -> "parallel stream"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVarianceRandomOrder(double[] values) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVariance(TestHelper.shuffle(rng, values)); + testParallelStream(TestHelper.shuffle(rng, values)); + } + } + + static Stream<Arguments> testVariance() { + return Stream.of( + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}), + Arguments.of(new double[] {8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68}), + Arguments.of(new double[] {9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74, 7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89}), + Arguments.of(new double[] {0, 0, 0.0}), + Arguments.of(new double[] {1, -7, 6}), + Arguments.of(new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}), + Arguments.of(new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9, 12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0, +0.0}), + Arguments.of(new double[] {0.0, -0.0}), + Arguments.of(new double[] {0.0, +0.0}), + Arguments.of(new double[] {0.001, 0.0002, 0.00003, 10000.11, 0.000004}), + Arguments.of(new double[] {10E-50, 5E-100, 25E-200, 35.345E-50}), + // Overflow of the sum + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, -1, 1}), + Arguments.of(new double[] {Double.MAX_VALUE, -1}), + Arguments.of(new double[] {Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE / 2}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceNonFinite(double[] values, double expected) { + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + Assertions.assertEquals(expected, var.getAsDouble(), "variance non-finite"); + Assertions.assertEquals(expected, Variance.of(values).getAsDouble(), "of (values) non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testParallelStreamNonFinite(double[] values, double expected) { + double ans = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + Assertions.assertEquals(expected, ans, "parallel stream non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceRandomOrderNonFinite(double[] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVarianceNonFinite(TestHelper.shuffle(rng, values), expected); + testParallelStreamNonFinite(TestHelper.shuffle(rng, values), expected); + } + } + static Stream<Arguments> testVarianceNonFinite() { + return Stream.of( + Arguments.of(new double[] {}, Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, -Double.MIN_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, +0.0d, -0.0d, Double.POSITIVE_INFINITY, + Double.NEGATIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 34.56, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, Double.NaN, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, 89.74, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 3.14, Double.NaN, Double.NaN}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, Double.NaN, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, + Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {-Double.MAX_VALUE, Double.POSITIVE_INFINITY}, + Double.NaN) + ); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombine(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + Variance var1 = Variance.create(); + Variance var2 = Variance.create(); + Arrays.stream(array1).forEach(var1); + Arrays.stream(array2).forEach(var2); + final double var2BeforeCombine = var2.getAsDouble(); + var1.combine(var2); + TestHelper.assertEquals(expected, var1.getAsDouble(), 7, () -> "combine"); + Assertions.assertEquals(var2BeforeCombine, var2.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombineRandomOrder(double[] array1, double[] array2) { + UniformRandomProvider rng = TestHelper.createRNG(); + double[] data = TestHelper.concatenate(array1, array2); + int n = array1.length; + for (int i = 1; i <= 10; i++) { + for (int j = 1; j <= 10; j++) { + TestHelper.shuffle(rng, array1); + TestHelper.shuffle(rng, array2); + testCombine(array1, array2); + } + TestHelper.shuffle(rng, data); + System.arraycopy(data, 0, array1, 0, n); + System.arraycopy(data, n, array2, 0, array2.length); + testCombine(array1, array2); + } + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testArrayOfArrays(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + final double[][] values = {array1, array2}; + double actual = Arrays.stream(values) + .map(Variance::of) + .reduce(Variance::combine) + .map(Variance::getAsDouble) + .orElseThrow(RuntimeException::new); + TestHelper.assertEquals(expected, actual, 2, () -> "array of arrays combined variance"); + } + + static Stream<Arguments> testCombine() { + return Stream.of( + Arguments.of(new double[] {}, new double[] {1}), + Arguments.of(new double[] {1}, new double[] {}), + Arguments.of(new double[] {}, new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {0}, new double[] {0, 0.0}), + Arguments.of(new double[] {4, 8, -6, 3, 18}, new double[] {1, -7, 6}), + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {6.0, -1.32, -5.78, 8.967, 13.32, -9.67, 0.14, 7.321, 11.456, -3.111}, new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}, new double[] {-42, 10, -88, 5, -17}), + Arguments.of(new double[] {-20, 34.983, -12.745, 28.12, -8.34, 42, -4, 16}, new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9}, new double[] {12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0}, new double[] {+0.0}), + Arguments.of(new double[] {0.0}, new double[] {-0.0}), + Arguments.of(new double[] {0.0}, new double[] {+0.0}), + Arguments.of(new double[] {10E-50, 5E-100}, new double[] {25E-200, 35.345E-50}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1}, new double[] {1}), + Arguments.of(new double[] {Double.MAX_VALUE, 3.1415E153}, new double[] {}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {1}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {1, 1E300}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testCombineNonFinite(double[][] values, double expected) { + Variance var1 = Variance.create(); + Variance var2 = Variance.create(); + Arrays.stream(values[0]).forEach(var1); + Arrays.stream(values[1]).forEach(var2); + double mean2BeforeCombine = var2.getAsDouble(); + var1.combine(var2); + Assertions.assertEquals(expected, var1.getAsDouble(), "combine non-finite"); + Assertions.assertEquals(mean2BeforeCombine, var2.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testCombineRandomOrderNonFinite(double[][] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + double[] data = TestHelper.concatenate(values[0], values[1]); + int n = values[0].length; + for (int i = 1; i <= 10; i++) { + for (int j = 1; j <= 10; j++) { + TestHelper.shuffle(rng, values[0]); + TestHelper.shuffle(rng, values[1]); + testCombineNonFinite(values, expected); + } + TestHelper.shuffle(rng, data); + System.arraycopy(data, 0, values[0], 0, n); + System.arraycopy(data, n, values[1], 0, values[1].length); + testCombineNonFinite(values, expected); + } + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testArrayOfArraysNonFinite(double[][] values, double expected) { + double actual = Arrays.stream(values) + .map(Variance::of) + .reduce(Variance::combine) + .map(Variance::getAsDouble) + .orElseThrow(RuntimeException::new); + Assertions.assertEquals(expected, actual, "array of arrays combined variance non-finite"); + } + + static Stream<Arguments> testCombineNonFinite() { + return Stream.of( + Arguments.of(new double[][] {{}, {}}, Double.NaN), + Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.NEGATIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.POSITIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY}, {Double.NEGATIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.MAX_VALUE}}, Double.NaN), + Arguments.of(new double[][] {{-Double.MAX_VALUE}, {Double.POSITIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY}, {-Double.MIN_VALUE}}, Double.NaN), + Arguments.of(new double[][] {{Double.NaN, 34.56, 89.74}, {Double.NaN}}, Double.NaN), + Arguments.of(new double[][] {{34.56}, {Double.NaN, 89.74}}, Double.NaN), + Arguments.of(new double[][] {{34.56, 89.74}, {Double.NaN, Double.NaN}}, Double.NaN), + Arguments.of(new double[][] {{Double.NaN, 3.14, Double.NaN, Double.NaN}, {}}, Double.NaN), + Arguments.of(new double[][] {{Double.NaN, Double.NaN, Double.NaN}, {Double.NaN, Double.NaN, Double.NaN}}, Double.NaN), + Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY, -Double.MAX_VALUE, -Double.MIN_VALUE}, {Double.MAX_VALUE, Double.MIN_VALUE}}, Double.NaN) + ); + } + + // Helper function to compute the expected value of Variance using BigDecimal. + static double computeExpectedVariance(double[] values) { + long n = values.length; + if (n == 1) { + return BigDecimal.ZERO.doubleValue(); + } + double mean = TestHelper.computeExpectedMean(values); + BigDecimal bd = BigDecimal.ZERO; + for (double value : values) { + BigDecimal bdDiff = BigDecimal.valueOf(value) + .subtract(BigDecimal.valueOf(mean)); + bdDiff = bdDiff.multiply(bdDiff); + bd = bd.add(bdDiff); Review Comment: It may be more efficient to use bdDiff.pow(2). This internally can use BigInteger.square and can factor out the common power of 2 in the underlying integer. ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java: ########## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.util.Arrays; +import java.util.stream.Stream; +import org.apache.commons.rng.UniformRandomProvider; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +/** + * Test for {@link Variance}. + */ +final class VarianceTest { + + @Test + void testEmpty() { + Variance var = Variance.create(); + Assertions.assertEquals(Double.NaN, var.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVariance(double[] values) { + double expected = computeExpectedVariance(values); + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + TestHelper.assertEquals(expected, var.getAsDouble(), 7, () -> "variance"); Review Comment: These ULP values may have to be lowered when you correct the computation of the expected variance. Using BigDecimal.valueOf(double) may have introduced quite a difference. ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/VarianceTest.java: ########## @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.statistics.descriptive; + +import java.math.BigDecimal; +import java.math.MathContext; +import java.util.Arrays; +import java.util.stream.Stream; +import org.apache.commons.rng.UniformRandomProvider; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +/** + * Test for {@link Variance}. + */ +final class VarianceTest { + + @Test + void testEmpty() { + Variance var = Variance.create(); + Assertions.assertEquals(Double.NaN, var.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVariance(double[] values) { + double expected = computeExpectedVariance(values); + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + TestHelper.assertEquals(expected, var.getAsDouble(), 7, () -> "variance"); + TestHelper.assertEquals(expected, Variance.of(values).getAsDouble(), 5, () -> "of (values)"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testParallelStream(double[] values) { + double expected = computeExpectedVariance(values); + double actual = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + TestHelper.assertEquals(expected, actual, 7, () -> "parallel stream"); + } + + @ParameterizedTest + @MethodSource(value = "testVariance") + void testVarianceRandomOrder(double[] values) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVariance(TestHelper.shuffle(rng, values)); + testParallelStream(TestHelper.shuffle(rng, values)); + } + } + + static Stream<Arguments> testVariance() { + return Stream.of( + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}), + Arguments.of(new double[] {8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68}), + Arguments.of(new double[] {9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74, 7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89}), + Arguments.of(new double[] {0, 0, 0.0}), + Arguments.of(new double[] {1, -7, 6}), + Arguments.of(new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}), + Arguments.of(new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9, 12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0, +0.0}), + Arguments.of(new double[] {0.0, -0.0}), + Arguments.of(new double[] {0.0, +0.0}), + Arguments.of(new double[] {0.001, 0.0002, 0.00003, 10000.11, 0.000004}), + Arguments.of(new double[] {10E-50, 5E-100, 25E-200, 35.345E-50}), + // Overflow of the sum + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1, 1}), + Arguments.of(new double[] {-Double.MAX_VALUE, -1, 1}), + Arguments.of(new double[] {Double.MAX_VALUE, -1}), + Arguments.of(new double[] {Double.MAX_VALUE, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, -Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1, -Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE, Double.MAX_VALUE / 2}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceNonFinite(double[] values, double expected) { + Variance var = Variance.create(); + for (double value : values) { + var.accept(value); + } + Assertions.assertEquals(expected, var.getAsDouble(), "variance non-finite"); + Assertions.assertEquals(expected, Variance.of(values).getAsDouble(), "of (values) non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testParallelStreamNonFinite(double[] values, double expected) { + double ans = Arrays.stream(values) + .parallel() + .collect(Variance::create, Variance::accept, Variance::combine) + .getAsDouble(); + Assertions.assertEquals(expected, ans, "parallel stream non-finite"); + } + + @ParameterizedTest + @MethodSource(value = "testVarianceNonFinite") + void testVarianceRandomOrderNonFinite(double[] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + for (int i = 1; i <= 10; i++) { + testVarianceNonFinite(TestHelper.shuffle(rng, values), expected); + testParallelStreamNonFinite(TestHelper.shuffle(rng, values), expected); + } + } + static Stream<Arguments> testVarianceNonFinite() { + return Stream.of( + Arguments.of(new double[] {}, Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, -Double.MIN_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, +0.0d, -0.0d, Double.POSITIVE_INFINITY, + Double.NEGATIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 34.56, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, Double.NaN, 89.74}, Double.NaN), + Arguments.of(new double[] {34.56, 89.74, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NaN, 3.14, Double.NaN, Double.NaN}, + Double.NaN), + Arguments.of(new double[] {Double.NaN, Double.NaN, Double.NaN}, Double.NaN), + Arguments.of(new double[] {Double.NEGATIVE_INFINITY, Double.MAX_VALUE}, + Double.NaN), + Arguments.of(new double[] {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, + Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY}, Double.NaN), + Arguments.of(new double[] {-Double.MAX_VALUE, Double.POSITIVE_INFINITY}, + Double.NaN) + ); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombine(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + Variance var1 = Variance.create(); + Variance var2 = Variance.create(); + Arrays.stream(array1).forEach(var1); + Arrays.stream(array2).forEach(var2); + final double var2BeforeCombine = var2.getAsDouble(); + var1.combine(var2); + TestHelper.assertEquals(expected, var1.getAsDouble(), 7, () -> "combine"); + Assertions.assertEquals(var2BeforeCombine, var2.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testCombineRandomOrder(double[] array1, double[] array2) { + UniformRandomProvider rng = TestHelper.createRNG(); + double[] data = TestHelper.concatenate(array1, array2); + int n = array1.length; + for (int i = 1; i <= 10; i++) { + for (int j = 1; j <= 10; j++) { + TestHelper.shuffle(rng, array1); + TestHelper.shuffle(rng, array2); + testCombine(array1, array2); + } + TestHelper.shuffle(rng, data); + System.arraycopy(data, 0, array1, 0, n); + System.arraycopy(data, n, array2, 0, array2.length); + testCombine(array1, array2); + } + } + + @ParameterizedTest + @MethodSource(value = "testCombine") + void testArrayOfArrays(double[] array1, double[] array2) { + final double[] combinedArray = TestHelper.concatenate(array1, array2); + final double expected = computeExpectedVariance(combinedArray); + final double[][] values = {array1, array2}; + double actual = Arrays.stream(values) + .map(Variance::of) + .reduce(Variance::combine) + .map(Variance::getAsDouble) + .orElseThrow(RuntimeException::new); + TestHelper.assertEquals(expected, actual, 2, () -> "array of arrays combined variance"); + } + + static Stream<Arguments> testCombine() { + return Stream.of( + Arguments.of(new double[] {}, new double[] {1}), + Arguments.of(new double[] {1}, new double[] {}), + Arguments.of(new double[] {}, new double[] {1, 7, -15, 3}), + Arguments.of(new double[] {0}, new double[] {0, 0.0}), + Arguments.of(new double[] {4, 8, -6, 3, 18}, new double[] {1, -7, 6}), + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8}), + Arguments.of(new double[] {10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5}, new double[] {7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73}), + Arguments.of(new double[] {6.0, -1.32, -5.78, 8.967, 13.32, -9.67, 0.14, 7.321, 11.456, -3.111}, new double[] {2, 2, 2, 2}), + Arguments.of(new double[] {2.3}, new double[] {-42, 10, -88, 5, -17}), + Arguments.of(new double[] {-20, 34.983, -12.745, 28.12, -8.34, 42, -4, 16}, new double[] {3.14, 2.718, 1.414}), + Arguments.of(new double[] {12.5, 12.0, 11.8, 14.2, 14.9, 14.5, 21.0, 8.2, 10.3, 11.3, 14.1, 9.9}, new double[] {12.2, 12.0, 12.1, 11.0, 19.8, 11.0, 10.0, 8.8, 9.0, 12.3}), + Arguments.of(new double[] {-0.0}, new double[] {+0.0}), + Arguments.of(new double[] {0.0}, new double[] {-0.0}), + Arguments.of(new double[] {0.0}, new double[] {+0.0}), + Arguments.of(new double[] {10E-50, 5E-100}, new double[] {25E-200, 35.345E-50}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {-Double.MAX_VALUE, 1}, new double[] {1}), + Arguments.of(new double[] {Double.MAX_VALUE, 3.1415E153}, new double[] {}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {1}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {1, 1, 1}, new double[] {-Double.MAX_VALUE}), + Arguments.of(new double[] {Double.MAX_VALUE}, new double[] {1, 1E300}) + ); + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testCombineNonFinite(double[][] values, double expected) { + Variance var1 = Variance.create(); + Variance var2 = Variance.create(); + Arrays.stream(values[0]).forEach(var1); + Arrays.stream(values[1]).forEach(var2); + double mean2BeforeCombine = var2.getAsDouble(); + var1.combine(var2); + Assertions.assertEquals(expected, var1.getAsDouble(), "combine non-finite"); + Assertions.assertEquals(mean2BeforeCombine, var2.getAsDouble()); + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testCombineRandomOrderNonFinite(double[][] values, double expected) { + UniformRandomProvider rng = TestHelper.createRNG(); + double[] data = TestHelper.concatenate(values[0], values[1]); + int n = values[0].length; + for (int i = 1; i <= 10; i++) { + for (int j = 1; j <= 10; j++) { + TestHelper.shuffle(rng, values[0]); + TestHelper.shuffle(rng, values[1]); + testCombineNonFinite(values, expected); + } + TestHelper.shuffle(rng, data); + System.arraycopy(data, 0, values[0], 0, n); + System.arraycopy(data, n, values[1], 0, values[1].length); + testCombineNonFinite(values, expected); + } + } + + @ParameterizedTest + @MethodSource(value = "testCombineNonFinite") + void testArrayOfArraysNonFinite(double[][] values, double expected) { + double actual = Arrays.stream(values) + .map(Variance::of) + .reduce(Variance::combine) + .map(Variance::getAsDouble) + .orElseThrow(RuntimeException::new); + Assertions.assertEquals(expected, actual, "array of arrays combined variance non-finite"); + } + + static Stream<Arguments> testCombineNonFinite() { + return Stream.of( + Arguments.of(new double[][] {{}, {}}, Double.NaN), + Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.NEGATIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.POSITIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY}, {Double.NEGATIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.POSITIVE_INFINITY}, {Double.MAX_VALUE}}, Double.NaN), + Arguments.of(new double[][] {{-Double.MAX_VALUE}, {Double.POSITIVE_INFINITY}}, Double.NaN), + Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY}, {-Double.MIN_VALUE}}, Double.NaN), + Arguments.of(new double[][] {{Double.NaN, 34.56, 89.74}, {Double.NaN}}, Double.NaN), + Arguments.of(new double[][] {{34.56}, {Double.NaN, 89.74}}, Double.NaN), + Arguments.of(new double[][] {{34.56, 89.74}, {Double.NaN, Double.NaN}}, Double.NaN), + Arguments.of(new double[][] {{Double.NaN, 3.14, Double.NaN, Double.NaN}, {}}, Double.NaN), + Arguments.of(new double[][] {{Double.NaN, Double.NaN, Double.NaN}, {Double.NaN, Double.NaN, Double.NaN}}, Double.NaN), + Arguments.of(new double[][] {{Double.NEGATIVE_INFINITY, -Double.MAX_VALUE, -Double.MIN_VALUE}, {Double.MAX_VALUE, Double.MIN_VALUE}}, Double.NaN) + ); + } + + // Helper function to compute the expected value of Variance using BigDecimal. + static double computeExpectedVariance(double[] values) { + long n = values.length; + if (n == 1) { + return BigDecimal.ZERO.doubleValue(); + } + double mean = TestHelper.computeExpectedMean(values); + BigDecimal bd = BigDecimal.ZERO; + for (double value : values) { + BigDecimal bdDiff = BigDecimal.valueOf(value) Review Comment: Do not use `BigDecimal.valueOf`. This uses a text representation of the double. To get the exact representation you should use `new BigDecimal`. Note: Computing the BigDecimal mean should be done outside the loop. ########## commons-statistics-descriptive/src/test/java/org/apache/commons/statistics/descriptive/TestHelper.java: ########## @@ -43,6 +45,19 @@ static double[] concatenate(double[]... arrays) { .toArray(); } + /** + * Helper function to compute the expected value of Mean using BigDecimal. + * @param values Values. + * @return Mean of values. + */ + static double computeExpectedMean(double[] values) { Review Comment: I would return the BigDecimal here. Add in the javadoc that the result is rounded to DECIMAL128 precision. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org