[
https://issues.apache.org/jira/browse/STATISTICS-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707700#comment-17707700
]
Anirudh Joshi commented on STATISTICS-54:
-----------------------------------------
Thanks Alex and Gilles for your inputs. They gave me clarity on a few
things I was confused about. I have a few questions about some of our choices:
# When we say "We would incorporate the functionality directly into a new
module in the Statistics project", this means we would create a new module in
the [commons-statistics|https://github.com/apache/commons-statistics] project,
porting over the StorelessUnivariateStatistic implementations currently in the
[commons-math-legacy/stat/descriptive|https://github.com/apache/commons-math/tree/master/commons-math-legacy/src/main/java/org/apache/commons/math4/legacy/stat/descriptive]
package. Is my understanding correct?
# Among the StorelessUnivariateStatistic implementations we have, not all are
part of the SummaryStatistics class (e.g. Kurtosis, PSquarePercentile). Would
the goal of this project be to implement collectors for all implementations of
StorelessUnivariateStatistic, or just for the statistics we compute as part of
SummaryStatistics?
# About the question of whether we need "a separate stream version for each of
the statistics": do we plan to support callers passing a custom statistic
implementation when computing SummaryStatistics? If so, I feel it is better to
have a stream version for each individual statistic and to build
SummaryStatistics as a composition of them. That way callers can supply their
own implementations if need be. But as Gilles noted, this would duplicate a
lot of computation between classes (the classic example being Variance and
StandardDeviation, where each would compute the same sum of squared deviations
from the mean separately), which would be suboptimal in my opinion. Maybe we
can group the common functionality in a separate class and share it between
the dependent classes. For the example above, that would be a
SumOfSquareDeviationFromMean class: SummaryStatistics would create one
instance and pass it to both Variance and StandardDeviation when they are
created from SummaryStatistics. I would like to know your thoughts on this.
{code:java}
import java.util.function.DoubleSupplier;

// API sketch only: method bodies are omitted.
public final class SumOfSquareDeviationFromMean {
    // Implementation
}

public class Variance implements DoubleSupplier {
    public static Variance of(double... values);
    public static Variance create(); // Could provide an implementation choice
    public static Variance with(SumOfSquareDeviationFromMean shared); // called from SummaryStatistics
    public Variance add(double value);
    public Variance add(Variance other);
    public double getAsDouble();
}

public class StandardDeviation implements DoubleSupplier {
    public static StandardDeviation of(double... values);
    public static StandardDeviation create(); // Could provide an implementation choice
    public static StandardDeviation with(SumOfSquareDeviationFromMean shared); // called from SummaryStatistics
    public StandardDeviation add(double value);
    public StandardDeviation add(StandardDeviation other);
    public double getAsDouble();
}
{code}
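To make the shared-state idea more concrete, here is a minimal runnable sketch of how it might look. It is only an illustration under my own assumptions: it uses a Welford-style streaming update and the usual pairwise merge formula rather than anything taken from the existing Commons Math code, the members accept, combine, getN, getSum and the stream-based of(DoubleStream) helper are hypothetical names, the variance is the sample variance (divisor n - 1) and returns NaN for fewer than two values, and StandardDeviation is reduced to a read-only view to keep it short.

```java
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

/** Shared state: count, running mean, and sum of squared deviations from the mean. */
final class SumOfSquareDeviationFromMean {
    private long n;
    private double mean;
    private double sumSq;

    /** Hypothetical helper: reduces a stream into a single accumulator. */
    static SumOfSquareDeviationFromMean of(DoubleStream values) {
        return values.collect(SumOfSquareDeviationFromMean::new,
                              SumOfSquareDeviationFromMean::accept,
                              SumOfSquareDeviationFromMean::combine);
    }

    /** Welford-style update with one value. */
    void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        sumSq += delta * (x - mean);
    }

    /** Merges another accumulator into this one (needed for parallel streams). */
    void combine(SumOfSquareDeviationFromMean other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            sumSq = other.sumSq;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        sumSq += other.sumSq + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    long getN() { return n; }
    double getSum() { return sumSq; }
}

/** Sample variance computed from a (possibly shared) accumulator. */
final class Variance implements DoubleSupplier {
    private final SumOfSquareDeviationFromMean state;

    private Variance(SumOfSquareDeviationFromMean state) { this.state = state; }

    static Variance create() { return new Variance(new SumOfSquareDeviationFromMean()); }

    /** The caller (e.g. SummaryStatistics) owns and updates the shared state. */
    static Variance with(SumOfSquareDeviationFromMean shared) { return new Variance(shared); }

    static Variance of(double... values) {
        Variance v = create();
        for (double x : values) {
            v.add(x);
        }
        return v;
    }

    Variance add(double x) { state.accept(x); return this; }
    Variance add(Variance other) { state.combine(other.state); return this; }

    @Override
    public double getAsDouble() {
        long n = state.getN();
        return n < 2 ? Double.NaN : state.getSum() / (n - 1);
    }
}

/** Read-only view: square root of the variance over the same shared state. */
final class StandardDeviation implements DoubleSupplier {
    private final Variance variance;

    private StandardDeviation(Variance variance) { this.variance = variance; }

    static StandardDeviation with(SumOfSquareDeviationFromMean shared) {
        return new StandardDeviation(Variance.with(shared));
    }

    @Override
    public double getAsDouble() { return Math.sqrt(variance.getAsDouble()); }
}
```

With this layout the owner of the shared instance updates it once per value, and both Variance.with(shared) and StandardDeviation.with(shared) read the result without repeating the computation. One open point in this sketch is that a view created with with(...) should probably not expose add(...), since two views adding the same value would count it twice.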
Please let me know. Thanks again for your inputs. This is helping me gain more
clarity into the work to be done.
> [GSoC] Summary statistics API for Java 8 streams
> ------------------------------------------------
>
> Key: STATISTICS-54
> URL: https://issues.apache.org/jira/browse/STATISTICS-54
> Project: Commons Statistics
> Issue Type: Wish
> Components: descriptive
> Reporter: Alex Herbert
> Priority: Minor
> Labels: full-time, gsoc, gsoc2022, gsoc2023
> Fix For: 1.0
>
>
> Placeholder for tasks that could be undertaken in this year's
> [GSoC|https://summerofcode.withgoogle.com/].
> Ideas:
> - Design an updated summary statistics API for use with Java 8 streams based
> on the summary statistic implementations in the Commons Math
> {{stat.descriptive}} package including {{moments}}, {{rank}} and
> {{summary}} sub-packages.