[ 
https://issues.apache.org/jira/browse/STATISTICS-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707700#comment-17707700
 ] 

Anirudh Joshi commented on STATISTICS-54:
-----------------------------------------

Thanks Alex and Gilles for your inputs. They really gave me clarity on a few 
things I was confused about. I have a few questions about some of our choices:
 # When we say "We would incorporate the functionality directly into a new 
module in the Statistics project", this means we would create a new module in 
the [commons-statistics|https://github.com/apache/commons-statistics] project, 
porting over the StorelessUnivariateStatistic implementations currently in the 
[commons-math-legacy/stat/descriptive|https://github.com/apache/commons-math/tree/master/commons-math-legacy/src/main/java/org/apache/commons/math4/legacy/stat/descriptive]
 package. Is my understanding correct?
 # Among the StorelessUnivariateStatistic implementations we have, not all are 
part of the SummaryStatistics class (e.g. Kurtosis, PSquarePercentile). Would 
the goal of this project be to implement collectors for all implementations of 
StorelessUnivariateStatistic, or just for the statistic values we compute as 
part of SummaryStatistics?
 # About the question of whether we need "a separate stream version for each of 
the statistics": do we plan to support callers passing a custom `statistic 
implementation` when computing SummaryStatistics? If the goal is to support 
this, I feel it is better to have a stream version for each individual 
statistic and make SummaryStatistics a composition of these. That way callers 
can supply their own custom implementations if need be. But as Gilles noted, 
this would duplicate a lot of computation between classes (the classic example 
being Variance and StandardDeviation, where each would compute the same `sum of 
squared deviations from the mean` separately), and this would be suboptimal in 
my opinion. Maybe we can group the common functionality in a separate class and 
share it between the dependent classes. For my example above, this would be a 
SumOfSquareDeviationFromMean class: SummaryStatistics would create one instance 
and initialize both Variance and StandardDeviation with it. I would like to 
know your thoughts on this.
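
To make the "stream version for each individual statistic" option concrete, 
here is a minimal sketch of how a single statistic could plug into Java 8 
streams. It is an illustration under assumptions, not a proposed final API: the 
RunningMean class and its collector() method are hypothetical names that do not 
exist in Commons Math or Commons Statistics. Only java.util.stream.Collector 
and DoubleStream are real JDK types.

```java
import java.util.stream.Collector;
import java.util.stream.DoubleStream;

// Hypothetical accumulator for illustration; not an existing Commons class.
final class RunningMean {
    private long n;
    private double mean;

    // Accumulator step: fold one value into the running mean.
    RunningMean add(double x) {
        n++;
        mean += (x - mean) / n;
        return this;
    }

    // Combiner step: merge another partial result (used by parallel streams).
    RunningMean add(RunningMean other) {
        long total = n + other.n;
        if (total > 0) {
            mean += (other.mean - mean) * other.n / total;
            n = total;
        }
        return this;
    }

    // NaN for an empty input is a choice made for this sketch.
    double getAsDouble() {
        return n == 0 ? Double.NaN : mean;
    }

    // Adapter for Stream<Double>; values are unboxed on each call.
    static Collector<Double, RunningMean, Double> collector() {
        return Collector.of(
            RunningMean::new,
            (acc, x) -> acc.add(x),
            (a, b) -> a.add(b),
            RunningMean::getAsDouble);
    }
}

final class RunningMeanDemo {
    public static void main(String[] args) {
        // Boxed stream with the Collector adapter.
        double viaCollector = DoubleStream.of(1, 2, 3, 4).boxed()
            .collect(RunningMean.collector());

        // Primitive stream with the three-argument collect, avoiding boxing.
        double viaPrimitive = DoubleStream.of(1, 2, 3, 4)
            .collect(RunningMean::new, (acc, x) -> acc.add(x), (a, b) -> a.add(b))
            .getAsDouble();

        System.out.println(viaCollector + " " + viaPrimitive);
    }
}
```

The two add methods map directly onto the accumulator and combiner that the 
Stream API asks for, so a custom statistic supplied by a caller would only need 
to provide those two operations plus a way to read the result.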

{code:java}
// API sketch only: method bodies are omitted.
import java.util.function.DoubleSupplier;

public final class SumOfSquareDeviationFromMean {
    // Implementation
}

public class Variance implements DoubleSupplier {
    public static Variance of(double... values);
    public static Variance create(); // Could provide an implementation choice
    // Called from SummaryStatistics
    public static Variance with(SumOfSquareDeviationFromMean sumSq);
    public Variance add(double value);
    public Variance add(Variance other);
    public double getAsDouble();
}

public class StandardDeviation implements DoubleSupplier {
    public static StandardDeviation of(double... values);
    public static StandardDeviation create(); // Could provide an implementation choice
    // Called from SummaryStatistics
    public static StandardDeviation with(SumOfSquareDeviationFromMean sumSq);
    public StandardDeviation add(double value);
    public StandardDeviation add(StandardDeviation other);
    public double getAsDouble(); // Must return double to satisfy DoubleSupplier
}{code}
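
As a rough illustration of the sharing idea, the following is a minimal 
runnable sketch under several assumptions that are not settled in this thread: 
the shared class uses a Welford-style update, the variance is bias-corrected 
(divides by n - 1), NaN is returned for fewer than two values, and 
SummarySketch is a hypothetical stand-in for SummaryStatistics. The statistics 
created through with(...) are read-only views here, so each value is added to 
the shared state exactly once.

```java
import java.util.function.DoubleSupplier;

// Shared state: running mean and sum of squared deviations from the mean.
final class SumOfSquareDeviationFromMean {
    private long n;
    private double mean;
    private double sumSq;

    // Welford-style update, one pass and no stored values.
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        sumSq += delta * (x - mean);
    }

    long getN() { return n; }
    double getAsDouble() { return sumSq; }
}

// Read-only view over the shared state.
final class Variance implements DoubleSupplier {
    private final SumOfSquareDeviationFromMean shared;

    private Variance(SumOfSquareDeviationFromMean shared) { this.shared = shared; }

    static Variance with(SumOfSquareDeviationFromMean shared) {
        return new Variance(shared);
    }

    @Override
    public double getAsDouble() {
        long n = shared.getN();
        // Assumption for this sketch: bias-corrected, NaN when n < 2.
        return n < 2 ? Double.NaN : shared.getAsDouble() / (n - 1);
    }
}

// Second view; reuses the same sum instead of recomputing it.
final class StandardDeviation implements DoubleSupplier {
    private final Variance variance;

    private StandardDeviation(Variance variance) { this.variance = variance; }

    static StandardDeviation with(SumOfSquareDeviationFromMean shared) {
        return new StandardDeviation(Variance.with(shared));
    }

    @Override
    public double getAsDouble() { return Math.sqrt(variance.getAsDouble()); }
}

// Hypothetical stand-in for SummaryStatistics: owns the shared instance.
final class SummarySketch {
    private final SumOfSquareDeviationFromMean shared = new SumOfSquareDeviationFromMean();
    private final Variance variance = Variance.with(shared);
    private final StandardDeviation sd = StandardDeviation.with(shared);

    SummarySketch add(double x) { shared.add(x); return this; }
    double getVariance() { return variance.getAsDouble(); }
    double getStandardDeviation() { return sd.getAsDouble(); }
}

final class SharedStateDemo {
    public static void main(String[] args) {
        SummarySketch s = new SummarySketch();
        for (double x : new double[] {1, 2, 3, 4, 5}) {
            s.add(x);
        }
        System.out.println(s.getVariance());          // 2.5
        System.out.println(s.getStandardDeviation()); // square root of 2.5
    }
}
```

One open design point this exposes: if Variance and StandardDeviation built on 
a shared instance also offered add(double), a value added through both would 
be counted twice, so the shared case probably needs either read-only views or 
a single owner that performs the updates.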
 

Please let me know. Thanks again for your inputs. This is helping me gain more 
clarity into the work to be done.

> [GSoC] Summary statistics API for Java 8 streams
> ------------------------------------------------
>
>                 Key: STATISTICS-54
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-54
>             Project: Commons Statistics
>          Issue Type: Wish
>          Components: descriptive
>            Reporter: Alex Herbert
>            Priority: Minor
>              Labels: full-time, gsoc, gsoc2022, gsoc2023
>             Fix For: 1.0
>
>
> Placeholder for tasks that could be undertaken in this year's 
> [GSoC|https://summerofcode.withgoogle.com/].
> Ideas:
> - Design an updated summary statistics API for use with Java 8 streams based 
> on the summary statistic implementations in the Commons Math 
> {{stat.descriptive}} package including {{moments}}, {{rank}} and 
> {{summary}} sub-packages.



