[ 
https://issues.apache.org/jira/browse/STATISTICS-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707624#comment-17707624
 ] 

Alex Herbert commented on STATISTICS-54:
----------------------------------------

Hi,

Thanks for your interest in the project. Your proof of concept shows how the 
usage would work for a single statistic. However, note that the project would 
not create accumulators for the code in Commons Math. We would incorporate the 
functionality directly into a new module in the Statistics project. Each 
storeless univariate statistic can potentially be used as a stream collector if 
it can be combined with another instance of itself, e.g.:
{code:java}
// API outline only: method bodies are omitted
public class Mean implements DoubleSupplier {
    public static Mean of(double... values);
    public static Mean create(); // Could provide an implementation choice
    public Mean add(double value);
    public Mean add(Mean other);
    public double getAsDouble();
}

double mean = Arrays.stream(new double[]{1.0, 2.0, 3.0, 4.0, -1.0})
        .collect(Mean::create, Mean::add, Mean::add)
        .getAsDouble();{code}
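To make the outline above concrete, here is a minimal sketch of how such a combinable statistic might be written. The class name {{MeanSketch}} and the running-mean update are assumptions chosen for illustration; they are not the project's implementation, and a real version would need to consider numerical accuracy more carefully.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

/** Hypothetical combinable mean; names and algorithm are illustrative only. */
final class MeanSketch implements DoubleSupplier {
    private long n;       // number of values seen
    private double mean;  // running mean of the values seen

    static MeanSketch create() {
        return new MeanSketch();
    }

    static MeanSketch of(double... values) {
        MeanSketch m = new MeanSketch();
        for (double v : values) {
            m.add(v);
        }
        return m;
    }

    /** Accumulator step: update the running mean with one value. */
    MeanSketch add(double value) {
        n++;
        mean += (value - mean) / n;
        return this;
    }

    /** Combiner step: merge another instance using a count-weighted update. */
    MeanSketch add(MeanSketch other) {
        if (other.n != 0) {
            long total = n + other.n;
            mean += (other.mean - mean) * other.n / total;
            n = total;
        }
        return this;
    }

    /** Returns NaN when no values have been added. */
    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : mean;
    }
}

final class MeanSketchDemo {
    public static void main(String[] args) {
        double mean = Arrays.stream(new double[] {1.0, 2.0, 3.0, 4.0, -1.0})
                .collect(MeanSketch::create, MeanSketch::add, MeanSketch::add)
                .getAsDouble();
        System.out.println(mean); // approximately 1.8
    }
}
```

The two {{add}} overloads map onto the accumulator and combiner arguments of {{DoubleStream.collect}}, which is what allows the same method reference to be passed twice.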
How to combine multiple statistics from the same stream is open to discussion. 
For example, the following could support a set of available statistics, with 
all implementations chosen from an enum:
{code:java}
// Supported statistics
public enum Statistic {
    Mean;
} 

public class DoubleStatisticSummary implements ToDoubleFunction<Statistic> {
    public static DoubleStatisticSummary of(Statistic... statistics);
    public DoubleStatisticSummary add(double v);
    public DoubleStatisticSummary add(DoubleStatisticSummary other);
    public double applyAsDouble(Statistic s);
}

DoubleStatisticSummary s = Arrays.stream(new double[]{1.0, 2.0, 3.0, 4.0, -1.0})
    .collect(() -> DoubleStatisticSummary.of(Statistic.Mean),
             DoubleStatisticSummary::add, DoubleStatisticSummary::add);
double mean = s.applyAsDouble(Statistic.Mean);
{code}
Internal implementation TBD.

However, this summary is not extensible: a user cannot add their own 
collector for the double values to the summary.
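As one possible shape for the internal implementation left TBD above, the sketch below keeps a small state array per requested statistic in an {{EnumMap}} and lets each enum constant define how that state is updated. All names ({{SketchStatistic}}, {{SketchSummary}}) and the extra {{Max}} constant are assumptions for illustration only. Because the enum is closed, it also shows the limitation just mentioned: a user cannot plug in their own statistic.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical statistics; each constant defines how its state array evolves. */
enum SketchStatistic {
    Mean {
        @Override double[] newState() { return new double[2]; } // {count, mean}
        @Override void add(double[] s, double v) { s[0]++; s[1] += (v - s[1]) / s[0]; }
        @Override void combine(double[] s, double[] o) {
            if (o[0] == 0) {
                return;
            }
            double total = s[0] + o[0];
            s[1] += (o[1] - s[1]) * o[0] / total;
            s[0] = total;
        }
        @Override double value(double[] s) { return s[0] == 0 ? Double.NaN : s[1]; }
    },
    Max {
        // An empty Max reports negative infinity in this sketch.
        @Override double[] newState() { return new double[] {Double.NEGATIVE_INFINITY}; }
        @Override void add(double[] s, double v) { s[0] = Math.max(s[0], v); }
        @Override void combine(double[] s, double[] o) { s[0] = Math.max(s[0], o[0]); }
        @Override double value(double[] s) { return s[0]; }
    };

    abstract double[] newState();
    abstract void add(double[] state, double v);
    abstract void combine(double[] state, double[] other);
    abstract double value(double[] state);
}

/** Hypothetical summary holding one state array per requested statistic. */
final class SketchSummary implements ToDoubleFunction<SketchStatistic> {
    private final Map<SketchStatistic, double[]> states = new EnumMap<>(SketchStatistic.class);

    static SketchSummary of(SketchStatistic... statistics) {
        SketchSummary s = new SketchSummary();
        for (SketchStatistic stat : statistics) {
            s.states.put(stat, stat.newState());
        }
        return s;
    }

    SketchSummary add(double v) {
        states.forEach((stat, state) -> stat.add(state, v));
        return this;
    }

    SketchSummary add(SketchSummary other) {
        // Assumes both summaries were created with the same set of statistics.
        states.forEach((stat, state) -> stat.combine(state, other.states.get(stat)));
        return this;
    }

    @Override
    public double applyAsDouble(SketchStatistic stat) {
        double[] state = states.get(stat);
        if (state == null) {
            throw new IllegalArgumentException("Statistic not computed: " + stat);
        }
        return stat.value(state);
    }
}

final class SketchSummaryDemo {
    public static void main(String[] args) {
        SketchSummary s = Arrays.stream(new double[] {1.0, 2.0, 3.0, 4.0, -1.0})
            .collect(() -> SketchSummary.of(SketchStatistic.Mean, SketchStatistic.Max),
                     SketchSummary::add, SketchSummary::add);
        System.out.println(s.applyAsDouble(SketchStatistic.Mean)); // approximately 1.8
        System.out.println(s.applyAsDouble(SketchStatistic.Max));
    }
}
```

Every value from the stream is passed once to each requested statistic, so several statistics are computed in a single pass.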

I think the project would have several parts:
 # Implement collectors for all applicable univariate statistics
 # Allow collectors to work on streams of individual values, or on all values 
at once (e.g. an array)
 # Add support for a combination of collectors
 # Expand support to IntStream and LongStream
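
To illustrate the second and fourth parts of the list above, here is a rough sketch of an integer variant that accepts either an array in one call or values from an {{IntStream}}. The class {{IntMeanSketch}} and its use of a {{long}} sum are assumptions made for this example, not a proposed design.

```java
import java.util.function.DoubleSupplier;
import java.util.stream.IntStream;

/** Hypothetical mean for int values; overflow of the long sum is ignored here. */
final class IntMeanSketch implements DoubleSupplier {
    private long n;    // number of values seen
    private long sum;  // exact integer sum of the values seen

    static IntMeanSketch create() {
        return new IntMeanSketch();
    }

    /** Array entry point: all values are supplied at once. */
    static IntMeanSketch of(int... values) {
        IntMeanSketch m = new IntMeanSketch();
        for (int v : values) {
            m.add(v);
        }
        return m;
    }

    /** Stream accumulator step for a single value. */
    IntMeanSketch add(int value) {
        n++;
        sum += value;
        return this;
    }

    /** Stream combiner step for merging partial results. */
    IntMeanSketch add(IntMeanSketch other) {
        n += other.n;
        sum += other.sum;
        return this;
    }

    /** Returns NaN when no values have been added. */
    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : (double) sum / n;
    }
}

final class IntMeanSketchDemo {
    public static void main(String[] args) {
        double fromArray = IntMeanSketch.of(1, 2, 3, 4).getAsDouble();
        double fromStream = IntStream.rangeClosed(1, 4)
                .collect(IntMeanSketch::create, IntMeanSketch::add, IntMeanSketch::add)
                .getAsDouble();
        System.out.println(fromArray + " " + fromStream);
    }
}
```

An array entry point such as {{of(int...)}} leaves room for an implementation to use a different algorithm when all values are available up front, although this sketch simply loops over the array.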

 

> [GSoC] Summary statistics API for Java 8 streams
> ------------------------------------------------
>
>                 Key: STATISTICS-54
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-54
>             Project: Commons Statistics
>          Issue Type: Wish
>          Components: descriptive
>            Reporter: Alex Herbert
>            Priority: Minor
>              Labels: full-time, gsoc, gsoc2022, gsoc2023
>             Fix For: 1.0
>
>
> Placeholder for tasks that could be undertaken in this year's 
> [GSoC|https://summerofcode.withgoogle.com/].
> Ideas:
> - Design an updated summary statistics API for use with Java 8 streams based 
> on the summary statistic implementations in the Commons Math 
> {{stat.descriptive}} package including {{moments}}, {{rank}} and 
> {{summary}} sub-packages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
