[
https://issues.apache.org/jira/browse/STATISTICS-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707624#comment-17707624
]
Alex Herbert commented on STATISTICS-54:
----------------------------------------
Hi,
Thanks for your interest in the project. Your proof of concept shows how the
usage would work for a single statistic. However, note that the project would
not create accumulators for the code in Commons Math. We would incorporate the
functionality directly into a new module in the Statistics project. Each
storeless univariate statistic can potentially be used as a stream collector if
it can be combined with another instance of itself, e.g.:
{code:java}
public class Mean implements DoubleSupplier {
    // API outline only; method bodies are omitted.
    public static Mean of(double... values);
    public static Mean create(); // Could provide an implementation choice
    public Mean add(double value); // Accumulate a single value
    public Mean add(Mean other);   // Combine with another instance
    public double getAsDouble();
}

// Usage as a stream collector
double mean = Arrays.stream(new double[]{1.0, 2.0, 3.0, 4.0, -1.0})
    .collect(Mean::create, Mean::add, Mean::add)
    .getAsDouble();
{code}
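To make the combine requirement concrete, here is a minimal sketch of how such a {{Mean}} could be implemented. The internals (a count plus a running mean, and returning NaN when empty) are assumptions made for illustration and are not a decided design; only the method names come from the outline above.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

/** Sketch of a storeless, combinable mean. The internals are an assumption. */
final class Mean implements DoubleSupplier {
    private long n;       // number of values seen so far
    private double mean;  // running mean of those values

    private Mean() {}

    /** Creates an empty instance, suitable as the collector's supplier. */
    static Mean create() {
        return new Mean();
    }

    /** Computes the statistic over all values at once. */
    static Mean of(double... values) {
        Mean m = create();
        for (double v : values) {
            m.add(v);
        }
        return m;
    }

    /** Accumulator step: updates the running mean with one value. */
    Mean add(double value) {
        n++;
        mean += (value - mean) / n;
        return this;
    }

    /** Combiner step: merges another instance into this one, weighted by count. */
    Mean add(Mean other) {
        if (other.n != 0) {
            long total = n + other.n;
            mean += (other.mean - mean) * other.n / total;
            n = total;
        }
        return this;
    }

    /** Returns the mean, or NaN if nothing was added (a choice made for this sketch). */
    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : mean;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0, -1.0};
        // Sequential and parallel collection; the parallel case exercises add(Mean).
        double sequential = Arrays.stream(data)
            .collect(Mean::create, Mean::add, Mean::add).getAsDouble();
        double parallel = Arrays.stream(data).parallel()
            .collect(Mean::create, Mean::add, Mean::add).getAsDouble();
        System.out.println(sequential + " " + parallel);
    }
}
```

The {{add(Mean)}} overload is what allows the statistic to be used with a parallel stream, where partial results from different threads have to be merged.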
How to combine multiple statistics from the same stream is open to discussion.
For example, the following could support a fixed set of available statistics,
with each implementation selected by an enum value:
{code:java}
// Supported statistics
public enum Statistic {
    Mean;
}

// API outline only; method bodies are omitted.
public class DoubleStatisticSummary implements ToDoubleFunction<Statistic> {
    public static DoubleStatisticSummary of(Statistic... statistics);
    public DoubleStatisticSummary add(double v);
    public DoubleStatisticSummary add(DoubleStatisticSummary other);
    public double applyAsDouble(Statistic s);
}

DoubleStatisticSummary s = Arrays.stream(new double[]{1.0, 2.0, 3.0, 4.0, -1.0})
    .collect(() -> DoubleStatisticSummary.of(Statistic.Mean),
             DoubleStatisticSummary::add, DoubleStatisticSummary::add);
double mean = s.applyAsDouble(Statistic.Mean);
{code}
Internal implementation TBD.
However, the summary does not allow extension: a user cannot add their own
collector for the double values into the summary.
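As one possible direction for the internals, the following is a minimal sketch of a {{DoubleStatisticSummary}} backed by an {{EnumMap}} of accumulators. The {{Accumulator}} interface, the {{MeanAcc}} and {{Fold}} classes, and the {{Min}} and {{Max}} enum constants are all hypothetical names added for illustration; the actual implementation is still to be decided.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;
import java.util.function.ToDoubleFunction;

/** Supported statistics; Min and Max are hypothetical additions for this sketch. */
enum Statistic { Mean, Min, Max }

final class DoubleStatisticSummary implements ToDoubleFunction<Statistic> {
    /** Hypothetical internal contract for one storeless statistic. */
    private interface Accumulator {
        void add(double v);
        void combine(Accumulator other);
        double get();
    }

    private static final class MeanAcc implements Accumulator {
        private long n;
        private double mean;
        public void add(double v) { n++; mean += (v - mean) / n; }
        public void combine(Accumulator other) {
            MeanAcc o = (MeanAcc) other;
            if (o.n != 0) {
                long total = n + o.n;
                mean += (o.mean - mean) * o.n / total;
                n = total;
            }
        }
        public double get() { return n == 0 ? Double.NaN : mean; }
    }

    /** Folds values with an associative operator, e.g. min or max. */
    private static final class Fold implements Accumulator {
        private final DoubleBinaryOperator op;
        private double value;
        Fold(DoubleBinaryOperator op, double identity) { this.op = op; this.value = identity; }
        public void add(double v) { value = op.applyAsDouble(value, v); }
        public void combine(Accumulator other) { value = op.applyAsDouble(value, ((Fold) other).value); }
        public double get() { return value; }
    }

    private final Map<Statistic, Accumulator> stats = new EnumMap<>(Statistic.class);

    private DoubleStatisticSummary() {}

    /** The implementation of each statistic is chosen from the enum value. */
    private static Accumulator create(Statistic s) {
        switch (s) {
            case Mean: return new MeanAcc();
            case Min: return new Fold(Math::min, Double.POSITIVE_INFINITY);
            case Max: return new Fold(Math::max, Double.NEGATIVE_INFINITY);
            default: throw new IllegalArgumentException("Unsupported: " + s);
        }
    }

    static DoubleStatisticSummary of(Statistic... statistics) {
        DoubleStatisticSummary summary = new DoubleStatisticSummary();
        for (Statistic s : statistics) {
            summary.stats.put(s, create(s));
        }
        return summary;
    }

    DoubleStatisticSummary add(double v) {
        stats.values().forEach(a -> a.add(v));
        return this;
    }

    /** Combines matching statistics; both summaries must hold the same set. */
    DoubleStatisticSummary add(DoubleStatisticSummary other) {
        stats.forEach((key, acc) -> {
            Accumulator o = other.stats.get(key);
            if (o == null) {
                throw new IllegalArgumentException("Missing statistic: " + key);
            }
            acc.combine(o);
        });
        return this;
    }

    @Override
    public double applyAsDouble(Statistic s) {
        Accumulator a = stats.get(s);
        if (a == null) {
            throw new IllegalArgumentException("Statistic not computed: " + s);
        }
        return a.get();
    }

    public static void main(String[] args) {
        DoubleStatisticSummary s = Arrays.stream(new double[]{1.0, 2.0, 3.0, 4.0, -1.0})
            .collect(() -> DoubleStatisticSummary.of(Statistic.Mean, Statistic.Max),
                     DoubleStatisticSummary::add, DoubleStatisticSummary::add);
        System.out.println(s.applyAsDouble(Statistic.Mean) + " " + s.applyAsDouble(Statistic.Max));
    }
}
```

Because {{Accumulator}} is private and the set of statistics is fixed by the enum, this sketch has the same limitation noted above: a user cannot plug in their own statistic.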
I think the project would have several parts:
# Implement collectors for all applicable univariate statistics
# Allow collectors to work on streams of individual values, or on all values at
once (e.g. an array)
# Add support for a combination of collectors
# Expand support to IntStream and LongStream
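As an illustration of the second and last items, the same pattern could carry over to {{IntStream}} roughly as sketched below. The class name {{IntMean}} and its use of an exact {{long}} sum are assumptions for this sketch only, and overflow of the sum is not handled.

```java
import java.util.function.DoubleSupplier;
import java.util.stream.IntStream;

/** Hypothetical int-valued counterpart of Mean; not part of any agreed API. */
final class IntMean implements DoubleSupplier {
    private long n;    // number of values seen
    private long sum;  // exact sum for int input; overflow is ignored in this sketch

    private IntMean() {}

    static IntMean create() {
        return new IntMean();
    }

    /** All values at a single time, e.g. from an array. */
    static IntMean of(int... values) {
        IntMean m = create();
        for (int v : values) {
            m.add(v);
        }
        return m;
    }

    /** One value at a time, e.g. from a stream. */
    IntMean add(int value) {
        n++;
        sum += value;
        return this;
    }

    /** Combines partial results, e.g. from a parallel stream. */
    IntMean add(IntMean other) {
        n += other.n;
        sum += other.sum;
        return this;
    }

    /** Returns the mean, or NaN if nothing was added (a choice made for this sketch). */
    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : (double) sum / n;
    }

    public static void main(String[] args) {
        double fromStream = IntStream.of(1, 2, 3, 4, -1)
            .collect(IntMean::create, IntMean::add, IntMean::add).getAsDouble();
        double fromArray = IntMean.of(1, 2, 3, 4, -1).getAsDouble();
        System.out.println(fromStream + " " + fromArray);
    }
}
```

A {{LongStream}} version would follow the same shape using {{ObjLongConsumer}}, although the sum would then need a wider or compensated representation to avoid overflow.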
> [GSoC] Summary statistics API for Java 8 streams
> ------------------------------------------------
>
> Key: STATISTICS-54
> URL: https://issues.apache.org/jira/browse/STATISTICS-54
> Project: Commons Statistics
> Issue Type: Wish
> Components: descriptive
> Reporter: Alex Herbert
> Priority: Minor
> Labels: full-time, gsoc, gsoc2022, gsoc2023
> Fix For: 1.0
>
>
> Placeholder for tasks that could be undertaken in this year's
> [GSoC|https://summerofcode.withgoogle.com/].
> Ideas:
> - Design an updated summary statistics API for use with Java 8 streams based
> on the summary statistic implementations in the Commons Math
> {{stat.descriptive}} package including {{moments}}, {{rank}} and
> {{summary}} sub-packages.