Hello.

On Tue, 28 May 2019 at 20:36, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
>
>
> > On 28 May 2019, at 18:09, Eric Barnhill <ericbarnh...@gmail.com> wrote:
> >
> > The previous commons-math interface for descriptive statistics used a
> > paradigm of constructing classes for various statistical functions and
> > calling evaluate(). Example
> >
> > Mean mean = new Mean();
> > double mn = mean.evaluate(double[])
> >
> > I wrote this type of code all through grad school and always found it
> > unnecessarily bulky.  To me these summary statistics are classic use cases
> > for static methods:
> >
> > double mean = Mean.evaluate(double[])
> >
> > I don't have any particular problem with the evaluate() syntax.
> >
> > I looked over the old Math 4 API to see if there were any benefits to the
> > previous class-oriented approach that we might not want to lose. But I
> > don't think there were; the functionality outside of evaluate() is minimal.
>
> A quick check shows that evaluate comes from UnivariateStatistic. This has 
> some more methods that add little to an instance view of the computation:
>
> double evaluate(double[] values) throws MathIllegalArgumentException;
> double evaluate(double[] values, int begin, int length) throws 
> MathIllegalArgumentException;
> UnivariateStatistic copy();
>
> However it is extended by StorelessUnivariateStatistic which adds methods to 
> update the statistic:
>
> void increment(double d);
> void incrementAll(double[] values) throws MathIllegalArgumentException;
> void incrementAll(double[] values, int start, int length) throws 
> MathIllegalArgumentException;
> double getResult();
> long getN();
> void clear();
> StorelessUnivariateStatistic copy();
>
> This type of functionality would be lost by static methods.
>
> If you are moving to a functional interface type pattern for each statistic 
> then you will lose the other functionality possible with an instance state, 
> namely updating with more values or combining instances.
>
> So this is a question of whether updating a statistic is required after the 
> first computation.
>
> Will there be an alternative in the library for a map-reduce type operation 
> using instances that can be combined using Stream.collect:
>
>     <R> R collect(Supplier<R> supplier,
>                   ObjDoubleConsumer<R> accumulator,
>                   BiConsumer<R, R> combiner);
>
> Here <R> would be Mean:
>
> double mean = Arrays.stream(new double[1000])
>     .collect(Mean::new, Mean::add, Mean::add)
>     .getMean();
>
> with:
>
> void add(double);
> void add(Mean);
> double getMean();
>
> (Untested code)
>
> >
> > Finally we should consider whether we really need a separate class for each
> > statistic at all. Do we want to call:
> >
> > Mean.evaluate()
> >
> > or
> >
> > SummaryStats.mean()
> >
> > or maybe
> >
> > Stats.mean() ?
> >
> > The last being nice and compact.
> >
> > Let's make a decision so our esteemed mentee Virendra knows in what
> > direction to take his work this summer. :)
>

I'm not sure I understand the implicit conclusions of this conversation
and the other one there:
    https://markmail.org/message/7dmyhzuy6lublyb5

Do we agree that the core issue is *not* how to compute a mean, or a
median, or a fourth moment, but how any and all of those can be
computed seamlessly through a functional API (stream)?

As Alex pointed out, a useful functionality is the ability to "combine"
instances, e.g. if data are collected by several threads.
A potential use-case is the retrieval of the current value of (any)
statistical quantities while the data continues to be collected.
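
To make the "combine" point concrete, here is a minimal sketch of the collect-based pattern Alex describes. The names MeanCollectExample and RunningMean are hypothetical (they are not existing Commons classes), and the incremental update formula is just one possible choice; the only library call relied on is DoubleStream.collect(Supplier, ObjDoubleConsumer, BiConsumer).

```java
import java.util.Arrays;

class MeanCollectExample {
    /** Hypothetical combinable accumulator; not an existing Commons class. */
    static final class RunningMean {
        private long n;
        private double mean;

        /** Accumulator: fold one value into the running mean. */
        void add(double x) {
            n++;
            mean += (x - mean) / n;
        }

        /** Combiner: merge the state of another instance into this one. */
        void add(RunningMean other) {
            if (other.n == 0) {
                return; // Nothing to merge.
            }
            long total = n + other.n;
            mean += (other.mean - mean) * other.n / total;
            n = total;
        }

        /** @return the mean of the values seen so far, or NaN if none. */
        double getMean() {
            return n == 0 ? Double.NaN : mean;
        }

        long getN() {
            return n;
        }
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4};
        // With a parallel stream, each worker fills its own RunningMean
        // and the partial results are merged through add(RunningMean).
        double mean = Arrays.stream(data)
            .parallel()
            .collect(RunningMean::new, RunningMean::add, RunningMean::add)
            .getMean();
        System.out.println(mean);
    }
}
```

The same add(RunningMean) method could also be used to merge instances that were filled by separately managed threads, which is the use-case mentioned above.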

An initial idea would be:
public interface StatQuantity {
    double value(double[] data); // For "basic" usage.
    double value(DoubleStream data); // For "advanced" usage.
}

public class StatCollection {
    /** Specify which quantities this collection will hold/compute. */
    public StatCollection(Map<String, StatQuantity> stats) { /*... */ }

    /**
     * Start a worker thread.
     * @param data Values for which the stat quantities must be computed.
     */
    public void startCollector(DoubleStream data) { /* ... */ }

    /** Combine current state of workers. */
    public void collect() { /* ... */ }

    /** @return the current (combined) value of a named quantity. */
    public double get(String name) { /* ... */ }

    private class StatCollector implements Callable {
        StatCollector(DoubleStream data) { /* ... */ }
    }
}

This is all totally untested, very partial, and probably wrong-headed but
I thought that we were looking at this kind of refactoring.
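
For illustration only, the StatQuantity part of the outline could look like the sketch below. It assumes that the array variant simply delegates to the stream variant; the MEAN and MAX constants, the evaluate helper and the class name StatQuantitySketch are hypothetical names introduced for the example. The threaded StatCollection part is not covered.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.DoubleStream;

class StatQuantitySketch {
    /** Hypothetical interface following the outline above. */
    interface StatQuantity {
        /** "Advanced" usage: consumes the given stream. */
        double value(DoubleStream data);

        /** "Basic" usage: delegates to the stream variant. */
        default double value(double[] data) {
            return value(Arrays.stream(data));
        }
    }

    /** Example quantities built on standard DoubleStream reductions. */
    static final StatQuantity MEAN = s -> s.average().orElse(Double.NaN);
    static final StatQuantity MAX = s -> s.max().orElse(Double.NaN);

    /** Compute every named quantity for the same array of values. */
    static Map<String, Double> evaluate(Map<String, StatQuantity> stats,
                                        double[] data) {
        Map<String, Double> result = new LinkedHashMap<>();
        // A fresh stream is created from the array for each quantity.
        stats.forEach((name, q) -> result.put(name, q.value(data)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, StatQuantity> stats = new LinkedHashMap<>();
        stats.put("mean", MEAN);
        stats.put("max", MAX);
        System.out.println(evaluate(stats, new double[] {1, 2, 3, 4}));
    }
}
```

One caveat with this shape: a DoubleStream can be traversed only once, so several quantities cannot each call value(DoubleStream) on the same stream. Computing them in a single pass would need a combinable accumulator per quantity instead.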

Regards,
Gilles
