On Mon, 31 Jan 2022 at 15:06, Gilles Sadowski <gillese...@gmail.com> wrote:
>
> Hello.
>
> On Thu, 27 Jan 2022 at 18:09, Alex Herbert <alex.d.herb...@gmail.com>
> wrote:
> >
> > I would be willing to go through GSOC again.
>
> Thanks; I know that back in 2020 it was a disproportionate
> amount of work...
>
> > I think that the
> > statistics component could again serve as a project. There are some
> > packages in Math that could be moved to make use of the updated
> > distributions (e.g. math.stat.inference)
>
> That would be great, although I seem to notice that there
> might be some dependency issues...
>
> > or perhaps a reworking of the
> > math.stat.descriptive package to support using them with streams.
>
> +1
>
> > In the last iteration (GSOC 2020) we failed to get enough of a picture
> > of the competence of candidates in the 'bonding phase' before places
> > were formally allocated. I think we should require that a candidate
> > can:
> >
> > - Open a PR on GitHub to add a feature in the topic area. It should be
> > of non-trivial complexity and delivered to a quality ready to merge.
>
> Do you think that the above "stream support" could be that task?

Yes. A simple class to compute a summary statistic such as:

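import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

public interface Statistic<R> {
    void add(R x);  // combine with another instance of the statistic
}
public interface DoubleStatistic<R> extends Statistic<R>,
DoubleConsumer, DoubleSupplier {
    // Composite interface
}

public final class Mean implements DoubleStatistic<Mean> {
  private long n;
  private double sum;  // simple sum
  public static Mean create() { return new Mean(); }
  // Overrides
  @Override public void accept(double x) { n++; sum += x; }
  @Override public void add(Mean m) { n += m.n; sum += m.sum; }
  @Override public double getAsDouble() {
    return n == 0 ? Double.NaN : sum / n;  // NaN if no values were added
  }
}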

Used as:

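DoubleStream s = DoubleStream.of(1.0, 2.0, 3.0, 4.0);  // java.util.stream.DoubleStream
// collect(supplier, accumulator, combiner); the combiner is used by parallel streams
double u = s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();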

The implementation(s) can be updated and expanded later using
different underlying algorithms (simple sum, extended precision sum,
rolling mean) by passing a choice to the create method.

The project would involve working out how to move from this simple
statistic to supporting IntStream, LongStream and DoubleStream as
appropriate, and how to combine statistics efficiently to obtain a
customised summary statistic, perhaps selected by enum.
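One possible shape for a customised summary selected by enum, sketched under assumptions: the Stat enum, the Summary class and its of/combine/get methods are all hypothetical names for illustration only. Only the selected statistics are updated per value, and int input is accepted by widening to double:

```java
import java.util.EnumSet;
import java.util.function.DoubleConsumer;
import java.util.function.IntConsumer;

/** Hypothetical set of statistics that can be combined into one summary. */
enum Stat { MIN, MAX, MEAN }

/** Hypothetical composite statistic: only the selected statistics are updated. */
final class Summary implements DoubleConsumer, IntConsumer {
    private final EnumSet<Stat> selected;
    private final boolean doMin;
    private final boolean doMax;
    private final boolean doMean;
    private long n;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double sum;

    private Summary(EnumSet<Stat> selected) {
        this.selected = selected;
        doMin = selected.contains(Stat.MIN);
        doMax = selected.contains(Stat.MAX);
        doMean = selected.contains(Stat.MEAN);
    }

    static Summary of(Stat first, Stat... rest) {
        return new Summary(EnumSet.of(first, rest));
    }

    @Override
    public void accept(double x) {
        n++;
        if (doMin) min = Math.min(min, x);
        if (doMax) max = Math.max(max, x);
        if (doMean) sum += x;
    }

    // IntStream support: every int converts exactly to double.
    @Override
    public void accept(int x) {
        accept((double) x);
    }

    /** Combiner for parallel streams; assumes both instances share the same selection. */
    void combine(Summary other) {
        n += other.n;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        sum += other.sum;
    }

    double get(Stat s) {
        if (!selected.contains(s)) {
            throw new IllegalStateException(s + " was not selected");
        }
        if (n == 0) {
            return Double.NaN;
        }
        switch (s) {
            case MIN: return min;
            case MAX: return max;
            default: return sum / n;  // MEAN
        }
    }
}
```

Used as, for example, IntStream.rangeClosed(1, 100).collect(() -> Summary.of(Stat.MIN, Stat.MEAN), Summary::accept, Summary::combine). The JDK's DoubleSummaryStatistics and IntSummaryStatistics already compute a fixed set (count, sum, min, max, average); the enum lets the caller pick the set and skip work for statistics that are not needed. A LongStream variant would need more care, since long values beyond 2^53 do not convert exactly to double.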

This is aimed at the StorelessUnivariateStatistic implementations in
Commons Math. A more detailed examination of the existing functionality
would be required, with use cases generated for each statistic, to
understand how they can be supported in streams.

>
> > - Show knowledge of the topic area beyond this single feature,
> > demonstrating ability to continue to significantly contribute through
> > a 3 month period in the subject area.
>
> That seems fuzzier to define and assess (?).

I agree; choosing candidates is a fuzzy area. This was meant to
summarise my understanding of how we chose candidates last time. The
choice was based on the proposals submitted to GSOC but also on
impressions from the bonding period.

>
> Some ideas (for "Commons Math"):
> 1. Redesign and modularization of the "ml" package
>   -> main goal: enable multi-thread usage
> 2. Abstracting the linear algebra utilities
>   -> main goal: allow (runtime?) switch to alternative implementations
> 3. Redesign and modularization of the "random" package
>   -> main goal: general support of low-discrepancy sequences
> 4. Refactoring and modularization of the "special" package
>  -> main goal: ensure accuracy and performance, provide a better API,
>      and add other functions (?).
>
> > Without this set of skills there will be little progress in the formal
> > code period.
>
> :-}
>
> Shall we open a "GSoC 2022" report in each concerned JIRA project?

Yes. I think we just create some tickets and label them appropriately
(GSOC 2022 ?). There should be some left over from last time to
repurpose or use as templates for new ones.

Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
