Re: [All] GSoC 2022

2022-02-28 Thread Alex Herbert
I have posted two ideas for GSoC mini projects under:

https://issues.apache.org/jira/browse/STATISTICS-54
https://issues.apache.org/jira/browse/NUMBERS-186

Alex


Re: [All] GSoC 2022

2022-02-25 Thread Gilles Sadowski
On Fri, 25 Feb 2022 at 04:39, Matt Juntunen wrote:
>
> I just added a similar placeholder issue for geometry:

Thanks!
I've added GEOMETRY-144 to the list.

Regards,
Gilles

> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [All] GSoC 2022

2022-02-24 Thread Matt Juntunen
I just added a similar placeholder issue for geometry:
https://issues.apache.org/jira/browse/GEOMETRY-145. I hope those are
the kinds of ideas we're going for here.

Regards,
Matt J

On Wed, Feb 23, 2022 at 12:05 PM Gilles Sadowski  wrote:
>
> Ping.
>
> Nothing for "Geometry", "Statistics", ... (?)
> ;-)
>
> Regards,
> Gilles
>
> On Wed, 9 Feb 2022 at 14:57, Gilles Sadowski wrote:
> >
> > Hi.
> >
> > >>> [...]
> > > > >
> > > > > Shall we open a "GSoC 2022" report in each concerned JIRA project?
> > > >
> > > > Yes. I think we just create some tickets and tag them with the
> > > > appropriate tag (GSOC 2022 ?). There should be some left over from
> > > > last time to repurpose or use as templates for new ones.
> > >
> > > Actually, I was thinking of creating one global "GSoC 2022" issue
> > > in each component, that would list all the topics and a complete
> > > description of their respective goal,
> >
> > Done for "Commons Math":
> >    https://issues.apache.org/jira/browse/MATH-1641
> >
> > Gilles



Re: [All] GSoC 2022

2022-02-23 Thread Gilles Sadowski
Ping.

Nothing for "Geometry", "Statistics", ... (?)
;-)

Regards,
Gilles

On Wed, 9 Feb 2022 at 14:57, Gilles Sadowski wrote:
>
> Hi.
>
> >>> [...]
> > > >
> > > > Shall we open a "GSoC 2022" report in each concerned JIRA project?
> > >
> > > Yes. I think we just create some tickets and tag them with the
> > > appropriate tag (GSOC 2022 ?). There should be some left over from
> > > last time to repurpose or use as templates for new ones.
> >
> > Actually, I was thinking of creating one global "GSoC 2022" issue
> > in each component, that would list all the topics and a complete
> > description of their respective goal,
>
> Done for "Commons Math":
>    https://issues.apache.org/jira/browse/MATH-1641
>
> Gilles




Re: [All] GSoC 2022

2022-02-09 Thread Gilles Sadowski
Hi.

>>> [...]
> > >
> > > Shall we open a "GSoC 2022" report in each concerned JIRA project?
> >
> > Yes. I think we just create some tickets and tag them with the
> > appropriate tag (GSOC 2022 ?). There should be some left over from
> > last time to repurpose or use as templates for new ones.
>
> Actually, I was thinking of creating one global "GSoC 2022" issue
> in each component, that would list all the topics and a complete
> description of their respective goal,

Done for "Commons Math":
   https://issues.apache.org/jira/browse/MATH-1641

Gilles




Re: [All] GSoC 2022

2022-02-02 Thread Gilles Sadowski
Hello.

On Wed, 2 Feb 2022 at 10:47, Alex Herbert wrote:
>
> On Mon, 31 Jan 2022 at 15:06, Gilles Sadowski  wrote:
> >
> > Hello.
> >
> > On Thu, 27 Jan 2022 at 18:09, Alex Herbert wrote:
> > >
> > > I would be willing to go through GSOC again.
> >
> > Thanks; I know that back in 2020, it had been a disproportionate
> > amount of work...
> >
> > > I think that the
> > > statistics component could again serve as a project. There are some
> > > packages in Math that could be moved to make use of the updated
> > > distributions (e.g. math.stat.inference)
> >
> > That would be great, although I seem to notice that there
> > might be some dependency issues...
> >
> > > or perhaps a reworking of the
> > > math.stat.descriptive package to support using them with streams.
> >
> > +1
> >
> > > In the last iteration (GSOC 2020) we failed to get enough of a picture
> > > of the competence of candidates in the 'bonding phase' before places
> > > were formally allocated. I think we should require that a candidate
> > > can:
> > >
> > > - Open a PR on GitHub to add a feature in the topic area. It should be
> > > of non-trivial complexity and delivered to a quality ready to merge.
> >
> > Do you think that the above "stream support" could be that task?
>
> Yes. A simple class to compute a summary statistic such as:
>
> public interface Statistic<R> {
>     void add(R x);
> }
> public interface DoubleStatistic<R> extends Statistic<R>,
>         DoubleConsumer, DoubleSupplier {
>     // Composite interface
> }
>
> public class Mean implements DoubleStatistic<Mean> {
>   // Outline only; method bodies omitted.
>   static Mean create();
>   // Overrides
>   public void accept(double x);
>   public void add(Mean m);
>   public double getAsDouble();
> }
>
> Used as:
>
> DoubleStream s;
> double u = s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();

To simplify the above, would we also provide the following?
---CUT---
public class Mean ... {
  // ...
  public static double collect(DoubleStream s) {
    return s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();
  }
}
---CUT---
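
A self-contained sketch of such a helper, under assumptions not fixed
anywhere in this thread: the rolling (incremental) update, the field
names and the "RollingMeanDemo" wrapper class are all hypothetical and
only illustrate how a static collect method could hide the three-argument
call from the user.

```java
import java.util.stream.DoubleStream;

class RollingMeanDemo {
    /** Hypothetical Mean using a rolling (incremental) update. */
    static final class Mean {
        private long n;
        private double mean;

        static Mean create() {
            return new Mean();
        }

        /** Accumulate one value. */
        void accept(double x) {
            n++;
            mean += (x - mean) / n;
        }

        /** Combine with another partial result (weighted by counts). */
        void add(Mean other) {
            long total = n + other.n;
            if (total == 0) {
                return; // Both empty: nothing to combine.
            }
            mean += (other.mean - mean) * other.n / total;
            n = total;
        }

        /** Returns NaN when no value has been added. */
        double getAsDouble() {
            return n == 0 ? Double.NaN : mean;
        }

        /** Convenience helper wrapping the three-argument collect. */
        static double collect(DoubleStream s) {
            return s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();
        }
    }

    public static void main(String[] args) {
        System.out.println(Mean.collect(DoubleStream.of(1.0, 2.0, 3.0, 4.0)));
    }
}
```

The combiner ("add") is only exercised by parallel streams, but it must
still be correct for the helper to be safe with any DoubleStream.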

>
> The implementation(s) can be updated and expanded later using
> different underlying algorithms (simple sum, extended precision sum,
> rolling mean) by passing a choice to the create method.
>
> The project will involve how to move from this simple statistic to
> supporting IntStream, LongStream, DoubleStream as appropriate and
> allow combining statistics efficiently to obtain a customised summary
> statistic, perhaps by enum.
>
> This is for the StorelessUnivariateStatistic in Commons Math. A more
> detailed examination of the existing functionality would be required
> and use cases generated for each to understand how these can be
> supported in streams.

This study could indeed be started in the "bonding" period and should
give a fairly clear indication of the candidate's potential.

> >
> > > - Show knowledge of the topic area beyond this single feature,
> > > demonstrating ability to continue to significantly contribute through
> > > a 3 month period in the subject area.
> >
> > That seems more fuzzy to define and assess (?).
>
> I agree; choosing candidates is a fuzzy area. This was meant to
> summarise my understanding of how we chose candidates last time. It is
> based on their proposal submitted to GSOC but also impressions from
> the bonding period.

As you noted in your post-GSoC 2020 suggestions, the issue
stemmed from not having a concrete way to evaluate the bonding
period.
This should be solved (for "[Statistics]") with your proposal above.

I'd be glad to get help with defining concrete tasks for the ideas
below. :-)

> >
> > Some ideas (for "Commons Math"):
> > 1. Redesign and modularization of the "ml" package
> >   -> main goal: enable multi-thread usage
> > 2. Abstracting the linear algebra utilities
> >   -> main goal: allow (runtime?) switch to alternative implementations
> > 3. Redesign and modularization of the "random" package
> >   -> main goal: general support of low-discrepancy sequences
> > 4. Refactoring and modularization of the "special" package
> >  -> main goal: ensure accuracy and performance and better API,
> >  add other functions (?).
> >
> > > Without this set of skills there will be little progress in the formal
> > > code period.
> >
> > :-}
> >
> > Shall we open a "GSoC 2022" report in each concerned JIRA project?
>
> Yes. I think we just create some tickets and tag them with the
> appropriate tag (GSOC 2022 ?). There should be some left over from
> last time to repurpose or use as templates for new ones.

Actually, I was thinking of creating one global "GSoC 2022" issue
in each component, that would list all the topics and a complete
description of their respective goal, and then sub-tasks (or linked
issues) for more specific discussions (once the topic is taken on
by at least one candidate).
I mean that we should separate the JIRA "new feature" report from
the report that tracks GSoC activity.  That way, we will be able to close
the GSoC ticket when the time comes, and re

Re: [All] GSoC 2022

2022-02-02 Thread Alex Herbert
On Mon, 31 Jan 2022 at 15:06, Gilles Sadowski  wrote:
>
> Hello.
>
> On Thu, 27 Jan 2022 at 18:09, Alex Herbert wrote:
> >
> > I would be willing to go through GSOC again.
>
> Thanks; I know that back in 2020, it had been a disproportionate
> amount of work...
>
> > I think that the
> > statistics component could again serve as a project. There are some
> > packages in Math that could be moved to make use of the updated
> > distributions (e.g. math.stat.inference)
>
> That would be great, although I seem to notice that there
> might be some dependency issues...
>
> > or perhaps a reworking of the
> > math.stat.descriptive package to support using them with streams.
>
> +1
>
> > In the last iteration (GSOC 2020) we failed to get enough of a picture
> > of the competence of candidates in the 'bonding phase' before places
> > were formally allocated. I think we should require that a candidate
> > can:
> >
> > - Open a PR on GitHub to add a feature in the topic area. It should be
> > of non-trivial complexity and delivered to a quality ready to merge.
>
> Do you think that the above "stream support" could be that task?

Yes. A simple class to compute a summary statistic such as:

public interface Statistic<R> {
    void add(R x);
}
public interface DoubleStatistic<R> extends Statistic<R>,
        DoubleConsumer, DoubleSupplier {
    // Composite interface
}

public class Mean implements DoubleStatistic<Mean> {
  // Outline only; method bodies omitted.
  static Mean create();
  // Overrides
  public void accept(double x);
  public void add(Mean m);
  public double getAsDouble();
}

Used as:

DoubleStream s;
double u = s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();

The implementation(s) can be updated and expanded later using
different underlying algorithms (simple sum, extended precision sum,
rolling mean) by passing a choice to the create method.

The project will involve how to move from this simple statistic to
supporting IntStream, LongStream, DoubleStream as appropriate and
allow combining statistics efficiently to obtain a customised summary
statistic, perhaps by enum.

This is for the StorelessUnivariateStatistic in Commons Math. A more
detailed examination of the existing functionality would be required
and use cases generated for each to understand how these can be
supported in streams.
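
A minimal runnable sketch of the outline above, assuming a plain
sum-and-count implementation (the simplest of the algorithm choices
mentioned; nothing here is an existing Commons API). The generic
parameter on the interfaces, the private fields and the "MeanDemo"
wrapper class are illustrative assumptions.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

class MeanDemo {
    /** A statistic that can be combined with another instance of type R. */
    interface Statistic<R> {
        void add(R other);
    }

    /** Composite interface: consumes doubles, combines, supplies a result. */
    interface DoubleStatistic<R>
        extends Statistic<R>, DoubleConsumer, DoubleSupplier {
    }

    /** Hypothetical implementation using a simple sum and count. */
    static final class Mean implements DoubleStatistic<Mean> {
        private long n;
        private double sum;

        static Mean create() {
            return new Mean();
        }

        @Override
        public void accept(double x) {
            n++;
            sum += x;
        }

        @Override
        public void add(Mean other) {
            n += other.n;
            sum += other.sum;
        }

        @Override
        public double getAsDouble() {
            // Evaluates to NaN (0.0 / 0) when no value has been added.
            return sum / n;
        }
    }

    public static void main(String[] args) {
        double u = DoubleStream.of(1.0, 2.0, 3.0, 4.0)
            .collect(Mean::create, Mean::accept, Mean::add)
            .getAsDouble();
        System.out.println(u);
    }
}
```

Because "add" merges two partial results, the same three method
references also work with a parallel stream, where the combiner is
actually invoked.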

>
> > - Show knowledge of the topic area beyond this single feature,
> > demonstrating ability to continue to significantly contribute through
> > a 3 month period in the subject area.
>
> That seems more fuzzy to define and assess (?).

I agree; choosing candidates is a fuzzy area. This was meant to
summarise my understanding of how we chose candidates last time. It is
based on their proposal submitted to GSOC but also impressions from
the bonding period.

>
> Some ideas (for "Commons Math"):
> 1. Redesign and modularization of the "ml" package
>   -> main goal: enable multi-thread usage
> 2. Abstracting the linear algebra utilities
>   -> main goal: allow (runtime?) switch to alternative implementations
> 3. Redesign and modularization of the "random" package
>   -> main goal: general support of low-discrepancy sequences
> 4. Refactoring and modularization of the "special" package
>  -> main goal: ensure accuracy and performance and better API,
>  add other functions (?).
>
> > Without this set of skills there will be little progress in the formal
> > code period.
>
> :-}
>
> Shall we open a "GSoC 2022" report in each concerned JIRA project?

Yes. I think we just create some tickets and tag them with the
appropriate tag (GSOC 2022 ?). There should be some left over from
last time to repurpose or use as templates for new ones.

Alex




Re: [All] GSoC 2022

2022-01-31 Thread Gilles Sadowski
Hello.

On Thu, 27 Jan 2022 at 18:09, Alex Herbert wrote:
>
> I would be willing to go through GSOC again.

Thanks; I know that back in 2020, it had been a disproportionate
amount of work...

> I think that the
> statistics component could again serve as a project. There are some
> packages in Math that could be moved to make use of the updated
> distributions (e.g. math.stat.inference)

That would be great, although I seem to notice that there
might be some dependency issues...

> or perhaps a reworking of the
> math.stat.descriptive package to support using them with streams.

+1

> In the last iteration (GSOC 2020) we failed to get enough of a picture
> of the competence of candidates in the 'bonding phase' before places
> were formally allocated. I think we should require that a candidate
> can:
>
> - Open a PR on GitHub to add a feature in the topic area. It should be
> of non-trivial complexity and delivered to a quality ready to merge.

Do you think that the above "stream support" could be that task?

> - Show knowledge of the topic area beyond this single feature,
> demonstrating ability to continue to significantly contribute through
> a 3 month period in the subject area.

That seems more fuzzy to define and assess (?).

Some ideas (for "Commons Math"):
1. Redesign and modularization of the "ml" package
  -> main goal: enable multi-thread usage
2. Abstracting the linear algebra utilities
  -> main goal: allow (runtime?) switch to alternative implementations
3. Redesign and modularization of the "random" package
  -> main goal: general support of low-discrepancy sequences
4. Refactoring and modularization of the "special" package
 -> main goal: ensure accuracy and performance and better API,
 add other functions (?).

> Without this set of skills there will be little progress in the formal
> code period.

:-}

Shall we open a "GSoC 2022" report in each concerned JIRA project?

Regards,
Gilles




Re: [All] GSoC 2022

2022-01-27 Thread Alex Herbert
I would be willing to go through GSOC again. I think that the
statistics component could again serve as a project. There are some
packages in Math that could be moved to make use of the updated
distributions (e.g. math.stat.inference) or perhaps a reworking of the
math.stat.descriptive package to support using them with streams.

In the last iteration (GSOC 2020) we failed to get enough of a picture
of the competence of candidates in the 'bonding phase' before places
were formally allocated. I think we should require that a candidate
can:

- Open a PR on GitHub to add a feature in the topic area. It should be
of non-trivial complexity and delivered to a quality ready to merge.
- Show knowledge of the topic area beyond this single feature,
demonstrating ability to continue to significantly contribute through
a 3 month period in the subject area.

Without this set of skills there will be little progress in the formal
code period.

Alex




Re: [All] GSoC 2022

2022-01-26 Thread Matt Sicker
I think this would be a great idea. There's even some potential work
that can be done related to fuzz testing if we want to expand our
OSS-Fuzz coverage. I imagine we have plenty of interesting Jira
tickets that could make for GSoC projects, too.

On Wed, Jan 26, 2022 at 7:16 AM Gilles Sadowski  wrote:
>
> Hello.
>
> Do we want to come up with a common (Commons) call for
> contributions?
> [I.e. identify components/areas/features where help would
> be appreciated, who can provide "mentorship" (even with
> strictly limited available time), what level of proficiency (with
> Java and tooling) is required from candidates and how to
> assess it (before they are selected).]
>
> Thanks,
> Gilles