Re: [All] GSoC 2022
I have posted two ideas for GSoC mini projects under:

https://issues.apache.org/jira/browse/STATISTICS-54
https://issues.apache.org/jira/browse/NUMBERS-186

Alex
Re: [All] GSoC 2022
On Fri, 25 Feb 2022 at 04:39, Matt Juntunen wrote:
>
> I just added a similar placeholder issue for geometry:

Thanks! I've added GEOMETRY-144 to the list.

Regards,
Gilles

> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
Re: [All] GSoC 2022
I just added a similar placeholder issue for geometry:
https://issues.apache.org/jira/browse/GEOMETRY-145. I hope those are
the kinds of ideas we're going for here.

Regards,
Matt J

On Wed, Feb 23, 2022 at 12:05 PM Gilles Sadowski wrote:
>
> Ping.
>
> Nothing for "Geometry", "Statistics", ... (?)
> ;-)
>
> Regards,
> Gilles
>
> On Wed, 9 Feb 2022 at 14:57, Gilles Sadowski wrote:
> >
> > Hi.
> >
> > >>> [...]
> > > > >
> > > > > Shall we open a "GSoC 2022" report in each concerned JIRA project?
> > > >
> > > > Yes. I think we just create some tickets and tag them with the
> > > > appropriate tag (GSOC 2022 ?). There should be some left over from
> > > > last time to repurpose or use as templates for new ones.
> > >
> > > Actually, I was thinking of creating one global "GSoC 2022" issue
> > > in each component, that would list all the topics and a complete
> > > description of their respective goal,
> >
> > Done for "Commons Math":
> > https://issues.apache.org/jira/browse/MATH-1641
> >
> > Gilles
Re: [All] GSoC 2022
Ping.

Nothing for "Geometry", "Statistics", ... (?)
;-)

Regards,
Gilles

On Wed, 9 Feb 2022 at 14:57, Gilles Sadowski wrote:
>
> Hi.
>
> >>> [...]
> > > >
> > > > Shall we open a "GSoC 2022" report in each concerned JIRA project?
> > >
> > > Yes. I think we just create some tickets and tag them with the
> > > appropriate tag (GSOC 2022 ?). There should be some left over from
> > > last time to repurpose or use as templates for new ones.
> >
> > Actually, I was thinking of creating one global "GSoC 2022" issue
> > in each component, that would list all the topics and a complete
> > description of their respective goal,
>
> Done for "Commons Math":
> https://issues.apache.org/jira/browse/MATH-1641
>
> Gilles
Re: [All] GSoC 2022
Hi.

>>> [...]
> > >
> > > Shall we open a "GSoC 2022" report in each concerned JIRA project?
> >
> > Yes. I think we just create some tickets and tag them with the
> > appropriate tag (GSOC 2022 ?). There should be some left over from
> > last time to repurpose or use as templates for new ones.
>
> Actually, I was thinking of creating one global "GSoC 2022" issue
> in each component, that would list all the topics and a complete
> description of their respective goal,

Done for "Commons Math":
https://issues.apache.org/jira/browse/MATH-1641

Gilles
Re: [All] GSoC 2022
Hello.

On Wed, 2 Feb 2022 at 10:47, Alex Herbert wrote:
>
> On Mon, 31 Jan 2022 at 15:06, Gilles Sadowski wrote:
> >
> > Hello.
> >
> > On Thu, 27 Jan 2022 at 18:09, Alex Herbert wrote:
> > >
> > > I would be willing to go through GSOC again.
> >
> > Thanks; I know that back in 2020, it had been a disproportionate
> > amount of work...
> >
> > > I think that the
> > > statistics component could again serve as a project. There are some
> > > packages in Math that could be moved to make use of the updated
> > > distributions (e.g. math.stat.inference)
> >
> > That would be great, although I seem to notice that there
> > might be some dependency issues...
> >
> > > or perhaps a reworking of the
> > > math.stat.descriptive package to support using them with streams.
> >
> > +1
> >
> > > In the last iteration (GSOC 2020) we failed to get enough of a picture
> > > of the competence of candidates in the 'bonding phase' before places
> > > were formally allocated. I think we should require that a candidate
> > > can:
> > >
> > > - Open a PR on GitHub to add a feature in the topic area. It should be
> > > of non-trivial complexity and delivered to a quality ready to merge.
> >
> > Do you think that the above "stream support" could be that task?
>
> Yes. A simple class to compute a summary statistic such as:
>
> public interface Statistic<R> {
>     void add(R x);
> }
> public interface DoubleStatistic<R> extends Statistic<R>,
>         DoubleConsumer, DoubleSupplier {
>     // Composite interface
> }
>
> public class Mean implements DoubleStatistic<Mean> {
>     static Mean create();
>     // Overrides
>     public void accept(double x);
>     public void add(Mean m);
>     public double getAsDouble();
> }
>
> Used as:
>
> DoubleStream s;
> double u = s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();

To simplify the above, would we also provide
---CUT---
public class Mean ... {
    // ...

    public static double collect(DoubleStream s) {
        return s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();
    }
}
---CUT---

>
> The implementation(s) can be updated and expanded later using
> different underlying algorithms (simple sum, extended precision sum,
> rolling mean) by passing a choice to the create method.
>
> The project will involve how to move from this simple statistic to
> supporting IntStream, LongStream, DoubleStream as appropriate and
> allow combining statistics efficiently to obtain a customised summary
> statistic, perhaps by enum.
>
> This is for the StorelessUnivariateStatistic in Commons Math. A more
> detailed examination of the existing functionality would be required
> and use cases generated for each to understand how these can be
> supported in streams.

This study could indeed be started in the "bonding" period and
will fairly clearly indicate the candidate's potential.

> >
> > > - Show knowledge of the topic area beyond this single feature,
> > > demonstrating ability to continue to significantly contribute through
> > > a 3 month period in the subject area.
> >
> > That seems more fuzzy to define and assess (?).
>
> I agree; choosing candidates is a fuzzy area. This was meant to
> summarise my understanding of how we chose candidates last time. It is
> based on their proposal submitted to GSOC but also impressions from
> the bonding period.

As you noted in your post-GSoC 2020 suggestions, the issue stemmed
from not having a concrete way to evaluate the bonding period.
This should be solved (for "[Statistics]") with your proposal above.

I'd be glad to get help with defining concrete tasks for the ideas
below. :-)

> >
> > Some ideas (for "Commons Math"):
> > 1. Redesign and modularization of the "ml" package
> > -> main goal: enable multi-thread usage
> > 2. Abstracting the linear algebra utilities
> > -> main goal: allow (runtime?) switch to alternative implementations
> > 3. Redesign and modularization of the "random" package
> > -> main goal: general support of low-discrepancy sequences
> > 4. Refactoring and modularization of the "special" package
> > -> main goal: ensure accuracy and performance and better API,
> > add other functions (?).
> >
> > > Without this set of skills there will be little progress in the formal
> > > code period.
> >
> > :-}
> >
> > Shall we open a "GSoC 2022" report in each concerned JIRA project?
>
> Yes. I think we just create some tickets and tag them with the
> appropriate tag (GSOC 2022 ?). There should be some left over from
> last time to repurpose or use as templates for new ones.

Actually, I was thinking of creating one global "GSoC 2022" issue
in each component, that would list all the topics and a complete
description of their respective goal, and then sub-tasks (or linked
issues) for more specific discussions (once the topic is taken on by
at least one candidate).
I mean that we should separate the JIRA "new feature" report
from the report that tracks GSoC activity. That way, we will be able
to close the GSoC ticket when the time comes, and re
Re: [All] GSoC 2022
On Mon, 31 Jan 2022 at 15:06, Gilles Sadowski wrote:
>
> Hello.
>
> On Thu, 27 Jan 2022 at 18:09, Alex Herbert wrote:
> >
> > I would be willing to go through GSOC again.
>
> Thanks; I know that back in 2020, it had been a disproportionate
> amount of work...
>
> > I think that the
> > statistics component could again serve as a project. There are some
> > packages in Math that could be moved to make use of the updated
> > distributions (e.g. math.stat.inference)
>
> That would be great, although I seem to notice that there
> might be some dependency issues...
>
> > or perhaps a reworking of the
> > math.stat.descriptive package to support using them with streams.
>
> +1
>
> > In the last iteration (GSOC 2020) we failed to get enough of a picture
> > of the competence of candidates in the 'bonding phase' before places
> > were formally allocated. I think we should require that a candidate
> > can:
> >
> > - Open a PR on GitHub to add a feature in the topic area. It should be
> > of non-trivial complexity and delivered to a quality ready to merge.
>
> Do you think that the above "stream support" could be that task?

Yes. A simple class to compute a summary statistic such as:

public interface Statistic<R> {
    void add(R x);
}
public interface DoubleStatistic<R> extends Statistic<R>,
        DoubleConsumer, DoubleSupplier {
    // Composite interface
}

public class Mean implements DoubleStatistic<Mean> {
    static Mean create();
    // Overrides
    public void accept(double x);
    public void add(Mean m);
    public double getAsDouble();
}

Used as:

DoubleStream s;
double u = s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();

The implementation(s) can be updated and expanded later using
different underlying algorithms (simple sum, extended precision sum,
rolling mean) by passing a choice to the create method.

The project will involve how to move from this simple statistic to
supporting IntStream, LongStream, DoubleStream as appropriate and
allow combining statistics efficiently to obtain a customised summary
statistic, perhaps by enum.

This is for the StorelessUnivariateStatistic in Commons Math. A more
detailed examination of the existing functionality would be required
and use cases generated for each to understand how these can be
supported in streams.

>
> > - Show knowledge of the topic area beyond this single feature,
> > demonstrating ability to continue to significantly contribute through
> > a 3 month period in the subject area.
>
> That seems more fuzzy to define and assess (?).

I agree; choosing candidates is a fuzzy area. This was meant to
summarise my understanding of how we chose candidates last time. It is
based on their proposal submitted to GSOC but also impressions from
the bonding period.

>
> Some ideas (for "Commons Math"):
> 1. Redesign and modularization of the "ml" package
> -> main goal: enable multi-thread usage
> 2. Abstracting the linear algebra utilities
> -> main goal: allow (runtime?) switch to alternative implementations
> 3. Redesign and modularization of the "random" package
> -> main goal: general support of low-discrepancy sequences
> 4. Refactoring and modularization of the "special" package
> -> main goal: ensure accuracy and performance and better API,
> add other functions (?).
>
> > Without this set of skills there will be little progress in the formal
> > code period.
>
> :-}
>
> Shall we open a "GSoC 2022" report in each concerned JIRA project?

Yes. I think we just create some tickets and tag them with the
appropriate tag (GSOC 2022 ?). There should be some left over from
last time to repurpose or use as templates for new ones.

Alex
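
For readers who want to try the idea, here is a minimal runnable sketch of the Statistic / DoubleStatistic / Mean outline from this message. It is only an illustration under stated assumptions: the wrapper class name MeanSketch, the running-mean update, the weighted merge in add(Mean) and the NaN result for empty input are all choices made for this example, not the Commons Math or Commons Statistics implementation. The static collect helper follows the one suggested in the reply to this message.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

// Hypothetical wrapper class; the name is an assumption for this sketch.
final class MeanSketch {

    /** A statistic that can absorb another instance of the same type. */
    public interface Statistic<R> {
        void add(R other);
    }

    /** Composite interface: combinable, accepts doubles, supplies a double. */
    public interface DoubleStatistic<R>
            extends Statistic<R>, DoubleConsumer, DoubleSupplier {
    }

    /** Running mean; the update rule is one possible choice among several. */
    public static final class Mean implements DoubleStatistic<Mean> {
        private long n;
        private double mean;

        public static Mean create() {
            return new Mean();
        }

        @Override
        public void accept(double x) {
            // Incremental update: mean_n = mean_{n-1} + (x - mean_{n-1}) / n
            n++;
            mean += (x - mean) / n;
        }

        @Override
        public void add(Mean other) {
            if (other.n == 0) {
                return; // nothing to merge
            }
            // Merge two partial means, weighting by their sample counts.
            long total = n + other.n;
            mean += (other.mean - mean) * ((double) other.n / total);
            n = total;
        }

        @Override
        public double getAsDouble() {
            // Assumption: an empty statistic reports NaN.
            return n == 0 ? Double.NaN : mean;
        }

        /** Convenience helper wrapping the three-argument collect call. */
        public static double collect(DoubleStream s) {
            return s.collect(Mean::create, Mean::accept, Mean::add).getAsDouble();
        }
    }

    public static void main(String[] args) {
        double u = DoubleStream.of(1, 2, 3, 4)
                .collect(Mean::create, Mean::accept, Mean::add)
                .getAsDouble();
        System.out.println(u); // prints 2.5
    }
}
```

The add(Mean) combiner is what would let the same class work with a parallel stream, since DoubleStream.collect merges the partial results through it. Swapping in another algorithm (for example an extended precision sum) would only change the bodies of accept and add.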
Re: [All] GSoC 2022
Hello.

On Thu, 27 Jan 2022 at 18:09, Alex Herbert wrote:
>
> I would be willing to go through GSOC again.

Thanks; I know that back in 2020, it had been a disproportionate
amount of work...

> I think that the
> statistics component could again serve as a project. There are some
> packages in Math that could be moved to make use of the updated
> distributions (e.g. math.stat.inference)

That would be great, although I seem to notice that there
might be some dependency issues...

> or perhaps a reworking of the
> math.stat.descriptive package to support using them with streams.

+1

> In the last iteration (GSOC 2020) we failed to get enough of a picture
> of the competence of candidates in the 'bonding phase' before places
> were formally allocated. I think we should require that a candidate
> can:
>
> - Open a PR on GitHub to add a feature in the topic area. It should be
> of non-trivial complexity and delivered to a quality ready to merge.

Do you think that the above "stream support" could be that task?

> - Show knowledge of the topic area beyond this single feature,
> demonstrating ability to continue to significantly contribute through
> a 3 month period in the subject area.

That seems more fuzzy to define and assess (?).

Some ideas (for "Commons Math"):
1. Redesign and modularization of the "ml" package
-> main goal: enable multi-thread usage
2. Abstracting the linear algebra utilities
-> main goal: allow (runtime?) switch to alternative implementations
3. Redesign and modularization of the "random" package
-> main goal: general support of low-discrepancy sequences
4. Refactoring and modularization of the "special" package
-> main goal: ensure accuracy and performance and better API,
add other functions (?).

> Without this set of skills there will be little progress in the formal
> code period.

:-}

Shall we open a "GSoC 2022" report in each concerned JIRA project?

Regards,
Gilles
Re: [All] GSoC 2022
I would be willing to go through GSOC again. I think that the
statistics component could again serve as a project. There are some
packages in Math that could be moved to make use of the updated
distributions (e.g. math.stat.inference) or perhaps a reworking of the
math.stat.descriptive package to support using them with streams.

In the last iteration (GSOC 2020) we failed to get enough of a picture
of the competence of candidates in the 'bonding phase' before places
were formally allocated. I think we should require that a candidate
can:

- Open a PR on GitHub to add a feature in the topic area. It should be
of non-trivial complexity and delivered to a quality ready to merge.
- Show knowledge of the topic area beyond this single feature,
demonstrating ability to continue to significantly contribute through
a 3 month period in the subject area.

Without this set of skills there will be little progress in the formal
code period.

Alex
Re: [All] GSoC 2022
I think this would be a great idea. There's even some potential work
that can be done related to fuzz testing if we want to expand our
OSS-Fuzz coverage. I imagine we have plenty of interesting Jira
tickets that could make for GSoC projects, too.

On Wed, Jan 26, 2022 at 7:16 AM Gilles Sadowski wrote:
>
> Hello.
>
> Do we want to come up with a common (Commons) call for
> contributions?
> [I.e. identify components/areas/features where help would
> be appreciated, who can provide "mentorship" (even with
> strictly limited available time), what level of proficiency (with
> Java and tooling) is required from candidates and how to
> assess it (before they are selected).]
>
> Thanks,
> Gilles