What I have now found, doing a bit of background research for this, is that
there is a well-developed pure Java machine learning library called WEKA
(https://www.cs.waikato.ac.nz/~ml/weka/). It seems to have good
institutional support and be well maintained. As I had in mind, the
syntax is pretty intuitive and similar in style to Scikit-Learn. There is a
nice tutorial using it that can be found at
https://tech.io/playgrounds/3771/machine-learning-with-java---part-1-linear-regression
which illustrates this. I don't know what I would want to do differently,
that Weka hasn't already done, other than its targeting of Java 8. So I
think it would probably be re-inventing the wheel to try to get something
similar started here.

I will re-focus my mind on trying to get some momentum for the stats
functions, which is what I had in mind last summer. I do think if healthy
momentum can build for stats functions, there is a natural fit for a fair
amount of machine learning to be incorporated, including our own mothballed
clustering and neural net libraries.

Eric




On Mon, Mar 11, 2019 at 5:28 PM Bruno P. Kinoshita <ki...@apache.org> wrote:

>  Sounds like an interesting idea, Eric. I wonder if we would get some
> dogfooding through projects like Apache OpenNLP (one that I know uses ML in
> Java).
>
> Cheers
> Bruno
>
>     On Tuesday, 12 March 2019, 1:24:24 pm NZDT, Eric Barnhill <
> ericbarnh...@gmail.com> wrote:
>
>  On Sat, Mar 9, 2019 at 4:56 PM Gilles Sadowski <gillese...@gmail.com>
> wrote:
>
> > Hi Eric.
> >
> > Le ven. 8 mars 2019 à 22:22, Eric Barnhill <ericbarnh...@gmail.com> a
> > écrit :
> > >
> > > > I am definitely willing to mentor development of the stats
> > > > libraries as I was last year. Now that I work more in data science
> > > > I am happy to also mentor the ML library
> >
> > What are you referring to?
> >
>
> Commons-math had a machine learning library. Now that I look it over it is
> really a bit emaciated. Still, I think there is an opportunity here to get
> some components up to date that could be pretty widely used, rethinking the
> structure and grammar of the library to echo Python's highly successful
> scikit-learn and Keras libraries.
>
> There are a lot of young people who are interested in getting into data
> science, so we might get a good candidate or two looking to distinguish
> themselves. Also Java is such an important language in data science and
> engineering, even if a lot of the ML model building to date is in R and
> Python, so it is a great language for someone entering ML to know.
>
>
> > You have to register as a mentor. :-)
> >
>
> Sent.
>
>
> >
> > Then, read and follow the guidelines:
> >  http://community.apache.org/guide-to-being-a-mentor.html
> >
> > What should be done ASAP is to tag existing or new issues
> > with the appropriate label so that tasks will appear here:
> >    http://s.apache.org/gsoc2019ideas
>
>
> Will do tomorrow; hopefully it is not too late.
