What I have now found, doing a bit of background research for this, is that there is a well-developed pure Java machine learning library called Weka (https://www.cs.waikato.ac.nz/~ml/weka/). It seems to have good institutional support and to be well maintained. As I had in mind, the syntax is pretty intuitive and similar in style to scikit-learn. A nice tutorial that illustrates this can be found at https://tech.io/playgrounds/3771/machine-learning-with-java---part-1-linear-regression. I don't know what I would want to do differently that Weka hasn't already done, other than its targeting of Java 8. So I think it would probably be re-inventing the wheel to try to get something similar started here.
I will re-focus my mind on trying to get some momentum for the stats functions, which is what I had in mind last summer. I do think if healthy momentum can build for stats functions, there is a natural fit for a fair amount of machine learning to be incorporated, including our own mothballed clustering and neural net libraries.

Eric

On Mon, Mar 11, 2019 at 5:28 PM Bruno P. Kinoshita <ki...@apache.org> wrote:

> Sounds like an interesting idea Eric. I wonder if we would get some
> dogfooding through projects like Apache OpenNLP (one that I know uses ML in
> Java).
>
> Cheers
> Bruno
>
> On Tuesday, 12 March 2019, 1:24:24 pm NZDT, Eric Barnhill
> <ericbarnh...@gmail.com> wrote:
>
> On Sat, Mar 9, 2019 at 4:56 PM Gilles Sadowski <gillese...@gmail.com>
> wrote:
>
> > Hi Eric.
> >
> > Le ven. 8 mars 2019 à 22:22, Eric Barnhill <ericbarnh...@gmail.com> a
> > écrit :
> > >
> > > I am definitely willing to mentor development of the stats libraries as I
> > > was last year. Now that I work more in data science I am happy to also
> > > mentor the ML library
> >
> > What are you referring to?
>
> Commons-math had a machine learning library. Now that I look it over it is
> really a bit emaciated. Still, I think there is an opportunity here to get
> some components up to date that could be pretty widely used, rethinking the
> structure and grammar of the library to echo Python's highly successful
> scikit-learn and Keras libraries.
>
> There are a lot of young people who are interested in getting into data
> science, we might get a good candidate or two looking to distinguish
> themselves. Also Java is such an important language in data science and
> engineering, even if a lot of the ML model building to date is in R and
> Python, so it is a great language for someone entering ML to know.
>
> > You have to register as a mentor. :-)
>
> Sent.
>
> > Then, read and follow the guidelines:
> > http://community.apache.org/guide-to-being-a-mentor.html
> >
> > What should be done ASAP is tag existing, or new issues,
> > with the appropriate label so that tasks will appear here:
> > http://s.apache.org/gsoc2019ideas
>
> Will do tomorrow, hopefully is not too late.