Awesome!! I don't think a PhD in math/ML is required at all for this little venture. Mainly just a knowledge of basic BLAS operations (Matrix A %*% Matrix B, Matrix A %*% Vector, etc.)
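For anyone wondering what those operations boil down to, here is a minimal sketch in plain Python (picked only for brevity; the real bindings would live in Scala/CPP/SQL). The function names `mat_vec` and `mat_mat` are made up for illustration and are not part of Mahout or MADlib, and nested lists stand in for whatever matrix type an engine actually provides.

```python
# Plain-Python sketch of the two products named above. Nested lists stand in
# for matrices; the function names are hypothetical, not Mahout or MADlib API.


def mat_vec(a, x):
    """Matrix %*% vector: one dot product per row of a."""
    if any(len(row) != len(x) for row in a):
        raise ValueError("each row of a must have len(x) entries")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def mat_mat(a, b):
    """Matrix %*% matrix: column j of the result is a %*% (column j of b)."""
    cols_of_b = zip(*b)                      # iterate over the columns of b
    prod_cols = [mat_vec(a, list(col)) for col in cols_of_b]
    return [list(row) for row in zip(*prod_cols)]  # columns back to rows


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.0, 1.0], [1.0, 0.0]]
x = [1.0, 1.0]

print(mat_vec(a, x))
print(mat_mat(a, b))
```

Building the matrix product out of the matrix-vector product is just one way to write it; I'm assuming an engine binding would be free to do this however suits the backend.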

The keys to success here are going to be:
- Making CPP/Scala/SQL all talk to each other (no big deal... lol).
- Being able to work with the respective communities and tap their knowledge.

To create bindings in Mahout, see:
https://github.com/apache/mahout/tree/master/flink/src/main/scala/org/apache/mahout/flinkbindings/blas
https://github.com/apache/mahout/tree/master/spark/src/main/scala/org/apache/mahout/sparkbindings/blas
https://github.com/apache/mahout/tree/master/h2o/src/main/java/org/apache/mahout/h2obindings/ops

Those types of operations need to be implemented.

As I dig around in MADlib a little more, I find this:
http://madlib.incubator.apache.org/docs/latest/group__grp__matrix.html

Again, I'm just sticking my toes in the water, but it appears that most of the 'hard' stuff is done and we just need a wrapper.

There will either need to be a way to serialize a Mahout Vector to look like a MADlib vector (the easy way), or MADlib will need to implement Mahout Vectors (much more convoluted, but it adds GPU acceleration to MADlib).

We also need to figure out the MADlib equivalent of a MapBlock-like operation (apply an anonymous function to each row).

I have never worked with MADlib nor written my own bindings in Mahout, so everyone is encouraged to chime in and sharpshoot my naivety in thinking this isn't going to be too painful :)

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Sun, May 21, 2017 at 6:08 PM, Jim Jagielski <j...@jagunet.com> wrote:

> ME ME. I *really* want to get more involved in both, but as
> a serious interested volunteer (this would be all on my copious
> amounts of free time)! This area intrigues me and would love
> to be able to hack on it, but am by no means a PhD in ML.
>
> > On May 19, 2017, at 2:05 AM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
> >
> > Saw a really awesome shark tank talk today at ApacheCon.
> >
> > Had a conversation after and wanted to follow up.
> >
> > The Apache MADlib-incubator project is Machine Learning on SQL (also close
> > to graduation, as I understand).
> >
> > The Apache Mahout project is engine-neutral, roll-your-own machine learning
> > / statistical algorithms (with a quickly increasing canon of 'precanned'
> > algorithms).
> >
> > (Both projects have a lot of other cool tricks, but let's table that for
> > now).
> >
> > Based on a one-off discussion, it is highly likely that the 'hard part' of
> > writing engine bindings in Mahout has already been done by MADlib in the
> > normal course of business (that is, linear-algebra-like operations on
> > 'matrices' backed by SQL).
> >
> > Mahout also brings some cool things like GPU acceleration to the table.
> > (FYI, Mahout GPU, as I understand it, is CPP at the low level, just to get
> > your wheels turning.) (MADlib project: Mahout uses JavaCPP and other Java
> > wrappers for CPP libraries at the very low level for implementing GPU
> > acceleration.)
> >
> > There are numerous other benefits I can think of, but that's the high
> > level so everyone on each project gets the gist of it.
> >
> > I think an integration (MADlib-based SQL bindings, for lack of a better
> > term) is potentially an easy win that would yield big advantages for both
> > projects, and I would like to propose some exploratory collaboration.
> >
> > "Roll your own GPU accelerated statistical algorithms on PostgreSQL and
> > other SQL engines- brought to you by Apache Mahout + Apache
> > MADlib-incubator" - or Apache MADlib-incubator + Apache Mahout, depending
> > on who is giving the conference talk ;)
> >
> > Encouraging anyone interested to sign up for the appropriate dev list.