Awesome!!

I don't think a PhD in math/ML is required at all for this little venture.
Mainly just a working knowledge of basic BLAS operations (Matrix A %*% Matrix B,
Matrix A %*% Vector, etc.).
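For anyone newer to this, the "hard part" mostly comes down to expressing those products as queries. The sketch below is minimal and assumes that each matrix lives in its own table of (row_id, col_id, val) triples. Under that layout, matrix %*% matrix becomes a join on the inner index plus a GROUP BY. Matrix %*% vector is the same query, with the vector stored as an n x 1 matrix. The sketch uses Python's standard-library sqlite3 as a stand-in for PostgreSQL. The table layout and the helpers `load_coo`, `matmul_sql`, and `to_dense` are hypothetical illustrations. They are not MADlib's matrix functions or Mahout's binding API.

```python
import sqlite3

# Hypothetical storage layout: one table per matrix holding (row_id, col_id, val)
# triples, i.e. coordinate (COO) sparse form; absent entries are implicitly zero.
# Table names are interpolated directly, so they must be trusted identifiers.


def load_coo(conn, name, dense):
    """Create table `name` and insert the nonzero entries of a dense list of rows."""
    conn.execute(f"CREATE TABLE {name} (row_id INTEGER, col_id INTEGER, val REAL)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(dense) for j, v in enumerate(row) if v != 0],
    )


def matmul_sql(conn, a, b):
    """Compute A %*% B: join A.col_id to B.row_id, then sum the products per (i, j)."""
    query = f"""
        SELECT lhs.row_id, rhs.col_id, SUM(lhs.val * rhs.val)
        FROM {a} AS lhs JOIN {b} AS rhs ON lhs.col_id = rhs.row_id
        GROUP BY lhs.row_id, rhs.col_id
        ORDER BY lhs.row_id, rhs.col_id
    """
    return conn.execute(query).fetchall()


def to_dense(triples, nrow, ncol):
    """Scatter (i, j, v) triples into a dense nrow x ncol list of rows."""
    out = [[0.0] * ncol for _ in range(nrow)]
    for i, j, v in triples:
        out[i][j] = v
    return out


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_coo(conn, "A", [[1, 2], [3, 4]])
    load_coo(conn, "B", [[5, 6], [7, 8]])
    load_coo(conn, "x", [[1], [1]])  # a vector stored as a 2 x 1 matrix
    print(to_dense(matmul_sql(conn, "A", "B"), 2, 2))  # A %*% B
    print(to_dense(matmul_sql(conn, "A", "x"), 2, 1))  # A %*% x
```

The join-plus-aggregate form is the textbook relational way to multiply matrices. The point is that the engine does the heavy lifting, and a binding only has to generate the query.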

The keys to success here are going to be:
- Making C++/Scala/SQL all talk to each other (no big deal... lol).
- Working with the respective communities and tapping their knowledge.

For examples of how engine bindings are written in Mahout, see the existing
Flink, Spark, and H2O bindings:
https://github.com/apache/mahout/tree/master/flink/src/main/scala/org/apache/mahout/flinkbindings/blas
https://github.com/apache/mahout/tree/master/spark/src/main/scala/org/apache/mahout/sparkbindings/blas
https://github.com/apache/mahout/tree/master/h2o/src/main/java/org/apache/mahout/h2obindings/ops

A MADlib binding would need to implement those same types of operations.

Digging around in MADlib a little more, I found this:
http://madlib.incubator.apache.org/docs/latest/group__grp__matrix.html

Again, I'm just dipping my toes in the water, but it appears that most of
the 'hard' stuff is done and we just need a wrapper. Either there will need
to be a way to serialize a Mahout Vector to look like a MADlib vector (the
easy way), or MADlib will need to implement Mahout Vectors (much more
convoluted, but it adds GPU acceleration to MADlib). We also need to figure
out the MADlib equivalent of a mapBlock-like operation (applying an
anonymous function to each block of rows).
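The "easy way" and the mapBlock question can be sketched together. The Python below is a minimal illustration under stated assumptions. It is not MADlib's or Mahout's API. It assumes a dense table layout of (row_id, row_vec) and serializes each vector as a PostgreSQL array literal, the `{1.0,2.0}` text form that PostgreSQL accepts for `float8[]` values. The standard-library sqlite3 module again stands in for PostgreSQL. The helper names `to_pg_array_literal`, `from_pg_array_literal`, and `map_block` are hypothetical.

```python
import sqlite3

# Hypothetical dense layout: one row per matrix row, with the row's values
# serialized as a PostgreSQL-style array literal such as '{1.0,2.0}'.
# SQLite stands in for PostgreSQL here and has no array type, so the literal
# is stored as TEXT; in PostgreSQL the same text could be cast to float8[].


def to_pg_array_literal(values):
    """Serialize vector values (finite numbers only) as '{v1,v2,...}'."""
    return "{" + ",".join(repr(float(v)) for v in values) + "}"


def from_pg_array_literal(text):
    """Parse a flat, one-dimensional numeric array literal back into floats."""
    body = text.strip()[1:-1]
    return [float(tok) for tok in body.split(",")] if body else []


def map_block(conn, table, fn, block_size=2):
    """Apply fn(keys, block) to consecutive blocks of rows, in row_id order.

    Loosely mimics the shape of Mahout's mapBlock: fn receives the row keys and
    an in-memory block (list of rows) and must return a block with the same
    number of rows. Returns a list of (row_id, new_row) pairs.
    """
    rows = conn.execute(
        f"SELECT row_id, row_vec FROM {table} ORDER BY row_id"
    ).fetchall()
    result = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        keys = [row_id for row_id, _ in chunk]
        block = [from_pg_array_literal(vec) for _, vec in chunk]
        new_block = fn(keys, block)
        if len(new_block) != len(block):
            raise ValueError("fn must return one row per input row")
        result.extend(zip(keys, new_block))
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE M (row_id INTEGER, row_vec TEXT)")
    conn.executemany(
        "INSERT INTO M VALUES (?, ?)",
        [(i, to_pg_array_literal(r)) for i, r in enumerate([[1, 2], [3, 4], [5, 6]])],
    )
    # Double every element, block by block.
    print(map_block(conn, "M", lambda keys, block: [[2 * v for v in row] for row in block]))
```

In a real binding, the per-block function would more likely run inside the database (for example as a user-defined function) rather than pulling rows back to the client. The sketch only pins down the semantics a MADlib-side equivalent would have to match.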

Since I've never worked with MADlib or written my own bindings in Mahout,
everyone is encouraged to chime in and sharpshoot my naive assumption that
this isn't going to be too painful :)


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Sun, May 21, 2017 at 6:08 PM, Jim Jagielski <j...@jagunet.com> wrote:

> ME ME. I *really* want to get more involved in both, but as
> a serious interested volunteer (this would be all on my copious
> amounts of free time)! This area intrigues me and would love
> to be able to hack on it, but am by no means a PhD in ML.
>
> > On May 19, 2017, at 2:05 AM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
> >
> > Saw a really awesome shark tank talk today at ApacheCon.
> >
> > Had a conversation after and wanted to follow up.
> >
> > The Apache MADlib-incubator project is Machine Learning on SQL (and, as I
> > understand it, close to graduation).
> >
> > The Apache Mahout project is an engine-neutral framework for rolling your
> > own machine learning / statistical algorithms (with a quickly growing
> > canon of 'precanned' algorithms).
> >
> > (Both projects have a lot of other cool tricks, but let's table that for
> > now).
> >
> > Based on a one-off discussion, it is highly likely that the 'hard part' of
> > writing engine bindings in Mahout has already been done by MADlib in the
> > course of business (that is, linear-algebra-like operations on 'matrices'
> > backed by SQL).
> >
> > Mahout also brings some cool things like GPU acceleration to the table.
> > (FYI, as I understand it, Mahout's GPU support is C++ at the low level,
> > just to get your wheels turning.) (For the MADlib folks: Mahout uses
> > JavaCPP and other Java wrappers for C++ libraries at the very lowest level
> > to implement GPU acceleration.)
> >
> > There are many more benefits I can think of, but that's the high level,
> > so everyone on each project gets the gist of it.
> >
> > I think an integration (MADlib-based SQL bindings, for lack of a better
> > term) is potentially an easy win that would yield big advantages for both
> > projects, and I would like to propose some exploratory collaboration.
> >
> > "Roll your own GPU-accelerated statistical algorithms on PostgreSQL and
> > other SQL engines, brought to you by Apache Mahout + Apache
> > MADlib-incubator", or Apache MADlib-incubator + Apache Mahout, depending
> > on who is giving the conference talk ;)
> >
> > Encouraging anyone interested to sign up for the appropriate dev list.
>
>
