Saw a really awesome shark tank talk today at ApacheCon. Had a conversation after and wanted to follow up.
The Apache MADlib-incubator project is machine learning on SQL (and, as I understand it, close to graduation). The Apache Mahout project is engine-neutral, roll-your-own machine learning / statistical algorithms (with a quickly growing canon of 'precanned' algorithms). Both projects have a lot of other cool tricks, but let's table that for now.

Based on a one-off discussion, it is highly likely that the 'hard part' of writing engine bindings in Mahout has already been done by MADlib in the ordinary course of business: linear-algebra-like operations on 'matrices' backed by SQL.

Mahout also brings some cool things like GPU acceleration to the table. (FYI for the MADlib folks, just to get your wheels turning: as I understand it, Mahout's GPU acceleration is C++ at the low level, reached through JavaCPP and other Java wrappers for C++ libraries.)

There are numerous other benefits I can think of, but that's the high level so everyone on each project gets the gist of it. I think an integration (MADlib-based SQL bindings, for lack of a better term) is potentially an easy win that would yield big advantages for both projects, and I would like to propose some exploratory collaboration.

"Roll your own GPU accelerated statistical algorithms on PostgreSQL and other SQL engines - brought to you by Apache Mahout + Apache MADlib-incubator" (or Apache MADlib-incubator + Apache Mahout, depending on who is giving the conference talk ;)

I encourage anyone interested to sign up for the appropriate dev list.
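
P.S. For anyone wondering what linear algebra like operations on 'matrices' backed by SQL might look like, here is a rough sketch. It is NOT MADlib's or Mahout's actual API: it assumes a matrix stored as a sparse (row_id, col_id, val) table, uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the helper names (load_matrix, matmul_sql, fetch_dense) are made up for illustration. The point is only that a matrix product reduces to a join plus a grouped sum, which is the kind of primitive an engine binding would need.

```python
import sqlite3


def load_matrix(conn, name, rows):
    """Store a dense list-of-lists as a sparse (row_id, col_id, val) table.

    Hypothetical layout chosen for this sketch; table names are trusted
    constants here, so plain string formatting is acceptable.
    """
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(
        f"CREATE TABLE {name} (row_id INTEGER, col_id INTEGER, val REAL)"
    )
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?, ?)",
        [
            (i, j, v)
            for i, row in enumerate(rows)
            for j, v in enumerate(row)
            if v != 0  # only non-zero entries are stored
        ],
    )


def matmul_sql(conn, left, right, out):
    """Compute out = left * right entirely inside the SQL engine."""
    conn.execute(f"DROP TABLE IF EXISTS {out}")
    conn.execute(
        f"""
        CREATE TABLE {out} AS
        SELECT lhs.row_id AS row_id,
               rhs.col_id AS col_id,
               SUM(lhs.val * rhs.val) AS val
        FROM {left} AS lhs
        JOIN {right} AS rhs ON lhs.col_id = rhs.row_id
        GROUP BY lhs.row_id, rhs.col_id
        """
    )


def fetch_dense(conn, name, n_rows, n_cols):
    """Read a sparse table back into a dense list-of-lists."""
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in conn.execute(f"SELECT row_id, col_id, val FROM {name}"):
        dense[i][j] = v
    return dense


conn = sqlite3.connect(":memory:")
load_matrix(conn, "mat_a", [[1, 2], [3, 4]])
load_matrix(conn, "mat_b", [[5, 6], [7, 8]])
matmul_sql(conn, "mat_a", "mat_b", "mat_c")
result = fetch_dense(conn, "mat_c", 2, 2)
print(result)
```

The same join-and-aggregate query should carry over to PostgreSQL more or less unchanged, which is part of why SQL-backed bindings look like a plausible fit for an engine-neutral layer.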