Hey all,

The folks from Apache Mahout were interested in leveraging Zeppelin as they seek visualization options for their project.
I've also seen https://issues.apache.org/jira/browse/ZEPPELIN-116.

Below are links to some very basic setup instructions; the basics of the integration are done (see the links below, copied from my email to the Mahout dev list). So I wanted to open up a next-steps discussion here on how to integrate.

If you are a fairly experienced Zeppelin user, the integration isn't really too difficult:

- Install Mahout
- Make a new Spark interpreter
- Set some env. variables
- Set some interpreter variables
- Add some dependencies
- Declare some imports in your first paragraph

Based on the Apache Zeppelin of last November, I'd have said post a tutorial and call it a day. Now, with the advent of Zeppelin Hub, multiple users, non-shared notebooks, etc., I feel like a more 'real' integration makes more sense.

Also, I think Mahout+Zeppelin is an important integration because (IMHO) Zeppelin's target audience is non-super-technically-advanced data scientists, and Mahout is a really nice Flink/Spark machine learning package.

The integration to date is only for Spark, because a couple of developments on the Flink side are holding up integrating Mahout into the Flink REPL shell, but it will be coming soon to Flink as well (with a standard R-like DSL for both Spark and Flink). Currently I'm just piggy-backing on the Spark shell, but it might make more sense to have a Mahout interpreter with a setting that points it at either Spark or Flink and loads the appropriate bindings from there.

So yeah, I wanted to get some community weigh-in before I go forward, and I invite anyone interested to join dev@mahout and take part in the discussion there as well.
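To make the last step in the list above a bit more concrete, here is a rough sketch of what that first paragraph could look like. It assumes the env. variables, interpreter variables and Mahout dependencies from the earlier steps are already in place, and that the paragraph runs on the Spark interpreter you created for Mahout. The matrix values and the smoke test at the end are just illustration, not part of the setup itself.

```scala
// First paragraph of the note. Run it in a paragraph bound to the
// Mahout-enabled Spark interpreter (whatever you named it).
// Assumes the Mahout jars are already listed as interpreter dependencies.
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

// `sc` is the SparkContext that Zeppelin injects; wrap it so the
// Mahout DSL can pick up a distributed context implicitly.
implicit val sdc: SparkDistributedContext = sc2sdc(sc)

// Illustrative smoke test: distribute a tiny in-core matrix and compute A'A.
val inCoreA = dense((1, 2), (3, 4), (5, 6))
val drmA = drmParallelize(inCoreA, numPartitions = 2)
val ata = (drmA.t %*% drmA).collect
println(ata)
```

If the imports resolve and the last line prints a 2x2 matrix, the interpreter is wired up well enough to try the notebooks linked below.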
best,
tg

For those who just can't wait to rock-and-roll with Mahout+Zeppelin:

Links:
https://github.com/rawkintrevo/mahout-zeppelin

Supposing you have a somewhat recent version of Zeppelin 0.6 with SparkR support running already, you may import the following raw notes directly into Zeppelin (be sure to follow the instructions in the readme.md at the link above; the one in the notebook looks similar but is slightly incomplete):

https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json
https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*