Hey all,

The folks from Apache Mahout were interested in leveraging Zeppelin as they
seek visualization options for their project.

I've also seen
https://issues.apache.org/jira/browse/ZEPPELIN-116

Below are links to some very basic setup instructions; the basics of the
integration are already done. (The links below are copied from my email to
Mahout dev.)

So there is a next-steps discussion about how to integrate that I wanted to
open up here.

If you are a fairly experienced Zeppelin user, the integration isn't really
too difficult:
- Install Mahout
- Make a new Spark 'terp (interpreter)
- Set some env. variables
- Set some terp variables
- Add some dependencies
- Declare some imports in your first paragraph.
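For a sense of what that last step looks like, here is a minimal sketch of a first notebook paragraph. It assumes the Mahout Scala/Spark bindings jars (mahout-math, mahout-math-scala, mahout-spark) have already been added as dependencies of the new interpreter, and that `sc` is the SparkContext Zeppelin provides. It is meant to run as a Zeppelin paragraph, not as a standalone program. The small matrix and the X'X smoke test are illustrative only; the README linked below remains the reference for the exact setup.

```scala
// First paragraph in a note bound to the Mahout-enabled Spark 'terp.
// Assumption: Mahout jars are on the interpreter classpath and `sc` is Zeppelin's SparkContext.
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

// Wrap Zeppelin's SparkContext so Mahout's distributed ops can pick it up implicitly.
implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)

// Smoke test (illustrative data): distribute a small in-core matrix as a DRM,
// compute X'X on the cluster, then collect the result back in-core.
val inCoreX = dense((1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 10.0))
val drmX = drmParallelize(inCoreX, numPartitions = 2)
val XtX = (drmX.t %*% drmX).collect
println(XtX)
```

If this paragraph runs and prints a 3x3 matrix, the imports and the distributed context are wired up correctly.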

If this were the Apache Zeppelin of last November, I'd say post a tutorial
and call it a day.

Now, with the advent of Zeppelin Hub, multiple users, non-shared notebooks,
etc. etc., I feel like a more 'real' integration makes more sense. I also
think Mahout+Zeppelin is an important integration because (IMHO) Zeppelin's
target audience is data scientists who are not super technically advanced,
and Mahout is a really nice Flink/Spark machine learning package. The
integration to date is only for Spark, because a couple of developments on
the Flink side are holding up integrating Mahout into the Flink REPL shell,
but it will be coming to Flink soon as well (with a standard R-like DSL for
both Spark and Flink).

Currently I'm just piggy-backing on the Spark shell, but it might make more
sense to have a dedicated Mahout interpreter with a setting that points it
at either Spark or Flink and loads the appropriate bindings from there.

So yeah, I wanted to get some community weigh-in before I go forward, and I
invite anyone interested to join dev@mahout and take part in the discussion
there as well.

best,
tg

For those who just can't wait to rock-and-roll with Mahout+Zeppelin:

Links:
https://github.com/rawkintrevo/mahout-zeppelin

Supposing you already have a somewhat recent version of Zeppelin 0.6 with
SparkR support running, you can import the following raw notes directly
into Zeppelin (be sure to follow the instructions in the README.md at the
link above; the one in the notebook looks similar but is slightly
incomplete):

https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json

https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*
