Agreed, thoroughly enjoying the blog post.

On 5/19/16 12:01 AM, Andrew Palumbo wrote:
Well done, Trevor!  I've not yet had a chance to try this in zeppelin but I 
just read the blog which is great!

-------- Original message --------
From: Trevor Grant <trevor.d.gr...@gmail.com>
Date: 05/18/2016 2:44 PM (GMT-05:00)
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

Ah thank you.

Fixing now.


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, May 18, 2016 at 1:04 PM, Andrew Palumbo <ap....@outlook.com> wrote:

Hey Trevor- Just refreshed your readme.  The jar that I mentioned is
actually:


/home/username/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar

rather than:


/home/username/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar

(In the spark module that is)
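
For anyone checking which of the two paths is the right one, the standard local Maven repository layout can be sketched in a few lines of Scala. `Artifact` and `localPath` are hypothetical names made up for this illustration (they are not part of Mahout or Zeppelin); only the directory convention itself, `groupId/artifactId/version/artifactId-version[-classifier].jar`, is standard Maven behaviour.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper (not part of Mahout or Zeppelin): models Maven
// coordinates and renders the standard local-repository layout.
final case class Artifact(
    groupId: String,
    artifactId: String,
    version: String,
    classifier: Option[String] = None
) {
  // File name: <artifactId>-<version>[-<classifier>].jar
  def jarName: String =
    s"$artifactId-$version${classifier.fold("")("-" + _)}.jar"

  // Directory: <repo>/<groupId with dots as slashes>/<artifactId>/<version>
  def localPath(repo: Path): Path =
    groupId.split('.').foldLeft(repo)(_ resolve _)
      .resolve(artifactId).resolve(version).resolve(jarName)
}

object ArtifactExample {
  // The dependency-reduced jar discussed above, as coordinates.
  val reduced: Artifact = Artifact(
    "org.apache.mahout", "mahout-spark_2.10", "0.12.1-SNAPSHOT",
    Some("dependency-reduced"))

  def main(args: Array[String]): Unit = {
    val repo = Paths.get(sys.props("user.home"), ".m2", "repository")
    println(reduced.localPath(repo))
  }
}
```

Because the directory name is derived from the artifactId, a jar whose file name starts with `mahout-spark_2.10` lives under the `mahout-spark_2.10` directory, not under `mahout-spark-shell_2.10`.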
________________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Wednesday, May 18, 2016 11:02:43 AM
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

ah yes- I remember you pointing that out to me too.

I got sidetracked yesterday for most of the day on an adventure in getting
Zeppelin to work right after I accidentally updated to the new snapshot (free
hint: the secret was to clear my cache *face-palm*)

I'm going to add that dependency to the readme.md now.

thanks,
tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, May 18, 2016 at 9:59 AM, Andrew Palumbo <ap....@outlook.com> wrote:

Trevor this is very cool- I have not been able to look at it closely yet
but just a small point: I believe that you'll also need to add the

mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar

For things like the classification stats, confusion matrix, and t-digest.

Andy

________________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Wednesday, May 18, 2016 10:47:21 AM
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

I still need to update my readme/env per Pat's comments below; however,
without further ado, I present two notebooks that integrate Mahout + Spark +
Zeppelin + ggplot2

https://github.com/rawkintrevo/mahout-zeppelin

Supposing you have a somewhat recent version of Zeppelin 0.6 with sparkr
support running already, you may import the following raw notes directly
into Zeppelin:



https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json


https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json

So my thoughts on next steps, which I'm positing only as a starting point
for discussion, and which are in no particular order of importance:

- Blog on HOWTO for everyman (assumes no familiarity with Mahout, and only
enough familiarity with Zeppelin to have Zeppelin + SparkR support)
- Some syntactic sugar somewhere in Mahout to convert a matrix into a tsv
string. (with some sanity, e.g. a sample of a matrix)
- Figure out with Zeppelin community what deeper integration feels like -
e.g. build-profile vs. tutorial
   - I think the case for making a build-profile is that Zeppelin is first
and foremost a data science tool for non-technical users.
   - If we go that route I'll need some more support finding out what is the
absolute minimum 'bare-bones' mahout we can include, e.g. does the user
have to have mahout installed? To be discussed.
- Add matplotlib (python) "support" -> paragraph showing how to do the same
thing in Python.
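
To make the "matrix to tsv string, with a sample" item above a bit more concrete, here is a rough sketch in plain Scala. It is only an illustration under stated assumptions: `TsvSugar`, `sampleRows` and `toTsv` are hypothetical names, not existing Mahout API, and a plain `Array[Array[Double]]` stands in for an in-core matrix so the snippet runs without any Mahout jars.

```scala
// Hypothetical helper, not an existing Mahout API. A plain
// Array[Array[Double]] stands in for an in-core matrix here.
object TsvSugar {
  // Keep at most maxRows rows, taken at an even stride, so that a large
  // matrix does not flood the notebook.
  def sampleRows(rows: Array[Array[Double]], maxRows: Int): Array[Array[Double]] = {
    require(maxRows > 0, "maxRows must be positive")
    if (rows.length <= maxRows) rows
    else {
      val stride = math.ceil(rows.length.toDouble / maxRows).toInt
      rows.indices.by(stride).map(rows(_)).toArray
    }
  }

  // One header line, then one tab-separated line per row.
  def toTsv(rows: Array[Array[Double]], header: Seq[String]): String = {
    require(rows.forall(_.length == header.length), "row width must match header")
    val lines = header.mkString("\t") +: rows.map(_.mkString("\t")).toSeq
    lines.mkString("\n")
  }
}
```

A call such as `TsvSugar.toTsv(TsvSugar.sampleRows(rows, 1000), Seq("x", "y"))` would then give a string small enough to hand to another paragraph. A strided sample is just one simple choice; a random sample would work equally well.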

The basic deal here is we are:
1) Setting up a standard Zeppelin Spark interpreter to act like a Mahout
interpreter
     - This is taken care of by setting some env. variables, adding some
dependencies, and importing relevant packages
2) do mahout things as you do
3) export table to tsv string, which is passed to a resource pool
    - This could be done to a disk if you didn't have zeppelin
4) read the tsv from the resource pool (or disk if you didn't have
zeppelin) in R (python soon) and create a <plot package of your choice>
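
For the "disk if you didn't have zeppelin" variant of steps 3 and 4, the hand-off could look roughly like the sketch below. It is written in Scala on both sides purely so the round trip can be checked in one place; in the real pipeline the reading side would be R or Python. `TsvDisk` is a hypothetical name, and the sketch assumes the first line is a header and every other cell is numeric.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical helper for the disk-based hand-off; not part of Mahout
// or Zeppelin.
object TsvDisk {
  // Step 3 (disk variant): write the TSV string to a file.
  def write(path: Path, tsv: String): Unit = {
    Files.write(path, tsv.getBytes(UTF_8))
    ()
  }

  // Step 4 (disk variant): read the file back as (header, numeric rows).
  // Assumes the first line is a header and all other cells parse as Double.
  def read(path: Path): (Seq[String], Seq[Seq[Double]]) = {
    val lines = new String(Files.readAllBytes(path), UTF_8)
      .split("\n").toSeq.filter(_.nonEmpty)
    require(lines.nonEmpty, "file has no header line")
    val header = lines.head.split("\t").toSeq
    val rows = lines.tail.map(_.split("\t").toSeq.map(_.toDouble))
    (header, rows)
  }
}
```

Reading the file back on the Scala side is a cheap way to confirm that the exported string is well formed before pointing R or Python at it.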

To Pat's point- this is a kind of clumsy pipeline, however the Zeppelin
wrapper at least makes it *feel* less so.


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Tue, May 17, 2016 at 1:17 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
Seems like there is plenty to use in ggplot or python but the pipeline is
a little convoluted (so maybe no need for Angular integration). To get
graphics out of Mahout it would be nice to not require knowledge of R
and/or python. Knowing Mahout is already bad enough but I guess the API
from the Mahout side for plotting could be Scala syntactic sugar. What and
how this all is installed and set up is the next question.

BTW this is what I use elsewhere (Mahout as a lib to this code)

     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
     "spark.kryo.referenceTracking": "false",
     "spark.kryoserializer.buffer": "300m",

afaik you will only see if Kryo is working when you have to serialize a
Mahout-specific data type like a vector or DRM, something registered with
Kryo.
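
If it helps to carry those four settings around outside of the Zeppelin interpreter page, a minimal sketch is to keep them as plain key/value pairs and render them as `spark-submit` style `--conf key=value` arguments. `KryoProps` and `asSubmitArgs` are hypothetical names for illustration; only the property names and values come from the snippet above.

```scala
// Hypothetical holder for the settings quoted above; not a Mahout or
// Spark class.
object KryoProps {
  // The four properties, kept in the order they were listed.
  val settings: Seq[(String, String)] = Seq(
    "spark.serializer" -> "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator" ->
      "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking" -> "false",
    "spark.kryoserializer.buffer" -> "300m"
  )

  // Render each pair as two command-line tokens: --conf key=value
  def asSubmitArgs(props: Seq[(String, String)]): Seq[String] =
    props.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

The same pairs can be typed one by one into the Spark interpreter properties in Zeppelin; the rendering above is only a convenience for command-line use.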


On May 16, 2016, at 6:18 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

As a quick recap- we're trying to leverage Zeppelin for charting.

It seems as though this can be achieved by
- Adding properties to the Spark Interpreter
- Adding dependency jars to the spark interpreter
- importing in a spark paragraph

All seems to be working well, but I've fooled myself into thinking things
were 'working' before because I wasn't actually integrating. Below I will
outline the imports/properties; please look over and tell me if I'm
theoretically missing anything.

The next phase for me will be
1) Convert a matrix to some sort of serializable object that I can easily
unpack from R
2) use Zeppelin's resource pools to pass the object
3) collect the object in an R paragraph, convert it to a dataframe then map
using ggplot

Once I have a working prototype I will add some syntactic sugar to
prepare the matrix from the scala side and pass to zeppelin (using resource
pools so the same functionality can be reused in Flink) and an R library
containing some functions which will pull the data out of the resource pool
and spit out a dataframe.

Once it's in a dataframe in R- go nuts with any plotting package you like.
Likewise, it should be possible to do the same thing with matplotlib and
python (https://gist.github.com/andershammar/9070e0f6916a0fbda7a5)

All of this doesn't necessarily require any changing of the Zeppelin source
code, and isn't very intrusive or difficult to set up. I'll make a blog
post but it's almost a textbook entry tutorial on using imports in
Zeppelin. (e.g. a tutorial would be just as at home on the Zeppelin site as
it would on the Mahout site).

Now, there has been some talk of using Zeppelin's AngularJS.  Things get a
little more hairy in that case, but we could make an optional build profile
that would make zeppelin recognize matrices as tables and expose all of the
built in charting features of Zeppelin.

If you're not adding a bunch of custom charts to Zeppelin (which would be
somewhat tedious), you're going to end up with a lot of examples where you
create a table in Mahout/Spark, pass it to AngularJS, then some AngularJS
code charts it for you.  At that point however, you're doing just as much
work, if not more, than it would be to simply pass to R or Python and let
ggplot or matplotlib do the work for you.

Finally, I haven't run into any errors yet using Kryo (which in part is
what makes me fear I'm not doing this right... it was too easy...) If
anything seems redundant or missing, please call it out.

Add Properties to Spark interp:

spark.kryo.registrator  org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
spark.serializer        org.apache.spark.serializer.KryoSerializer

Add artifacts (need to change these to maven not local, also need to
add/change one jar per below, however this does run):



/home/trevor/.m2/repository/org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar

/home/trevor/.m2/repository/org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar

/home/trevor/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar

/home/trevor/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar

Add the following code to the first paragraph of the notebook:
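```
%spark
// Core math types and the in-core Scala DSL
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
// Distributed row matrix (DRM) API
import org.apache.mahout.math.drm._
// R-like operators for in-core and distributed matrices
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
// Spark bindings, including sc2sdc
import org.apache.mahout.sparkbindings._

// Wrap Zeppelin's SparkContext (sc) as the implicit Mahout distributed context
implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
```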



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Mon, May 16, 2016 at 6:42 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
Creating an mc used to do some Kryo setup, like registering serializers or
serializer factories IIRC. Also there is the Spark conf for allocating
memory for the Kryo buffer. Look at the mc creation code in the Spark
package helpers. All can be done in straight Spark and passed in to
create the mc when needed. Again from old weak brain cells but I think that
is part of what makes the Mahout shell different than the Spark shell plus
imports, it auto-creates the mc instead of or along with an sc.

When I get back to my computer I can check.

On May 16, 2016, at 3:40 PM, Andrew Palumbo <ap....@outlook.com> wrote:
Trevor,

Could you post any kryo errors that you may be having?

________________________________
From: Andrew Palumbo <ap....@outlook.com>
Sent: Monday, May 16, 2016 6:25:07 PM
To: mahout
Subject: Future Mahout - Zeppelin work




To Dmitriy's point, I agree ggplot is def the priority.  The mahout plots
are at this point really just a POC, but at some point we may want
to integrate some data transformation features into the mahout plots
classes so they're really more future work.


long story short:


OK. I'll read through the examples and try to do something with some
data, then do a ggplot and/or an angular plot on it (probably ggplot).
I'll do a quick tutorial. Then I'll reopen discussion on that Zeppelin
issue about whether we want to go ahead and add another interpreter.


Sounds Great.


Thank you.

________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Monday, May 16, 2016 5:49:17 PM
To: Dmitriy Lyubimov
Cc: Andrew Palumbo; Pat Ferrel; Suneel Marthi
Subject: Re: Intro - Future Mahout - Zeppelin work

I just signed up for dev, should I just reply all and cc dev or start a
new thread?

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

"Fortunate is he, who is able to know the causes of things."  -Virgil


On Mon, May 16, 2016 at 4:46 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
fwiw ggplot2 is pretty darn advanced:) i am a bit skeptical smile would
have something that ggplot2 would not, the other way around is much more
expected by me:)

anyhow if ggplot2 and matplotlib are available in Zeppelin without major
limitations, it sounds like Zeppelin should be an all around very nice
venue then.

On Mon, May 16, 2016 at 2:42 PM, Andrew Palumbo <ap....@outlook.com> wrote:

yeah we should probably move this over to dev@


sorry- answering a question from a couple emails back on the thread.


If possible,  I think it would be great to eventually have both (native
mahout/smile plots and ggplot), since in the future we're going to be
adding more visualization features rather than simple scatter plots etc
that may not be covered by ggplot.


That's why we were thinking about using angular and the pngs.


But what you're saying in your last email would be great!


Thank you!


________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Monday, May 16, 2016 5:33:12 PM
To: Andrew Palumbo
Cc: Pat Ferrel; Suneel Marthi; Dmitriy Lyubimov

Subject: Re: Intro - Future Mahout - Zeppelin work

I somehow replied to your last email without seeing it...

OK. I'll read through the examples and try to do something with some data,
then do a ggplot and/or an angular plot on it (probably ggplot).

I'll do a quick tutorial. Then I'll reopen discussion on that Zeppelin
issue about whether we want to go ahead and add another interpreter.

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

"Fortunate is he, who is able to know the causes of things."  -Virgil


On Mon, May 16, 2016 at 4:26 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
sorry for double email but are you thinking visualization should be a
library internal to mahout or should we leverage Zeppelin's visualization
capabilities?

Also, should we move this discussion to dev?

tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

"Fortunate is he, who is able to know the causes of things."  -Virgil


On Mon, May 16, 2016 at 4:14 PM, Andrew Palumbo <ap....@outlook.com> wrote:

Sorry- to be a little more clear,  Part of what we're trying to do is to get
the new plotting features integrated with Zeppelin. We plan on adding more
advanced plotting.


________________________________
From: Andrew Palumbo <ap....@outlook.com>
Sent: Monday, May 16, 2016 5:04:49 PM
To: Pat Ferrel; Trevor Grant
Cc: Suneel Marthi; Dmitriy Lyubimov
Subject: Re: Intro - Future Mahout - Zeppelin work


Awesome!


most of the hard work was done by Dmitriy, I've just reworked it a
couple of times to keep up with spark's refactoring.


I think that you will also need to include:


   mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar


For the new plotting features that we're working on.


the plotting is still a work in progress, and the grid and surface plots
are not working properly.  The plots are swing based and can currently be
exported as PNGs.  There are a few examples on the closed PR:
https://github.com/apache/mahout/pull/230


There is an example script in examples/bin/spark-shell-plot.mscala
(committed to master):

https://github.com/apache/mahout/blob/master/examples/bin/spark-shell-plot.mscala

Thanks!



________________________________
From: Pat Ferrel <p...@occamsmachete.com>
Sent: Monday, May 16, 2016 4:54:15 PM
To: Trevor Grant
Cc: Andrew Palumbo; Suneel Marthi; Dmitriy Lyubimov
Subject: Re: Intro - Future Mahout - Zeppelin work

This is only the beginning. Andy has been using Smile as a visualization
lib since it is pretty rich in ML support. We are looking at integrating
some of that with Zeppelin then adding code to feed the new visualizations
in Mahout. I’m here because I’m fairly familiar with AngularJS if that’s
the way to go. Smile is swing based but can output pngs, maybe other image
formats. Andy?

BTW Dmitriy is still very involved but has trouble getting permission to
donate code.


On May 16, 2016, at 1:45 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Hey Andrew,

thanks- you basically did all of the hard work for me!

I've got the linear regression example working from:
http://mahout.apache.org/users/sparkbindings/play-with-shell.html

my java is sketchy at best, i tend to over import. I pulled in the
following jars:


org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar

org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar

org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar

org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar

I think those are all necessary...  should I be pulling in more?

I hate to say it (but will do so bc this isn't public) this integration is
super easy from a user perspective, almost too easy- eg why not let the
user add it themselves...  Add the appropriate maven artifacts, restart the
interpreter and run the following in a notebook:
```
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
```
Then whatever code you want and you're off to the races...

that said, adding a build profile like -PsparkMahout and creating an
interpreter like %spark.mahout should be fairly straightforward.

Second question, do you have an example that would be more 'visualization
friendly'? I could pass the results to Angular or R just to show off how to
do it.

Which leads back to the question, is this even worth building a full
interpreter for or just make a really nice blog post with examples on how
to integrate with R...?








Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

"Fortunate is he, who is able to know the causes of things."  -Virgil


On Mon, May 16, 2016 at 2:09 PM, Andrew Palumbo <ap....@outlook.com> wrote:
Hi Trevor, welcome!

It's great to have you helping out, thanks very much.  I've done a good
amount of work on our mahout spark shell .. so let me know if you have any
questions about what we did there..

Thanks a lot!

Andy


-------- Original message --------
From: Suneel Marthi <smar...@apache.org>
Date: 05/16/2016 2:44 PM (GMT-05:00)
To: Trevor Grant <trevor.d.gr...@gmail.com>
Cc: Suneel Marthi <smar...@apache.org>, Pat Ferrel <p...@occamsmachete.com>, Andrew Palumbo <ap....@outlook.com>
Subject: Re: Intro - Future Mahout - Zeppelin work

Oh yes, he's around. I see him online.

On Mon, May 16, 2016 at 2:42 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
Is Dmitriy Lyubimov still around?

Looks like he created this issue for Zeppelin a while ago. (The old lost
code to which you were referring?)

https://issues.apache.org/jira/browse/ZEPPELIN-116


tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

"Fortunate is he, who is able to know the causes of things."  -Virgil


On Mon, May 16, 2016 at 1:37 PM, Suneel Marthi <smar...@apache.org> wrote:
Welcome to the party TG !!

On Mon, May 16, 2016 at 2:28 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
Hey all,

I'm excited for a chance to help out.  I'm actually getting ready to
download now and start playing around.

I had talked about this briefly but given a properly functioning
Zeppelin interpreter for Apache Mahout, one could leverage all of the
Zeppelin visualizations, anything in AngularJS, or anything in R (through
clever use of Zeppelin's Resource Pools).

I'll work on getting logged in to the slack channel as well.

Nice to meet you all, looking forward to helping out!

tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

"Fortunate is he, who is able to know the causes of things."  -Virgil


On Sun, May 15, 2016 at 12:56 PM, Suneel Marthi <smar...@apache.org> wrote:
FYI...
Trevor was there for my talk, so he has some idea of Mahout Samsara.

On Sun, May 15, 2016 at 1:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
Hey Trevor,

Good to meet you. As you probably know Mahout-Samsara is a reincarnation
of the project in a new body, which is less a collection of algorithms than
a roll-your-own math/algorithm tool. The major benefit is that during
experimentation and later in production the code is by nature scalable on
Spark and Flink. Most of the Mahout DSL is R-like and supports tensor math
but we are now looking at streaming online algo support too.

In any case you probably know we have a Mahout version of the Spark Shell,
which has been integrated with an old version of Zeppelin (code is lost).
Recently Andy has experimented with some very nice visualizations of ML
data (not just analytics data). We as a project are interested in Zeppelin
integration of our shell and graphics. From what I understand the graphics
extension mechanism of Zeppelin is based on AngularJS, which I have some
experience with.

So, we’d like to start the conversation about how to proceed. We would
love some help but will move ahead in any case.

Pat


On May 15, 2016, at 9:52 AM, Suneel Marthi <smar...@apache.org> wrote:

Hi Trevor,

Nice meeting u last week in Vancouver.  Per our conversation, I wanted to
introduce u to Andrew Palumbo (Mahout Chair) and Pat Ferrel (Mahout PMC).
As I mentioned in my talk, we are actively looking at Zeppelin integration
with Mahout (primarily for spark) and would appreciate your help (as also
all things DL and ML).

We definitely can use all your help as we r revamping the Mahout project
and shedding its legacy MapReduce image.

I sent u an invite to the Mahout slack channel, mahout.apache.org
<http://mahout.apache.org/> - that's where we all hang out without having
to worry about avoiding naughty words.

Looking forward to working with you

Suneel



