Have you seen [ZEPPELIN-116] Add Mahout Support for Spark Interpreter?

https://github.com/apache/incubator-zeppelin/pull/928

It declares the Mahout deps in the Spark interpreter and creates the sdc (Spark distributed context).

On 29/05/16 19:16, Suneel Marthi wrote:
On Sun, May 29, 2016 at 12:07 PM, Trevor Grant <trevor.d.gr...@gmail.com>
wrote:

OK cool. Just wanted to make sure I wasn't stealing anyone's baby or
duplicating efforts.

Two things:

1- The blog post referenced the linear-regression example notebook twice;
I've updated it to reference the ggplot integration. E.g. import this note:

https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json
(I still need to update it with a blurb about sampling, though sampling is done in
that note...) So to any who tried the blog, a huge apology, because that
notebook is where all of the 'magic happened' (all of the screen shots /
gg-plots / etc. happened there).

2- I have a working prototype of the Zeppelin integration:
'mahout-terp' branch of:
https://github.com/rawkintrevo/incubator-zeppelin
If you build it and set 'spark.mahout' to 'true' in the Spark Interpreter
properties, you have a Mahout interpreter. This is the minimally invasive
way to do it. I'll be opening a PR soon; we'll see what the gang over at
Zeppelin say.
I'll still need docs and an example notebook, but I'm waiting to make sure
I don't need to do a major refactor before I get carried away with those
activities.

In essence, when 'spark-mahout' is 'true' you jump right into the R-like DSL
and you have an sdc declared based on the underlying sc.


I am not sure if messing with the very "sacrosanct" Zeppelin-Spark
interpreter is gonna go down well with the Spark insanity.  I would prefer
having a separate Mahout-Spark-Zeppelin interpreter under the Zeppelin project
if that's acceptable to the Zeppelin folks, even though most of it might be
repeated.

What do others have to say?


have a good holiday weekend,

tg



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Sun, May 29, 2016 at 10:49 AM, Andrew Palumbo <ap....@outlook.com>
wrote:

Thx Trevor,
Re: m-1854, it was something that we started when we were first discussing
using the Smile plots and trying to pipe them over to Zeppelin. As
far as I know no progress was made on it, so I've unassigned it.

Feel free to assign any Jiras to yourself.  I think that m-1854 is similar
to the mahout-spark-shell, so I may be able to help out there.


________________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Saturday, May 28, 2016 11:21:44 PM
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

Created a subtask on 1855 for tsv strings.

Looking at 1854 assigned to Pat Ferrel, what's your progress to date? How
can I help?

tg





On Thu, May 26, 2016 at 2:34 PM, Andrew Palumbo <ap....@outlook.com>
wrote:

Great!

When you free up and have the time, could you create some Jiras for these?

We actually have MAHOUT-1852 open for Histograms already, and MAHOUT-1854
and MAHOUT-1855 (early Zeppelin integration Jiras).  I can close m-1854 and
m-1855 out and we can start new ones if they're not relevant anymore, or we
can just go with those.

Thanks

________________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Thursday, May 26, 2016 3:17:22 PM
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

Short answer: it is high priority. I think it will be a Mahout interpreter
in Zeppelin, and given that plans are on hold for a Flink-Mahout in the
short term, I think it should be a piggy-back Spark interpreter (e.g.
exposed through something like %spark.mahout).   So I have thoughts, but no
plan.  Been busy with a couple of other commitments.

On the Mahout side we need:
A function that will convert small matrices into TSV strings
Convenience functions for sampling super-large matrices into things like
histograms, etc., that one would want to plot. I.e. histogram bucketing?
(less important for the moment)

On the Zeppelin side we need:
an interpreter.
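
To make the first Mahout-side item concrete, here is a rough sketch of a matrix-to-TSV helper. It is only an illustration under stated assumptions: a plain Array[Array[Double]] stands in for a small in-core matrix, and the names TsvSketch and toTsv are made up for this example, not anything that exists in Mahout.

```scala
// Sketch only: a plain Array[Array[Double]] stands in for a small in-core matrix.
// `TsvSketch` and `toTsv` are hypothetical names, not part of Mahout.
object TsvSketch {
  /** Render a header row plus one tab-separated line per matrix row. */
  def toTsv(colNames: Seq[String], rows: Array[Array[Double]]): String = {
    require(rows.forall(_.length == colNames.length),
      "every row must have as many cells as there are column names")
    val header = colNames.mkString("\t")
    val body = rows.map(_.mkString("\t"))
    (header +: body).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val m = Array(Array(1.0, 2.0), Array(3.5, 4.0))
    println(toTsv(Seq("x", "y"), m))
  }
}
```

A string like this is small enough to hand to another paragraph, which is the point: only matrices that have already been sampled or condensed should go through it.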




On Thu, May 26, 2016 at 1:22 PM, Suneel Marthi <smar...@apache.org>
wrote:

While on this subject, do we have a plan yet for integrating Zeppelin into
Mahout or (conversely) for having a Mahout-specific interpreter for
Zeppelin?  I think that should be high priority in the short term.

On Thu, May 26, 2016 at 1:17 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Ahh, like the "Sample From Matrix" paragraph in the notebook.

Yea that seems like a good add. If not this afternoon, I'll include it
Saturday.




On Thu, May 26, 2016 at 11:52 AM, Andrew Palumbo <ap....@outlook.com> wrote:

Trevor, I was reading over your blog again last night, for the first time
since you updated it. It is great!

I have one suggestion: add a line of code showing how the sampling
of the DRM -> in-core Matrix is done:

https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala#L148

eg something like:

     mxSin = drmSampleKRows(drmSin, 1000, replacement = false)

Maybe you omitted this intentionally?

Andy

________________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Friday, May 20, 2016 7:56:20 PM
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

Unfortunately Zeppelin dev has been so rapid that 0.6-SNAPSHOT as a version
is uninformative to me. I'd say, if possible, your first troubleshooting
measure would be to re-clone or do a "git fetch upstream" to get up to the
very latest.

Sorry for delayed reply
Tg
On May 20, 2016 5:36 PM, "Andrew Musselman" <andrew.mussel...@gmail.com> wrote:

Trevor, my zeppelin source is at this version:

   <groupId>org.apache.zeppelin</groupId>
   <artifactId>zeppelin</artifactId>
   <packaging>pom</packaging>
   <version>0.6.0-incubating-SNAPSHOT</version>
   <name>Zeppelin</name>
   <description>Zeppelin project</description>
   <url>http://zeppelin.incubator.apache.org/</url>

And yes you're right the artifacts weren't added to the dependencies; is
that a feature in more modern zep?

On Fri, May 20, 2016 at 3:02 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

no parentheses.

import org.apache.mahout.sparkbindings._
....
val myRdd = myDrm.rdd


On Fri, May 20, 2016 at 2:57 PM, Suneel Marthi <smar...@apache.org> wrote:

On Fri, May 20, 2016 at 3:18 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Hey Pat,

If you spit out a TSV, you can import it into pyspark / matplotlib from the
resource pool in essentially the same way and use that plotting library if
you prefer.  In fact you could import the TSV into pandas and use all of
the pandas plotting as well (though I think it is, for the most part, also
matplotlib with some convenience functions).

https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2ZlbGl4Y2hldW5nL3NwYXJrLW5vdGVib29rLWV4YW1wbGVzL21hc3Rlci9aZXBwZWxpbl9ub3RlYm9vay8yQU1YNUNWQ1Uvbm90ZS5qc29u

In Zeppelin, unless you specify otherwise, pyspark, sparkr, spark-sql, and
scala-spark all share the same Spark context, so you can create RDDs in one
language and access them / work on them in another (so I understand).

So in Mahout can you "save" a matrix as an RDD? E.g. something like

val myRDD = myDRM.asRDD()


val myRDD = myDRM.rdd()


And would 'myRDD' then exist in the spark context?

yes it will be in sparkContext




On Fri, May 20, 2016 at 12:21 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

Agreed.

BTW I don’t want to stall progress, but being the most ignorant of plot
libs, I’ll ask if we should consider python and matplotlib. In another
project we use python because of the RDD support on Spark, though the
visualizations are extremely limited in our case. If we can pass an RDD to
pyspark it would allow custom reductions in python before plotting, even
though we will support many natively in Mahout. I’m guessing that this
would cross a context boundary and require a write to disk?

So 2 questions:
1) what does the inter-language support look like with Spark python vs
SparkR, can we transfer RDDs?
2) are the plot libs significantly different?

On May 20, 2016, at 9:54 AM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Dmitriy really nailed it on the head in his reply to the post, which I'll
rebroadcast below. In essence the whole reason you are (theoretically)
using Mahout is that the data is too big to fit in memory. If it's too big
to fit in memory, well then it's probably too big to plot each point (e.g.
trillions of rows, you only have so many pixels).   For the example I
randomly sampled a matrix.
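
As a rough illustration of what "randomly sampled" means here, the sketch below picks k distinct rows from an in-memory collection. It is not the Mahout code path (elsewhere in this thread drmSampleKRows is named as the call that does this on a DRM); SampleSketch and sampleRows are hypothetical names, and the whole thing assumes the rows already fit in memory.

```scala
import scala.util.Random

// Sketch only: in-memory stand-in for sampling rows out of a large matrix.
// `SampleSketch` and `sampleRows` are hypothetical names, not part of Mahout.
object SampleSketch {
  /** Pick at most k distinct rows uniformly at random (sampling without replacement). */
  def sampleRows[A](rows: IndexedSeq[A], k: Int, rng: Random): IndexedSeq[A] = {
    require(k >= 0, "k must be non-negative")
    // Shuffle the row indices, keep the first k, then look each index up.
    rng.shuffle(rows.indices.toVector).take(k).map(rows)
  }

  def main(args: Array[String]): Unit = {
    val rows = Vector("r0", "r1", "r2", "r3", "r4")
    println(sampleRows(rows, 3, new Random(42)))
  }
}
```

Asking for more rows than exist simply returns every row once, which keeps the helper safe to call on small matrices.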

So as Dmitriy says, in Mahout we need to have functions that will
'preprocess' the data into something plottable.

For the Zeppelin-Plotting thing, we need to have a function that will spit
out a tsv-like string of the data we want plotted.

I agree an honest Mahout interpreter in Zeppelin is probably worth doing.
There are a couple of ways to go about it. I opened up the discussion on
dev@Zeppelin and didn't get any replies. I'm going to take that to mean we
can do it in a way that makes the most sense to Mahout users...

First steps are to include some methods in Mahout that will do that
preprocessing, and one that will turn something into a tsv string.

I have some general ideas on possible approaches to making an honest-mahout
interpreter, but I want to play in the code and look at the Flink-Mahout
shell a bit before I try to organize my thoughts and present them.

...(2) not sure what is the point of supporting distributed anything. It is
distributed presumably because it is hard to keep it in memory. Therefore,
plotting anything distributed potentially presents 2 problems: storage
space and overplotting due to number of points. The idea is that we have to
work out algorithms that condense big data information into small plottable
information (like density grids, for example, or histograms)....
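
A minimal sketch of the "condense into something plottable" idea, using equal-width histogram bucketing. This is an assumption-laden illustration: it works on an in-memory Seq[Double] rather than a distributed matrix, and HistogramSketch and histogram are hypothetical names, not Mahout functions.

```scala
// Sketch only: equal-width histogram bucketing over in-memory values.
// `HistogramSketch` and `histogram` are hypothetical names, not part of Mahout.
object HistogramSketch {
  /** Count values into `bins` equal-width buckets spanning [min, max].
    * Returns (lower edge of bucket, count) pairs in ascending order. */
  def histogram(values: Seq[Double], bins: Int): Vector[(Double, Int)] = {
    require(bins > 0, "need at least one bin")
    require(values.nonEmpty, "need at least one value")
    val lo = values.min
    val hi = values.max
    // If all values are equal, use a dummy width so everything lands in bucket 0.
    val width = if (hi > lo) (hi - lo) / bins else 1.0
    val counts = Array.fill(bins)(0)
    values.foreach { v =>
      // The maximum sits exactly on the upper edge; clamp it into the last bucket.
      val idx = math.min(((v - lo) / width).toInt, bins - 1)
      counts(idx) += 1
    }
    Vector.tabulate(bins)(i => (lo + i * width, counts(i)))
  }

  def main(args: Array[String]): Unit =
    histogram(Seq(0.0, 1.0, 2.0, 3.0, 4.0), 2).foreach(println)
}
```

However many points go in, only `bins` pairs come out, so the result is always small enough to plot or to turn into a TSV string.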



On Fri, May 20, 2016 at 10:22 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

Great job Trevor, we’ll need this detail to smooth out the sharp edges, and
any guidance from you or the Zeppelin community will be a big help.


On May 20, 2016, at 8:13 AM, Shannon Quinn <squ...@gatech.edu> wrote:

Agreed, thoroughly enjoying the blog post.

On 5/19/16 12:01 AM, Andrew Palumbo wrote:
Well done, Trevor!  I've not yet had a chance to try this in Zeppelin,
but I just read the blog, which is great!

-------- Original message --------
From: Trevor Grant <trevor.d.gr...@gmail.com>
Date: 05/18/2016 2:44 PM (GMT-05:00)
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

Ah thank you.

Fixing now.




On Wed, May 18, 2016 at 1:04 PM, Andrew Palumbo <ap....@outlook.com> wrote:

Hey Trevor- Just refreshed your readme.  The jar that I mentioned is
actually:

/home/username/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar

rather than:

/home/username/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar

(In the spark module that is)
________________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Wednesday, May 18, 2016 11:02:43 AM
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

ah yes- I remember you pointing that out to me too.

I got sidetracked yesterday for most of the day on an adventure in getting
Zeppelin to work right after I accidentally updated to the new snapshot
(free hint: the secret was to clear my cache *face-palm*)

I'm going to add that dependency to the readme.md now.

thanks,
tg



On Wed, May 18, 2016 at 9:59 AM, Andrew Palumbo <ap....@outlook.com> wrote:

Trevor this is very cool- I have not been able to look at it closely yet
but just a small point: I believe that you'll also need to add the

mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar

for things like the classification stats, confusion matrix, and t-digest.

Andy

________________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Wednesday, May 18, 2016 10:47:21 AM
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

I still need to update my readme/env per Pat's comments below, however
without further ado, I present two notebooks that integrate Mahout + Spark +
Zeppelin + ggplot2

https://github.com/rawkintrevo/mahout-zeppelin

Supposing you have a somewhat recent version of Zeppelin 0.6 with sparkr
support running already, you may import the following raw notes directly
into Zeppelin:

https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json

https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json
So my thoughts on next steps, which I'm positing only as a starting point
for discussion, and are in no particular order of importance:

- Blog on HOWTO for everyman (assumes no familiarity with Mahout, and only
enough familiarity with Zeppelin to have Zeppelin + SparkR support)
- Some syntactic sugar somewhere in Mahout to convert a matrix into a tsv
string. (with some sanity, e.g. a sample of a matrix)
- Figure out with Zeppelin community what deeper integration feels like -
e.g. build-profile vs. tutorial
  - I think the case for making a build-profile is that Zeppelin is first
and foremost a data science tool for non-technical users.
  - If we go that route I'll need some more support finding out what is the
absolute minimum 'bare-bones' mahout we can include, e.g. does the user
have to have mahout installed? To be discussed.
- Add matplotlib (python) "support" -> paragraph showing how to do the same
thing in Python.

The basic deal here is we are:
1) Setting up a standard Zeppelin Spark Interpreter to act like a Mahout
interpreter
    - This is taken care of by setting some env. variables, adding some
dependencies, and importing relevant packages
2) do mahout things as you do
3) export table to tsv string, which is passed to a resource pool
   - This could be done to a disk if you didn't have zeppelin
4) read the tsv from the resource pool (or disk if you didn't have
zeppelin) in R (python soon) and create a <plot package of your choice> plot
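
For step 4, the reading side boils down to splitting the TSV string back into a header and numeric rows. In the notebooks this happens in R; the sketch below is in Scala only to show the shape of the round trip, and it assumes a header line followed by purely numeric cells. TsvReadSketch and parseTsv are hypothetical names, and nothing here touches Zeppelin's resource pool.

```scala
// Sketch only: parse a TSV string (header line + numeric rows) back into data.
// `TsvReadSketch` and `parseTsv` are hypothetical names for illustration.
object TsvReadSketch {
  /** Split a TSV string into its column names and its rows of doubles. */
  def parseTsv(tsv: String): (Vector[String], Vector[Vector[Double]]) = {
    val lines = tsv.split("\n").toVector.filter(_.nonEmpty)
    require(lines.nonEmpty, "expected at least a header line")
    val header = lines.head.split("\t").toVector
    // Every remaining line is one row; each cell must parse as a Double.
    val rows = lines.tail.map(_.split("\t").toVector.map(_.toDouble))
    (header, rows)
  }

  def main(args: Array[String]): Unit = {
    val (cols, rows) = parseTsv("x\ty\n1.0\t2.0\n3.5\t4.0")
    println(cols)
    println(rows)
  }
}
```

Once the data is back in column/row form, the plotting package on the receiving side can take over.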

To Pat's point- this is a kind of clumsy pipeline, however the Zeppelin
wrapper at least makes it *feel* less so.




On Tue, May 17, 2016 at 1:17 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
Seems like there is plenty to use in ggplot or python but the pipeline is
a little convoluted (so maybe no need for Angular integration). To get
graphics out of Mahout it would be nice to not require knowledge of R
and/or python. Knowing Mahout is already bad enough, but I guess the API
from the Mahout side for plotting could be Scala syntactic sugar. What and
how this all is installed and set up is the next question.

BTW this is what I use elsewhere (Mahout as a lib to this code)

    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",

afaik you will only see whether Kryo is working when you have to serialize a
Mahout-specific data type like a vector or drm, something registered with
Kryo.


On May 16, 2016, at 6:18 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

As a quick recap- we're trying to leverage Zeppelin for charting.

It seems as though this can be achieved by
- Adding properties to the Spark Interpreter
- Adding dependency jars to the spark interpreter
- importing in a spark paragraph

All seems to be working well, but I've fooled myself into thinking things
were 'working' before because I wasn't actually integrating. Lower I will
outline the imports/properties, please look over and tell me if I'm
theoretically missing anything.

The next phase for me will be
1) Convert a matrix to some sort of serializable object that I can easily
unpack from R
2) use Zeppelin's resource buffers to pass the object
3) collect the object in an R paragraph, convert it to a dataframe then map
using ggplot

Once I have a working prototype I will add some syntactic sugar to
prepare the matrix from the scala side and pass to zeppelin (using resource
pools so the same functionality can be reused in Flink) and an R library
containing some functions which will pull the data out of the resource pool
and spit out a dataframe.

Once it's in a Dataframe in R- go nuts with any plotting package you like.
Likewise, it should be possible to do the same thing with matplotlib and
python (https://gist.github.com/andershammar/9070e0f6916a0fbda7a5)

All of this doesn't necessarily require any changing of the Zeppelin source
code, and isn't very intrusive or difficult to set up. I'll make a blog
post, but it's almost a textbook entry tutorial on using imports in
Zeppelin. (e.g. a tutorial would be just as at home on the Zeppelin site as
it would on the Mahout site).

Now, there has been some talk of using Zeppelin's angularJS. Things get a
little more hairy in that case, but we could make an optional build profile
that would make zeppelin recognize matrices as tables and expose all of the
built-in charting features of Zeppelin.

If you're not adding a bunch of custom charts to Zeppelin (which would be
somewhat tedious), you're going to end up with a lot of examples where you
create a table in Mahout/Spark, pass it to AngularJS, then some AngularJS
code charts it for you.  At that point however, you're doing just as much
work, if not more, than it would be to simply pass to R or Python and let
ggplot or matplotlib do the work for you.

Finally, I haven't run into any errors yet using Kryo (which in part is
what makes me fear I'm not doing this right... it was too easy...) If
anything seems redundant or missing, please call it out.

Add Properties to Spark interp:

spark.kryo.registrator    org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
spark.serializer          org.apache.spark.serializer.KryoSerializer

Add artifacts (need to change these to maven not local, also need to
add/change one jar per below, however this does run):

/home/trevor/.m2/repository/org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar

/home/trevor/.m2/repository/org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar

/home/trevor/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar

/home/trevor/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar

Add following code to first paragraph of notebook:
```
%spark
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
```





On Mon, May 16, 2016 at 6:42 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
Creating an mc used to do some Kryo setup, like registering serializers or
serializer factories IIRC. Also there is the Spark conf for allocating
memory for the Kryo buffer. Look at the mc creation code in the Spark
package helpers. All can be done in straight Spark and passed in to create
the mc when needed. Again from old weak brain cells, but I think that is
part of what makes the Mahout shell different from the Spark shell plus
imports: it auto-creates the mc instead of or along with an sc.

When I get back to my computer I can check.

On May 16, 2016, at 3:40 PM, Andrew Palumbo <ap....@outlook.com> wrote:
Trevor,

Could you post any kryo errors that you may be having?

________________________________
From: Andrew Palumbo <ap....@outlook.com>
Sent: Monday, May 16, 2016 6:25:07 PM
To: mahout
Subject: Future Mahout - Zeppelin work




To Dmitriy's point, I agree ggplot is def the priority. The mahout plots
are at this point really just a POC, but at some point we may want to
integrate some data transformation features into the mahout plots classes,
so they're really more future work.


long story short:


OK. I'll read through the examples and try to do something with some
data, then do a ggplot and/or an angular plot on it (probably ggplot).
I'll do a quick tutorial. Then I'll reopen discussion on that Zeppelin
issue about whether we want to go ahead and add another interpreter.


Sounds great.


Thank you.

________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Monday, May 16, 2016 5:49:17 PM
To: Dmitriy Lyubimov
Cc: Andrew Palumbo; Pat Ferrel; Suneel Marthi
Subject: Re: Intro - Future Mahout - Zeppelin work

I just signed up for dev, should i just reply all and cc dev or start a
new thread?



On Mon, May 16, 2016 at 4:46 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
fwiw ggplot2 is pretty darn advanced:) i am a bit skeptical smile would
have something that ggplot2 would not, the other way around is much more
expected by me:)

anyhow if ggplot2 and matplotlib are available in Zeppelin without major
limitations, it sounds like Zeppelin should be an all around very nice
venue then.

On Mon, May 16, 2016 at 2:42 PM, Andrew Palumbo <ap....@outlook.com> wrote:

yeah we should probably move this over to dev@


sorry- answering a question from a couple emails back on the thread.


If possible,  I think it would be great to eventually have both (native
mahout/smile plots and ggplot), since in the future we're going to be
adding more visualization features rather than simple scatter plots etc.
that may not be covered by ggplot.


That's why we were thinking about using angular and the pngs.


But what you're saying in your last email would be great!


Thank you!


________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Monday, May 16, 2016 5:33:12 PM
To: Andrew Palumbo
Cc: Pat Ferrel; Suneel Marthi; Dmitriy Lyubimov
Subject: Re: Intro - Future Mahout - Zeppelin work

I somehow replied to your last email without seeing it...

OK. I'll read through the examples and try to do something with some data,
then do a ggplot and/or an angular plot on it (probably ggplot).

I'll do a quick tutorial. Then I'll reopen discussion on that Zeppelin
issue about whether we want to go ahead and add another interpreter.



On Mon, May 16, 2016 at 4:26 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
sorry for double email but are you thinking visualization should be a
library internal to mahout or should we leverage Zeppelin's visualization
capabilities?

Also, should we move this discussion to dev?

tg




On Mon, May 16, 2016 at 4:14 PM, Andrew Palumbo <ap....@outlook.com> wrote:

Sorry- to be a little more clear,  part of what we're trying to do is to
get the new plotting features integrated with Zeppelin. We plan on adding
more advanced plotting.


________________________________
From: Andrew Palumbo <ap....@outlook.com>
Sent: Monday, May 16, 2016 5:04:49 PM
To: Pat Ferrel; Trevor Grant
Cc: Suneel Marthi; Dmitriy Lyubimov
Subject: Re: Intro - Future Mahout - Zeppelin work


Awesome!


Most of the hard work was done by Dmitriy; I've just reworked it a
couple of times to keep up with Spark's refactoring.


I think that you will also need to include:

mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar

for the new plotting features that we're working on.


The plotting is still a work in progress, and the grid and surface plots
are not working properly.  The plots are Swing based and can currently be
exported as PNGs.  There are a few examples on the closed PR:
https://github.com/apache/mahout/pull/230


There is an example script in examples/bin/spark-shell-plot.mscala
(committed to master):

https://github.com/apache/mahout/blob/master/examples/bin/spark-shell-plot.mscala

Thanks!



________________________________
From: Pat Ferrel <p...@occamsmachete.com>
Sent: Monday, May 16, 2016 4:54:15 PM
To: Trevor Grant
Cc: Andrew Palumbo; Suneel Marthi; Dmitriy Lyubimov
Subject: Re: Intro - Future Mahout - Zeppelin work

This is only the beginning. Andy has been using Smile as a visualization
lib since it is pretty rich in ML support. We are looking at integrating
some of that with Zeppelin, then adding code to feed the new visualizations
in Mahout. I’m here because I’m fairly familiar with AngularJS if that’s
the way to go. Smile is Swing based but can output pngs, maybe other image
formats - Andy?

BTW Dmitriy is still very involved but has trouble getting permission to
donate code.


On May 16, 2016, at 1:45 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Hey Andrew,

thanks- you basically did all of the hard work for me!

I've got the linear regression example working from:

http://mahout.apache.org/users/sparkbindings/play-with-shell.html

My Java is sketchy at best; I tend to over-import. I pulled in the
following jars:

org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar

org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar

org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar

org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar

I think those are all necessary...  should I be pulling in more?

I hate to say it (but will do so bc this isn't public) this integration is
super easy from a user perspective, almost too easy- e.g. why not let the
user add it themselves...  Add the appropriate maven artifacts, restart the
interpreter and run the following in a notebook:
```
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
```
Then whatever code you want and you're off to the races...

That said, adding a build profile like -PsparkMahout and creating an
interpreter like %spark.mahout should be fairly straightforward.

Second question, do you have an example that would be more 'visualization
friendly'? I could pass the results to Angular or R just to show off how to
do it.

Which leads back to the question, is this even worth building a full
interpreter for or just make a really nice blog post with examples on how
to integrate with R...?










On Mon, May 16, 2016 at 2:09 PM, Andrew Palumbo <ap....@outlook.com> wrote:
Hi Trevor, welcome!

It's great to have you helping out, thanks very much. I've done a good
amount of work on our mahout spark shell, so let me know if you have any
questions about what we did there.

Thanks a lot!

Andy


-------- Original message --------
From: Suneel Marthi <smar...@apache.org>
Date: 05/16/2016 2:44 PM (GMT-05:00)
To: Trevor Grant <trevor.d.gr...@gmail.com>
Cc: Suneel Marthi <smar...@apache.org>, Pat Ferrel <p...@occamsmachete.com>, Andrew Palumbo <ap....@outlook.com>
Subject: Re: Intro - Future Mahout - Zeppelin work

Oh yes, he's around. I see him online.

On Mon, May 16, 2016 at 2:42 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
Is Dmitriy Lyubimov still around?

Looks like he created this issue for Zeppelin a while ago. (The old lost
code to which you were referring?)


https://issues.apache.org/jira/browse/ZEPPELIN-116


tg




On Mon, May 16, 2016 at 1:37 PM, Suneel Marthi <smar...@apache.org> wrote:
Welcome to the party TG !!

On Mon, May 16, 2016 at 2:28 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
Hey all,

I'm excited for a chance to help out.  I'm actually getting ready to
download now and start playing around.

I had talked about this briefly, but given a properly functioning
Zeppelin interpreter for Apache Mahout, one could leverage all of the
Zeppelin visualizations, anything in AngularJS, or anything in R (through
clever use of Zeppelin's Resource Pools).

I'll work on getting logged in to the slack channel as well.

Nice to meet you all, looking forward to helping out!

tg




On Sun, May 15, 2016 at 12:56 PM, Suneel Marthi <smar...@apache.org> wrote:
FYI...
Trevor was there for my talk, so he has some idea of Mahout Samsara.

On Sun, May 15, 2016 at 1:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
Hey Trevor,

Good to meet you. As you probably know Mahout-Samsara is a reincarnation
of the project in a new body, which is less a collection of algorithms than
a roll-your-own math/algorithm tool. The major benefit is that during
experimentation and later in production the code is by nature scalable on
Spark and Flink. Most of the Mahout DSL is R-like and supports tensor math
but we are now looking at streaming online algo support too.

In any case you probably know we have a Mahout version of the Spark Shell,
which has been integrated with an old version of Zeppelin (code is lost).
Recently Andy has experimented with some very nice visualizations of ML
data (not just analytics data). We as a project are interested in Zeppelin
integration of our shell and graphics. From what I understand the graphics
extension mechanism of Zeppelin is based on AngularJS, which I have some
experience with.

So, we’d like to start the conversation about how to proceed. We would
love some help but will move ahead in any case.

Pat


On May 15, 2016, at 9:52 AM, Suneel Marthi <smar...@apache.org> wrote:

Hi Trevor,

Nice meeting u last week in Vancouver.  Per our conversation, I wanted to
introduce u to Andrew Palumbo (Mahout Chair) and Pat Ferrel (Mahout PMC).
As I mentioned in my talk, we are actively looking at Zeppelin integration
with Mahout (primarily for spark) and would appreciate your help (as also
all things DL and ML).

We definitely can use all your help as we r revamping the Mahout project
and shedding its legacy MapReduce image.

I sent u an invite to the Mahout slack channel, mahout.apache.org
<http://mahout.apache.org/> - that's where we all hang out and don't have
to worry about avoiding naughty words.

Looking forward to working with you

Suneel