At this step of the tutorial I'm stuck because I don't have an "Import
Note" link in my Zeppelin home:

"I’m going to do you another favor. Go to the Zeppelin home page and click
on ‘Import Note’. When given the option between URL and json, click on URL
and enter the following link:

https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json
"

On Fri, May 20, 2016 at 12:35 PM, Trevor Grant <trevor.d.gr...@gmail.com>
wrote:

> FYI:
>
> Looks like Flink shell is fixed :D
>
> https://github.com/apache/flink/pull/1913
>
> (I tested, is working good).
>
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, May 20, 2016 at 1:46 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > On Fri, May 20, 2016 at 12:54 PM, Trevor Grant <trevor.d.gr...@gmail.com
> >
> > wrote:
> >
> > > Dmitriy really nailed it on the head in his reply to the post which
> I'll
> > > rebroadcast below. In essence the whole reason you are (theoretically)
> > > using Mahout is the data is to big to fit in memory.  If it's to big to
> > fit
> > > in memory, well then its probably too big to plot each point (e.g.
> > > trillions of row, you only have so many pixels).   For the example I
> > > randomly sampled a matrix.
> > >
> > > So as Dmitriy says, in Mahout we need to have functions that will
> > > 'preprocess' the data into something plotable.
> > >
> > > For the Zepplin-Plotting thing, we need to have a function that will
> spit
> > > out a tsv like string of the data we wanted plotted.
> > >
> > > I agree an honest Mahout interpreter in Zeppelin is probably worth
> doing.
> > > There are a couple of ways to go about it. I opened up the discussion
> on
> > > dev@Zeppelin and didn't get any replies. I'm going to take that to
> mean
> > we
> > > can do it in a way that makes the most sense to Mahout users...
> > >
> > > First steps are to include some methods in Mahout that will do that
> > > preprocessing, and one that will turn something into a tsv string.
> > >
> > > I have some general ideas on possible approached to making an
> > honest-mahout
> > > interpreter but I want to play in the code and look at the Flink-Mahout
> > > shell a bit before I try to organize my thoughts and present them.
> > >
> >
> > FYI Trevor, there's no Flink-Mahout shell today; in large part because
> the
> > Flink Shell is still busted on their end and we on the Mahout end have
> not
> > had time to muck with it.  What exists today is the Mahout-Spark shell.
> >
> > >
> > > ...(2) not sure what is the point of supporting distributed anything.
> It
> > is
> > > distributed presumably because it is hard to keep it in memory.
> > Therefore,
> > > plotting anything distributed potentially presents 2 problems: storage
> > > space and overplotting due to number of points. The idea is that we
> have
> > to
> > > work out algorithms that condense big data information into small
> > plottable
> > > information (like density grids, for example, or histograms)....
> > >
> >
> > Agreed, something like sampling x% of points from a DRM (like the
> visuals I
> > had from Palumbo for the talk in Vancouver that demonstrated the concept)
> >
> >
> > >
> > > Trevor Grant
> > > Data Scientist
> > > https://github.com/rawkintrevo
> > > http://stackexchange.com/users/3002022/rawkintrevo
> > > http://trevorgrant.org
> > >
> > > *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> > >
> > >
> > > On Fri, May 20, 2016 at 10:22 AM, Pat Ferrel <p...@occamsmachete.com>
> > > wrote:
> > >
> > > > Great job Trevor, we’ll need this detail to smooth out the sharp
> edges
> > > and
> > > > any guidance from you or the Zeppelin community will be a big help.
> > > >
> > > >
> > > > On May 20, 2016, at 8:13 AM, Shannon Quinn <squ...@gatech.edu>
> wrote:
> > > >
> > > > Agreed, thoroughly enjoying the blog post.
> > > >
> > > > On 5/19/16 12:01 AM, Andrew Palumbo wrote:
> > > > > Well done, Trevor!  I've not yet had a chance to try this in
> zeppelin
> > > > but I just read the blog which is great!
> > > > >
> > > > > -------- Original message --------
> > > > > From: Trevor Grant <trevor.d.gr...@gmail.com>
> > > > > Date: 05/18/2016 2:44 PM (GMT-05:00)
> > > > > To: dev@mahout.apache.org
> > > > > Subject: Re: Future Mahout - Zeppelin work
> > > > >
> > > > > Ah thank you.
> > > > >
> > > > > Fixing now.
> > > > >
> > > > >
> > > > > Trevor Grant
> > > > > Data Scientist
> > > > > https://github.com/rawkintrevo
> > > > > http://stackexchange.com/users/3002022/rawkintrevo
> > > > > http://trevorgrant.org
> > > > >
> > > > > *"Fortunate is he, who is able to know the causes of things."
> > -Virgil*
> > > > >
> > > > >
> > > > > On Wed, May 18, 2016 at 1:04 PM, Andrew Palumbo <
> ap....@outlook.com>
> > > > wrote:
> > > > >
> > > > >> Hey Trevor- Just refreshed your readme.  The jar that I mentioned
> is
> > > > >> actually:
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> >
> /home/username/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > > > >>
> > > > >> rather than:
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> >
> /home/username/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > > > >>
> > > > >> (In the spark module that is)
> > > > >> ________________________________________
> > > > >> From: Trevor Grant <trevor.d.gr...@gmail.com>
> > > > >> Sent: Wednesday, May 18, 2016 11:02:43 AM
> > > > >> To: dev@mahout.apache.org
> > > > >> Subject: Re: Future Mahout - Zeppelin work
> > > > >>
> > > > >> ah yes- I remember you pointing that out to me too.
> > > > >>
> > > > >> I got side tracked yesterday for most of the day on an adventure
> in
> > > > getting
> > > > >> Zeppelin to work right after I accidently updated to the new
> > snapshot
> > > > (free
> > > > >> hint: the secret was to clear my cache *face-palm*)
> > > > >>
> > > > >> I'm going to add that dependency to the readme.md now.
> > > > >>
> > > > >> thanks,
> > > > >> tg
> > > > >>
> > > > >> Trevor Grant
> > > > >> Data Scientist
> > > > >> https://github.com/rawkintrevo
> > > > >> http://stackexchange.com/users/3002022/rawkintrevo
> > > > >> http://trevorgrant.org
> > > > >>
> > > > >> *"Fortunate is he, who is able to know the causes of things."
> > > -Virgil*
> > > > >>
> > > > >>
> > > > >> On Wed, May 18, 2016 at 9:59 AM, Andrew Palumbo <
> ap....@outlook.com
> > >
> > > > >> wrote:
> > > > >>
> > > > >>> Trevor this is very cool- I have not been able to look at it
> > closely
> > > > yet
> > > > >>> but just a small point: I believe that you'll also need to add
> the
> > > > >>>
> > > > >>> mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > > > >>>
> > > > >>> For things like the classification stats, confusion matrix, and
> > > > t-digest.
> > > > >>>
> > > > >>> Andy
> > > > >>>
> > > > >>> ________________________________________
> > > > >>> From: Trevor Grant <trevor.d.gr...@gmail.com>
> > > > >>> Sent: Wednesday, May 18, 2016 10:47:21 AM
> > > > >>> To: dev@mahout.apache.org
> > > > >>> Subject: Re: Future Mahout - Zeppelin work
> > > > >>>
> > > > >>> I still need to update my readme/env per Pat's comments below,
> > > however
> > > > >> with
> > > > >>> out further ado, I present two notebooks that integrate Mahout +
> > > Spark
> > > > +
> > > > >>> Zeppelin + ggplot2
> > > > >>>
> > > > >>> https://github.com/rawkintrevo/mahout-zeppelin
> > > > >>>
> > > > >>> Supposing you have a somewhat recent version of Zeppelin 0.6 with
> > > > sparkr
> > > > >>> support running already, you may import the following raw notes
> > > > directly
> > > > >>> into Zeppelin:
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json
> > > > >>> So my thoughs on next steps, which I'm positing only as a
> starting
> > > > point
> > > > >>> for discussion, and are in no particular order of importance:
> > > > >>>
> > > > >>> - Blog on HOWTO for everyman (assumes no familiarity with Mahout,
> > and
> > > > >> only
> > > > >>> enough familiarity with Zeppelin to have Zeppelin + SparkR
> support)
> > > > >>> - Some syntactic sugar somewhere in Mahout to convert a matrix
> > into a
> > > > tsv
> > > > >>> string. (with some sanity, eg a sample of a matrix)
> > > > >>> - Figure out with Zeppelin community what deeper integration
> feels
> > > > like -
> > > > >>> e.g. build-profile vs. tutorial
> > > > >>>   - I think the case for making a build-profile is that Zeppelin
> is
> > > > first
> > > > >>> and foremost a datascience tool for non technical users.
> > > > >>>   - If we go that route I'll need some more support finding out
> > what
> > > is
> > > > >> the
> > > > >>> absolute minimum 'bare-bones' mahout we can include, e.g. does
> the
> > > user
> > > > >>> have to have mahout installed? To be discussed.
> > > > >>> - Add matplotlib (python) "support" -> paragraph showing how to
> do
> > > the
> > > > >> same
> > > > >>> thing in Python.
> > > > >>>
> > > > >>> The basic deal here is we are:
> > > > >>> 1) Setting up a standard Zeppelin Spark Interpretter to act like
> a
> > > > Mahout
> > > > >>> interpretter
> > > > >>>     - This is taken care of by setting some env. variables,
> adding
> > > some
> > > > >>> dependencies, and importing relevent packages
> > > > >>> 2) do mahout things as you do
> > > > >>> 3) export table to tsv string, which is passed to a resource pool
> > > > >>>    - This could be done to a disk if you didn't have zeppelin
> > > > >>> 4) read the tsv from the resource pool (or disk if you didn't
> have
> > > > >>> zeppelin) in R (python soon) and create a <plot package of your
> > > choice>
> > > > >>>
> > > > >>> To Pat's point- this is a kind of clumsy pipeline, however the
> > > Zeppelin
> > > > >>> wrapper at least makes it *feel* less so.
> > > > >>>
> > > > >>>
> > > > >>> Trevor Grant
> > > > >>> Data Scientist
> > > > >>> https://github.com/rawkintrevo
> > > > >>> http://stackexchange.com/users/3002022/rawkintrevo
> > > > >>> http://trevorgrant.org
> > > > >>>
> > > > >>> *"Fortunate is he, who is able to know the causes of things."
> > > -Virgil*
> > > > >>>
> > > > >>>
> > > > >>> On Tue, May 17, 2016 at 1:17 PM, Pat Ferrel <
> p...@occamsmachete.com
> > >
> > > > >> wrote:
> > > > >>>> Seems like there is plenty to use in ggplot or python but the
> > > pipeline
> > > > >> is
> > > > >>>> a little convoluted (so maybe no need for Angular integration).
> To
> > > get
> > > > >>>> graphics out of Mahout it would be nice to not require knowledge
> > of
> > > R
> > > > >>>> and/or python. Knowing Mahout is already bad enough but I guess
> > the
> > > > API
> > > > >>>> from the Mahout side for plotting could be Scala syntactic
> sugar.
> > > What
> > > > >>> and
> > > > >>>> how this all is installed and setup is the next question.
> > > > >>>>
> > > > >>>> BTW this is what I use elsewhere (Mahout as a lib to this code)
> > > > >>>>
> > > > >>>>     "spark.serializer":
> > > "org.apache.spark.serializer.KryoSerializer",
> > > > >>>>     "spark.kryo.registrator":
> > > > >>>> "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
> > > > >>>>     "spark.kryo.referenceTracking": "false",
> > > > >>>>     "spark.kryoserializer.buffer": "300m”,
> > > > >>>>
> > > > >>>> afaik you will only see if Kryo is working when you have to
> > > serialize
> > > > a
> > > > >>>> mahout specific data type like vector of drm, something
> registered
> > > > with
> > > > >>>> Kryo.
> > > > >>>>
> > > > >>>>
> > > > >>>> On May 16, 2016, at 6:18 PM, Trevor Grant <
> > trevor.d.gr...@gmail.com
> > > >
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>> As a quick recap- we're trying to leverage Zeppelin for
> charting.
> > > > >>>>
> > > > >>>> It seems as though this can be achieved by
> > > > >>>> - Adding properties to the Spark Interpreter
> > > > >>>> - Adding dependency jars to the spark interpreter
> > > > >>>> - importing in a spark paragraph
> > > > >>>>
> > > > >>>> All seems to be working well, but I've fooled myself into
> thinking
> > > > >> things
> > > > >>>> were 'working' before because I wasn't actually integrating.
> > Lower I
> > > > >> will
> > > > >>>> outline the imports/properties, please look over and tell me if
> > I'm
> > > > >>>> theoretically missing anything.
> > > > >>>>
> > > > >>>> The next phase for me will be
> > > > >>>> 1) Convert a matrix to some sort of serializable object that I
> can
> > > > >> easily
> > > > >>>> unpack from R
> > > > >>>> 2) use Zeppelin's resource buffers to pass the object
> > > > >>>> 3) collect the object in an R paragraph, convert it to a
> dataframe
> > > > then
> > > > >>> map
> > > > >>>> using ggplot
> > > > >>>>
> > > > >>>> Once I have a working prototype I will work add some syntactic
> > sugar
> > > > to
> > > > >>>> prepare the matrix from the scala side and pass to zeppelin
> (using
> > > > >>> resource
> > > > >>>> pools so the same functionality can be reused in Flink) and an R
> > > > >> library
> > > > >>>> containing some functions which will pull the data out of the
> > > resource
> > > > >>> pool
> > > > >>>> and spit out a dataframe.
> > > > >>>>
> > > > >>>> Once its in a Dataframe in R- go nuts with any plotting package
> > you
> > > > >> like.
> > > > >>>> Likewise, it should be possible to do the same thing with
> > matplotlib
> > > > >> and
> > > > >>>> python (
> https://gist.github.com/andershammar/9070e0f6916a0fbda7a5
> > )
> > > > >>>>
> > > > >>>> All of this doesn't necessarily require any changing of the
> > Zeppelin
> > > > >>> source
> > > > >>>> code, and isn't very intrusive or difficult to set up, I'll
> make a
> > > > blog
> > > > >>>> post but its almost a text book entry tutorial on using imports
> in
> > > > >>>> Zeppelin. (e.g. a tutorial would be just as at home on the
> > Zeppelin
> > > > >> site
> > > > >>> as
> > > > >>>> it would on the Mahout site).
> > > > >>>>
> > > > >>>> Now, there has been some talk of using Zeppelin's angularJS.
> > Things
> > > > >> get
> > > > >>> a
> > > > >>>> little more harry in that case, but we could make an optional
> > build
> > > > >>> profile
> > > > >>>> that would make zeppelin recognize matrices at tables and expose
> > all
> > > > of
> > > > >>> the
> > > > >>>> built in charting features of Zeppelin.
> > > > >>>>
> > > > >>>> If you're not adding a bunch of custom charts to Zeppelin (which
> > > would
> > > > >> be
> > > > >>>> somewhat tedious), you're going to end up with a lot of examples
> > > where
> > > > >>> you
> > > > >>>> create a table in Mahout/Spark pass it to AngularJS then some
> > > > AngularJS
> > > > >>>> code charts it for you.  At that point however, you're doing
> just
> > as
> > > > >> much
> > > > >>>> work, if not more than it would be to simply pass to R or Python
> > and
> > > > >> let
> > > > >>>> ggplot or matlibplot do the work for you.
> > > > >>>>
> > > > >>>> Finally, I haven't run into any errors yet using Kyro (which in
> > part
> > > > is
> > > > >>>> what makes me fear I'm not doing this right... it was too
> easy...)
> > > If
> > > > >>>> anything seems redundant or missing, please call it out.
> > > > >>>>
> > > > >>>> Add Properties to Spark interp:
> > > > >>>>
> > > > >>>> spark.kryo.registrator
> > > > >>>> org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
> > > > >>>> spark.serializer org.apache.spark.serializer.KryoSerializer
> > > > >>>>
> > > > >>>> Add artifacts (need to change these to maven not local, also
> need
> > to
> > > > >>>> add/change one jar per below, however this does run):
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>
> > > >
> > >
> >
> /home/trevor/.m2/repository/org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar
> > > > >>>>
> > > > >>
> > > >
> > >
> >
> /home/trevor/.m2/repository/org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar
> > > > >>>>
> > > > >>
> > > >
> > >
> >
> /home/trevor/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar
> > > > >>>>
> > > > >>
> > > >
> > >
> >
> /home/trevor/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar
> > > > >>>> Add following code to first paragraph of notebook:
> > > > >>>> ```
> > > > >>>> %spark
> > > > >>>> import org.apache.mahout.math._
> > > > >>>> import org.apache.mahout.math.scalabindings._
> > > > >>>> import org.apache.mahout.math.drm._
> > > > >>>> import org.apache.mahout.math.scalabindings.RLikeOps._
> > > > >>>> import org.apache.mahout.math.drm.RLikeDrmOps._
> > > > >>>> import org.apache.mahout.sparkbindings._
> > > > >>>>
> > > > >>>> implicit val sdc:
> > > > >>> org.apache.mahout.sparkbindings.SparkDistributedContext =
> > > > >>>> sc2sdc(sc)
> > > > >>>> ```
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Trevor Grant
> > > > >>>> Data Scientist
> > > > >>>> https://github.com/rawkintrevo
> > > > >>>> http://stackexchange.com/users/3002022/rawkintrevo
> > > > >>>> http://trevorgrant.org
> > > > >>>>
> > > > >>>> *"Fortunate is he, who is able to know the causes of things."
> > > > -Virgil*
> > > > >>>>
> > > > >>>>
> > > > >>>> On Mon, May 16, 2016 at 6:42 PM, Pat Ferrel <
> > p...@occamsmachete.com>
> > > > >>> wrote:
> > > > >>>>> Creating an mc used to do some Kryo setup, like registering
> > > > >> serializers
> > > > >>>> or
> > > > >>>>> serializer factories IIRC. Also there is the Spark conf for
> > > > >> allocating
> > > > >>>>> memory for the Kryo buffer. Look at the code in the mc creation
> > > code
> > > > >> in
> > > > >>>> the
> > > > >>>>> Spark package helpers. All can be done in straight Spark and
> > passed
> > > > >> in
> > > > >>> to
> > > > >>>>> create the mc when needed. Again from old weak brain cells but
> I
> > > > >> think
> > > > >>>> that
> > > > >>>>> is part of what makes the Mahout shell different than teh Spark
> > > shell
> > > > >>>> plus
> > > > >>>>> imports, it auto-creates the mc instead of or along with an sc.
> > > > >>>>>
> > > > >>>>> When I get back to my computer I can check.
> > > > >>>>>
> > > > >>>>> On May 16, 2016, at 3:40 PM, Andrew Palumbo <
> ap....@outlook.com>
> > > > >>> wrote:
> > > > >>>>> Trevor,
> > > > >>>>>
> > > > >>>>> Could you post any kryo errors that you may be having?
> > > > >>>>>
> > > > >>>>> ________________________________
> > > > >>>>> From: Andrew Palumbo <ap....@outlook.com>
> > > > >>>>> Sent: Monday, May 16, 2016 6:25:07 PM
> > > > >>>>> To: mahout
> > > > >>>>> Subject: Future Mahout - Zeppelin work
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> To Dmitriy's point, I agree ggplot is def the priority,  The
> > mahout
> > > > >>> plots
> > > > >>>>> are at this point are really just a POC, but at some point we
> may
> > > be
> > > > >>> want
> > > > >>>>> to integrate some data transformation features into the mahout
> > > plots
> > > > >>>>> classes so they're really more future work.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> long story short:
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>> OK. I'll read through the examples and try to do something
> with
> > > some
> > > > >>>>> data, then do a ggplot and/or an angular plot on it (probably
> > > > >> ggplot).
> > > > >>>>>> I'll do a quick tutorial. Then I'll reopen discussion on that
> > > > >> Zeppelin
> > > > >>>>> issue about weather we want to go ahead and add another
> > > interpreter.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Souds Great.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Thank you.
> > > > >>>>>
> > > > >>>>> ________________________________
> > > > >>>>> From: Trevor Grant <trevor.d.gr...@gmail.com>
> > > > >>>>> Sent: Monday, May 16, 2016 5:49:17 PM
> > > > >>>>> To: Dmitriy Lyubimov
> > > > >>>>> Cc: Andrew Palumbo; Pat Ferrel; Suneel Marthi
> > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > > > >>>>>
> > > > >>>>> I just signed up for dev, should i just reply all and cc dev or
> > > > >> start a
> > > > >>>>> new thread?
> > > > >>>>>
> > > > >>>>> Trevor Grant
> > > > >>>>> Data Scientist
> > > > >>>>> https://github.com/rawkintrevo
> > > > >>>>> [https://avatars3.githubusercontent.com/u/5852441?v=3&s=400]<
> > > > >>>>> https://github.com/rawkintrevo>
> > > > >>>>>
> > > > >>>>> rawkintrevo (Trevor Grant) · GitHub<
> > https://github.com/rawkintrevo
> > > >
> > > > >>>>> github.com
> > > > >>>>> rawkintrevo has 12 repositories written in Python, Batchfile,
> and
> > > R.
> > > > >>>>> Follow their code on GitHub.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > > > >>>>> http://trevorgrant.org
> > > > >>>>>
> > > > >>>>> "Fortunate is he, who is able to know the causes of things."
> > > -Virgil
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Mon, May 16, 2016 at 4:46 PM, Dmitriy Lyubimov <
> > > dlie...@gmail.com
> > > > >>>>> <mailto:dlie...@gmail.com>> wrote:
> > > > >>>>> fwiw ggplot2 is pretty darn advanced:) i am a bit skeptical
> smile
> > > > >> would
> > > > >>>>> have something that ggplot2 would not, the other way around is
> > much
> > > > >>> more
> > > > >>>>> expected by me:)
> > > > >>>>>
> > > > >>>>> anyhow if ggplot2 and matplotlib are available in Zeppelin
> > without
> > > > >>> major
> > > > >>>>> limitations, it sounds like Zeppelin should be an all around
> very
> > > > >> nice
> > > > >>>>> venue then.
> > > > >>>>>
> > > > >>>>> On Mon, May 16, 2016 at 2:42 PM, Andrew Palumbo <
> > > ap....@outlook.com
> > > > >>>>> <mailto:ap....@outlook.com>> wrote:
> > > > >>>>>
> > > > >>>>> yeah we should probably move this over to dev@
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> sorry- answering a question from a couple emails back on the
> > > thread.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> If possible,  I think it would be great to eventually have both
> > > > >> (native
> > > > >>>>> mahout/smile plots and ggplot), since in the future we're going
> > to
> > > be
> > > > >>>>> adding more visualization features rather than simple scatter
> > plots
> > > > >> etc
> > > > >>>>> that may not be covered by ggplot.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> That's why we were thinking about using angular and the pngs.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> But what youre saying in your last email would be great!
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Thank you!
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> ________________________________
> > > > >>>>> From: Trevor Grant <trevor.d.gr...@gmail.com<mailto:
> > > > >>>>> trevor.d.gr...@gmail.com>>
> > > > >>>>> Sent: Monday, May 16, 2016 5:33:12 PM
> > > > >>>>> To: Andrew Palumbo
> > > > >>>>> Cc: Pat Ferrel; Suneel Marthi; Dmitriy Lyubimov
> > > > >>>>>
> > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > > > >>>>>
> > > > >>>>> I somehow replied to your last email without seeing it...
> > > > >>>>>
> > > > >>>>> OK. I'll read through the examples and try to do something with
> > > some
> > > > >>>> data,
> > > > >>>>> then do a ggplot and/or an angular plot on it (probably
> ggplot).
> > > > >>>>>
> > > > >>>>> I'll do a quick tutorial. Then I'll reopen discussion on that
> > > > >> Zeppelin
> > > > >>>>> issue about weather we want to go ahead and add another
> > > interpreter.
> > > > >>>>>
> > > > >>>>> Trevor Grant
> > > > >>>>> Data Scientist
> > > > >>>>> https://github.com/rawkintrevo
> > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > > > >>>>> http://trevorgrant.org
> > > > >>>>>
> > > > >>>>> "Fortunate is he, who is able to know the causes of things."
> > > -Virgil
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Mon, May 16, 2016 at 4:26 PM, Trevor Grant <
> > > > >>> trevor.d.gr...@gmail.com
> > > > >>>>> <mailto:trevor.d.gr...@gmail.com>> wrote:
> > > > >>>>> sorry for double email but are you thinking visualization
> should
> > > be a
> > > > >>>>> library internal to mahout or should we leverage zeppelins
> > > > >>> visualization
> > > > >>>>> capabilities?
> > > > >>>>>
> > > > >>>>> Also, should we move this discussion to dev?
> > > > >>>>>
> > > > >>>>> tg
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Trevor Grant
> > > > >>>>> Data Scientist
> > > > >>>>> https://github.com/rawkintrevo
> > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > > > >>>>> http://trevorgrant.org
> > > > >>>>>
> > > > >>>>> "Fortunate is he, who is able to know the causes of things."
> > > -Virgil
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Mon, May 16, 2016 at 4:14 PM, Andrew Palumbo <
> > > ap....@outlook.com
> > > > >>>>> <mailto:ap....@outlook.com>> wrote:
> > > > >>>>>
> > > > >>>>> Sorry- to be a little more clear,  Part of what we're trying to
> > is
> > > to
> > > > >>> get
> > > > >>>>> the new plotting features integrated with Zeppelin. We plan on
> > > adding
> > > > >>>> more
> > > > >>>>> advanced plotting.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> ________________________________
> > > > >>>>> From: Andrew Palumbo <ap....@outlook.com<mailto:
> > ap....@outlook.com
> > > >>
> > > > >>>>> Sent: Monday, May 16, 2016 5:04:49 PM
> > > > >>>>> To: Pat Ferrel; Trevor Grant
> > > > >>>>> Cc: Suneel Marthi; Dmitriy Lyubimov
> > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Awesome!
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> most of the hard work was done by Dmitriy[??] , I've just
> > reworked
> > > > >> it a
> > > > >>>>> couple of times to keep up with spark's refactoring.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> I think that you will also need to include:
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>   mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> For the new plotting features that we're working on.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> the plotting is still a work in progress, and the grid and
> > surface
> > > > >>> plots
> > > > >>>>> are not working properly.  The plots are swing based and can
> > > > >> currently
> > > > >>> be
> > > > >>>>> exported as  PNGs.  There are a few examples on the closed PR:
> > > > >>>>> https://github.com/apache/mahout/pull/230
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> There is an example script in
> > examples/bin/spark-shell-plot.mscala
> > > > >>>>> (commited to master) :
> > > > >>>>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/mahout/blob/master/examples/bin/spark-shell-plot.mscala
> > > > >>>>>
> > > > >>>>> Thanks!
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> ________________________________
> > > > >>>>> From: Pat Ferrel <p...@occamsmachete.com<mailto:
> > > p...@occamsmachete.com
> > > > >>>>> Sent: Monday, May 16, 2016 4:54:15 PM
> > > > >>>>> To: Trevor Grant
> > > > >>>>> Cc: Andrew Palumbo; Suneel Marthi; Dmitriy Lyubimov
> > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > > > >>>>>
> > > > >>>>> This is only the beginning. Andy has been using Smile as a
> > > > >>> visualization
> > > > >>>>> lib since it is pretty rich in ML support. We are looking at
> > > > >>> integrating
> > > > >>>>> some of that with Zeppelin then adding code to feed the new
> > > > >>>> visualizations
> > > > >>>>> in Mahout. I’m here because I’m fairly familiar with AngularJS
> if
> > > > >>> that’s
> > > > >>>>> the way to go. Smile is swing based but can output pngs, maybe
> > > other
> > > > >>>> image
> > > > >>>>> formats—Andy?
> > > > >>>>>
> > > > >>>>> BTW Dmitriy is still very involved but has rouble getting
> > > permission
> > > > >> to
> > > > >>>>> donate code.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On May 16, 2016, at 1:45 PM, Trevor Grant <
> > > trevor.d.gr...@gmail.com
> > > > >>>>> <mailto:trevor.d.gr...@gmail.com>> wrote:
> > > > >>>>>
> > > > >>>>> Hey Andrew,
> > > > >>>>>
> > > > >>>>> thanks- you basically did all of the hard work for me!
> > > > >>>>>
> > > > >>>>> I've got the linear regression example working from:
> > > > >>>>>
> > http://mahout.apache.org/users/sparkbindings/play-with-shell.html
> > > > >>>>>
> > > > >>>>> my java is sketchy at best, i tend to over import. I pulled in
> > the
> > > > >>>>> following jars:
> > > > >>>>>
> > > > >>>>>
> > > > >>
> > > >
> > >
> >
> org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar
> > > > >>>>>
> > > > >>
> > > >
> > >
> >
> org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar
> > > > >>>>>
> > > > >>
> > > >
> > >
> >
> org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar
> > > > >>>>>
> > > > >>
> > > >
> > >
> >
> org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar
> > > > >>>>> I think those are all necessary...  should I be pulling in
> more?
> > > > >>>>>
> > > > >>>>> I hate to say it (but will do so bc this isn't public) this
> > > > >> integration
> > > > >>>> is
> > > > >>>>> super easy from a user perspective, almost too easy- eg why not
> > let
> > > > >> the
> > > > >>>>> user add it themselves...  Add the appropriate maven artifacts,
> > > > >> restart
> > > > >>>> the
> > > > >>>>> interpreter and run the following in a notebook:
> > > > >>>>> ```
> > > > >>>>> import org.apache.mahout.math._
> > > > >>>>> import org.apache.mahout.math.scalabindings._
> > > > >>>>> import org.apache.mahout.math.drm._
> > > > >>>>> import org.apache.mahout.math.scalabindings.RLikeOps._
> > > > >>>>> import org.apache.mahout.math.drm.RLikeDrmOps._
> > > > >>>>> import org.apache.mahout.sparkbindings._
> > > > >>>>>
> > > > >>>>> implicit val sdc:
> > > > >>> org.apache.mahout.sparkbindings.SparkDistributedContext
> > > > >>>>> = sc2sdc(sc)
> > > > >>>>> ```
> > > > >>>>> Then whatever code you want and you're off to the races...
> > > > >>>>>
> > > > >>>>> that said, adding a build profile like -PsparkMahout and
> creating
> > > an
> > > > >>>>> interpretter like %spark.mahout should be fairly straight
> > forward.
> > > > >>>>>
> > > > >>>>> Second question, do you have an example that would be more
> > > > >>> 'visualization
> > > > >>>>> friendly'? I could pass the results to Angular or R just to
> show
> > > off
> > > > >>> how
> > > > >>>> to
> > > > >>>>> do it.
> > > > >>>>>
> > > > >>>>> Which leads back to the question, is this even worth building a
> > > full
> > > > >>>>> interpreter for or just make a really nice blog post with
> > examples
> > > on
> > > > >>> how
> > > > >>>>> to integrate with R...?
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Trevor Grant
> > > > >>>>> Data Scientist
> > > > >>>>> https://github.com/rawkintrevo
> > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > > > >>>>> http://trevorgrant.org<http://trevorgrant.org/>
> > > > >>>>>
> > > > >>>>> "Fortunate is he, who is able to know the causes of things."
> > > -Virgil
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Mon, May 16, 2016 at 2:09 PM, Andrew Palumbo <
> > > ap....@outlook.com
> > > > >>>>> <mailto:ap....@outlook.com>> wrote:
> > > > >>>>> Hi Trevor, welcome!
> > > > >>>>>
> > > > >>>>> It's great to have you helping out, thanks very much.  I've
> done
> > a
> > > > >> good
> > > > >>>>> amount of work on our mahout spark shell .. so let me know if
> you
> > > > >> have
> > > > >>>> any
> > > > >>>>> questions there about what we did there..
> > > > >>>>>
> > > > >>>>> Thanks alot!
> > > > >>>>>
> > > > >>>>> Andy
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> -------- Original message --------
> > > > >>>>> From: Suneel Marthi <smar...@apache.org<mailto:
> > smar...@apache.org
> > > >>
> > > > >>>>> Date: 05/16/2016 2:44 PM (GMT-05:00)
> > > > >>>>> To: Trevor Grant <trevor.d.gr...@gmail.com<mailto:
> > > > >>>> trevor.d.gr...@gmail.com
> > > > >>>>> Cc: Suneel Marthi <smar...@apache.org<mailto:
> smar...@apache.org
> > >>,
> > > > >> Pat
> > > > >>>>> Ferrel <p...@occamsmachete.com<mailto:p...@occamsmachete.com>>,
> > > Andrew
> > > > >>>>> Palumbo <ap....@outlook.com<mailto:ap....@outlook.com>>
> > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > > > >>>>>
> > > > >>>>> Oh yes, he's around. I see him online.
> > > > >>>>>
> > > > >>>>> On Mon, May 16, 2016 at 2:42 PM, Trevor Grant <
> > > > >>> trevor.d.gr...@gmail.com
> > > > >>>>> <mailto:trevor.d.gr...@gmail.com>> wrote:
> > > > >>>>> Is Dmitriy Lyubimov still around?
> > > > >>>>>
> > > > >>>>> Looks like he created this issue for Zeppelin a while ago. (The
> > old
> > > > >>> lost
> > > > >>>>> code to which you were referring?)
> > > > >>>>>
> > > > >>>>> https://issues.apache.org/jira/browse/ZEPPELIN-116
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> tg
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Trevor Grant
> > > > >>>>> Data Scientist
> > > > >>>>> https://github.com/rawkintrevo
> > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > > > >>>>> http://trevorgrant.org<http://trevorgrant.org/>
> > > > >>>>>
> > > > >>>>> "Fortunate is he, who is able to know the causes of things."
> > > -Virgil
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Mon, May 16, 2016 at 1:37 PM, Suneel Marthi <
> > smar...@apache.org
> > > > >>>> <mailto:
> > > > >>>>> smar...@apache.org>> wrote:
> > > > >>>>> Welcome to the party TG !!
> > > > >>>>>
> > > > >>>>> On Mon, May 16, 2016 at 2:28 PM, Trevor Grant <
> > > > >>> trevor.d.gr...@gmail.com
> > > > >>>>> <mailto:trevor.d.gr...@gmail.com>> wrote:
> > > > >>>>> Hey all,
> > > > >>>>>
> > > > >>>>> I'm excited for a chance to help out.  I'm actually getting
> ready
> > > to
> > > > >>>>> download now and start playing around.
> > > > >>>>>
> > > > >>>>> I had talked about this briefly but it given a properly
> > functioning
> > > > >>>>> Zeppelin interpreter for Apache Mahout, one could leverage all
> of
> > > the
> > > > >>>>> Zeppelin visualizations, anything in AngularJS, or anything in
> R
> > > > >>> (through
> > > > >>>>> clever use of Zeppelin's Resource Pools).
> > > > >>>>>
> > > > >>>>> I'll work on getting logged in to the slack channel as well.
> > > > >>>>>
> > > > >>>>> Nice to meet you all, looking forward to helping out!
> > > > >>>>>
> > > > >>>>> tg
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Trevor Grant
> > > > >>>>> Data Scientist
> > > > >>>>> https://github.com/rawkintrevo
> > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > > > >>>>> http://trevorgrant.org<http://trevorgrant.org/>
> > > > >>>>>
> > > > >>>>> "Fortunate is he, who is able to know the causes of things."
> > > -Virgil
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Sun, May 15, 2016 at 12:56 PM, Suneel Marthi <
> > > smar...@apache.org
> > > > >>>>> <mailto:smar...@apache.org>> wrote:
> > > > >>>>> FYi...
> > > > >>>>> Trevor was there for my talk, so he has some idea of Mahout
> > > Samsara.
> > > > >>>>>
> > > > >>>>> On Sun, May 15, 2016 at 1:51 PM, Pat Ferrel <
> > p...@occamsmachete.com
> > > > >>>> <mailto:
> > > > >>>>> p...@occamsmachete.com>> wrote:
> > > > >>>>> Hey Trevor,
> > > > >>>>>
> > > > >>>>> Good to meet you. As you probably know Mahout-Samsara is a
> > > > >>> reincarnation
> > > > >>>>> of the project in a new body, which is less a collection of
> > > > >> algorithms
> > > > >>>> than
> > > > >>>>> a roll-your-own math/algorithm tool. The major benefit is that
> > > during
> > > > >>>>> experimentation and later in production the code is by nature
> > > > >> scalable
> > > > >>> on
> > > > >>>>> Spark and Flink. Most of the Mahout DSL is R-like and supports
> > > tensor
> > > > >>>> math
> > > > >>>>> but we are now looking at streaming online algo support too.
> > > > >>>>>
> > > > >>>>> In any case you probably know we have a Mahout version of the
> > Spark
> > > > >>>> Shell,
> > > > >>>>> which has been integrated with an old version of Zeppelin (code
> > is
> > > > >>> lost).
> > > > >>>>> Recently Andy has experimented with some very nice
> visualizations
> > > of
> > > > >> ML
> > > > >>>>> data (not just analytics data). We as a project are interested
> in
> > > > >>>> Zeppelin
> > > > >>>>> integration of our shell and graphics. From what I understand
> the
> > > > >>>> graphics
> > > > >>>>> extension mechanism of Zeppelin is based on AngularJS, which I
> > have
> > > > >>> some
> > > > >>>>> experience with.
> > > > >>>>>
> > > > >>>>> So, we’d like to start the conversation about how to proceed.
> We
> > > > >> would
> > > > >>>>> love some help but will move ahead in any case.
> > > > >>>>>
> > > > >>>>> Pat
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On May 15, 2016, at 9:52 AM, Suneel Marthi <smar...@apache.org
> > > > >> <mailto:
> > > > >>>>> smar...@apache.org>> wrote:
> > > > >>>>>
> > > > >>>>> Hi Trevor,
> > > > >>>>>
> > > > >>>>> Nice meeting u last week in Vancouver.  Per our conversation, I
> > > > >> wanted
> > > > >>> to
> > > > >>>>> introduce u to Andrew Palumbo (Mahout Chair) and Pat Ferrel
> > (Mahout
> > > > >>> PMC).
> > > > >>>>> As I mentioned in my talk, we are actively looking at Zeppelin
> > > > >>>> integration
> > > > >>>>> with Mahout (primarily for spark) and would appreciate your
> help
> > > (as
> > > > >>> also
> > > > >>>>> all things DL and ML).
> > > > >>>>>
> > > > >>>>> We definitely can use all your help as we r revamping the
> Mahout
> > > > >>> project
> > > > >>>>> and shedding its legacy MapReduce image.
> > > > >>>>>
> > > > >>>>> I sent u an invite to the Mahout slack channel,
> > mahout.apache.org<
> > > > >>>>> http://mahout.apache.org/> - that's where we all hangout and
> not
> > > > >>> having
> > > > >>>>> to worry about avoiding naughty words.
> > > > >>>>>
> > > > >>>>> Looking forward to working with you
> > > > >>>>>
> > > > >>>>> Suneel
> > > > >>>>>
> > > > >>>>>
> > > > >>>>
> > > >
> > > >
> > > >
> > >
> >
>

Reply via email to