On Fri, May 20, 2016 at 3:18 PM, Trevor Grant <[email protected]>
wrote:

> Hey Pat,
>
> If you spit out a TSV - you can import into pyspark / matplotlib from the
> resource pool in essentially the same way and use that plotting library if
> you prefer.  In fact you could import the tsv into pandas and use all of
> the pandas plotting as well (though I think it is for the most part, also
> matplotlib with some convenience functions).
>
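A minimal sketch of the Python side of that idea, assuming the TSV string has a header row and only numeric cells. The function name `parse_tsv` and the sample data are made up for illustration; nothing here is Mahout or Zeppelin API.

```python
import csv
import io


def parse_tsv(tsv_text):
    """Parse a tab-separated string with a header row into {column: [floats]}."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        if not row:  # skip blank lines
            continue
        for name, cell in zip(header, row):
            columns[name].append(float(cell))
    return columns


# Hypothetical TSV, standing in for whatever the Scala side spits out.
sample = "x\ty\n1.0\t2.0\n3.0\t4.5\n"
cols = parse_tsv(sample)
```

With pandas installed, `pandas.read_csv(io.StringIO(sample), sep="\t")` should give the same data as a DataFrame, and either form can be handed to matplotlib.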
>
> https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2ZlbGl4Y2hldW5nL3NwYXJrLW5vdGVib29rLWV4YW1wbGVzL21hc3Rlci9aZXBwZWxpbl9ub3RlYm9vay8yQU1YNUNWQ1Uvbm90ZS5qc29u
>
> In Zeppelin, unless you specify otherwise, pyspark, sparkr, spark-sql, and
> scala-spark all share the same Spark context, so you can create RDDs in one
> language and access them / work on them in another (so I understand).
>
> So in Mahout can you "save" a matrix as an RDD? e.g. something like
>
> val myRDD = myDRM.asRDD()
>

val myRDD = myDRM.rdd()

>
> And would 'myRDD' then exist in the spark context?
>
Yes, it will be in the SparkContext.

>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, May 20, 2016 at 12:21 PM, Pat Ferrel <[email protected]>
> wrote:
>
> > Agreed.
> >
> > BTW I don’t want to stall progress but being the most ignorant of plot
> > libs, I’ll ask if we should consider python and matplotlib. In another
> > project we use python because of the RDD support on Spark though the
> > visualizations are extremely limited in our case. If we can pass an RDD to
> > pyspark it would allow custom reductions in python before plotting, even
> > though we will support many natively in Mahout. I’m guessing that this
> > would cross a context boundary and require a write to disk?
> >
> > So 2 questions:
> > 1) what does the inter language support look like with Spark python vs
> > SparkR, can we transfer RDDs?
> > 2) are the plot libs significantly different?
> >
> > On May 20, 2016, at 9:54 AM, Trevor Grant <[email protected]>
> > wrote:
> >
> > Dmitriy really nailed it on the head in his reply to the post which I'll
> > rebroadcast below. In essence the whole reason you are (theoretically)
> > using Mahout is the data is too big to fit in memory.  If it's too big to fit
> > in memory, well then it's probably too big to plot each point (e.g.
> > trillions of rows, you only have so many pixels).   For the example I
> > randomly sampled a matrix.
> >
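The thread does not say how the matrix was sampled, so the following is only one possible approach: a reservoir sample, which keeps a fixed-size uniform subset of rows in a single pass without holding the full data in memory. The names `reservoir_sample`, `points`, and `subset` are illustrative.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k rows chosen uniformly at random from an iterable of rows."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


# Hypothetical data: a generator, so the rows are never all in memory at once.
points = ((i, i * i) for i in range(100000))
subset = reservoir_sample(points, k=500, seed=42)
```

The result has at most `k` rows, which is a size a plotting library can handle regardless of how large the source was.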
> > So as Dmitriy says, in Mahout we need to have functions that will
> > 'preprocess' the data into something plottable.
> >
> > For the Zeppelin-Plotting thing, we need to have a function that will spit
> > out a tsv like string of the data we want plotted.
> >
> > I agree an honest Mahout interpreter in Zeppelin is probably worth doing.
> > There are a couple of ways to go about it. I opened up the discussion on
> > dev@Zeppelin and didn't get any replies. I'm going to take that to mean we
> > can do it in a way that makes the most sense to Mahout users...
> >
> > First steps are to include some methods in Mahout that will do that
> > preprocessing, and one that will turn something into a tsv string.
> >
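To make "turn something into a tsv string" concrete, here is a language-neutral sketch written in Python rather than against the Mahout API. The function `matrix_to_tsv`, its `max_rows` cap, and the sample matrix are assumptions for illustration; the cap stands in for the "some sanity" idea of not serializing an entire large matrix.

```python
def matrix_to_tsv(rows, header=None, max_rows=1000):
    """Serialize an iterable of numeric rows as a TSV string, capped at max_rows."""
    lines = []
    if header is not None:
        lines.append("\t".join(header))
    for i, row in enumerate(rows):
        if i >= max_rows:
            break  # guard against dumping a huge matrix into a string
        lines.append("\t".join(repr(float(v)) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical in-memory matrix standing in for a sampled or collected DRM.
tsv = matrix_to_tsv([[1, 2], [3, 4.5]], header=["x", "y"])
```

A string in this shape is what the R or Python paragraph would then read back from the resource pool or from disk.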
> > I have some general ideas on possible approaches to making an honest-mahout
> > interpreter but I want to play in the code and look at the Flink-Mahout
> > shell a bit before I try to organize my thoughts and present them.
> >
> > ...(2) not sure what is the point of supporting distributed anything. It is
> > distributed presumably because it is hard to keep it in memory. Therefore,
> > plotting anything distributed potentially presents 2 problems: storage
> > space and overplotting due to number of points. The idea is that we have to
> > work out algorithms that condense big data information into small plottable
> > information (like density grids, for example, or histograms)....
> >
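As an illustration of the density-grid idea, here is a small sketch that condenses any number of 2-D points into a fixed grid of counts. It is not Mahout code; `density_grid` and its parameters are hypothetical names, and the bounds are assumed to be known up front.

```python
def density_grid(points, x_range, y_range, nx=10, ny=10):
    """Count points per cell of an ny-by-nx grid; points outside the ranges are skipped."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Clamp so that a point exactly on the upper bound lands in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        row = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        grid[row][col] += 1
    return grid


# Hypothetical points; the last one falls outside the plotted range.
pts = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (2.0, 2.0)]
grid = density_grid(pts, (0.0, 1.0), (0.0, 1.0), nx=2, ny=2)
```

The output size depends only on `nx` and `ny`, not on the number of points, and because the cells hold plain counts, grids computed on separate partitions can be added together cell by cell.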
> >
> >
> > On Fri, May 20, 2016 at 10:22 AM, Pat Ferrel <[email protected]>
> > wrote:
> >
> > > Great job Trevor, we’ll need this detail to smooth out the sharp edges, and
> > > any guidance from you or the Zeppelin community will be a big help.
> > >
> > >
> > > On May 20, 2016, at 8:13 AM, Shannon Quinn <[email protected]> wrote:
> > >
> > > Agreed, thoroughly enjoying the blog post.
> > >
> > > On 5/19/16 12:01 AM, Andrew Palumbo wrote:
> > >> Well done, Trevor!  I've not yet had a chance to try this in zeppelin
> > > but I just read the blog which is great!
> > >>
> > >> -------- Original message --------
> > >> From: Trevor Grant <[email protected]>
> > >> Date: 05/18/2016 2:44 PM (GMT-05:00)
> > >> To: [email protected]
> > >> Subject: Re: Future Mahout - Zeppelin work
> > >>
> > >> Ah thank you.
> > >>
> > >> Fixing now.
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, May 18, 2016 at 1:04 PM, Andrew Palumbo <[email protected]>
> > > wrote:
> > >>
> > >>> Hey Trevor- Just refreshed your readme.  The jar that I mentioned is
> > >>> actually:
> > >>>
> > >>>
> > >>>
> > >
> >
> /home/username/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > >>>
> > >>> rather than:
> > >>>
> > >>>
> > >>>
> > >
> >
> /home/username/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > >>>
> > >>> (In the spark module that is)
> > >>> ________________________________________
> > >>> From: Trevor Grant <[email protected]>
> > >>> Sent: Wednesday, May 18, 2016 11:02:43 AM
> > >>> To: [email protected]
> > >>> Subject: Re: Future Mahout - Zeppelin work
> > >>>
> > >>> ah yes- I remember you pointing that out to me too.
> > >>>
> > >>> I got sidetracked yesterday for most of the day on an adventure in getting
> > >>> Zeppelin to work right after I accidentally updated to the new snapshot (free
> > >>> hint: the secret was to clear my cache *face-palm*)
> > >>>
> > >>> I'm going to add that dependency to the readme.md now.
> > >>>
> > >>> thanks,
> > >>> tg
> > >>>
> > >>>
> > >>>
> > >>> On Wed, May 18, 2016 at 9:59 AM, Andrew Palumbo <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Trevor this is very cool- I have not been able to look at it closely yet
> > >>>> but just a small point: I believe that you'll also need to add the
> > >>>>
> > >>>> mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > >>>>
> > >>>> For things like the classification stats, confusion matrix, and t-digest.
> > >>>>
> > >>>> Andy
> > >>>>
> > >>>> ________________________________________
> > >>>> From: Trevor Grant <[email protected]>
> > >>>> Sent: Wednesday, May 18, 2016 10:47:21 AM
> > >>>> To: [email protected]
> > >>>> Subject: Re: Future Mahout - Zeppelin work
> > >>>>
> > >>>> I still need to update my readme/env per Pat's comments below, however
> > >>>> without further ado, I present two notebooks that integrate Mahout + Spark +
> > >>>> Zeppelin + ggplot2
> > >>>>
> > >>>> https://github.com/rawkintrevo/mahout-zeppelin
> > >>>>
> > >>>> Supposing you have a somewhat recent version of Zeppelin 0.6 with sparkr
> > >>>> support running already, you may import the following raw notes directly
> > >>>> into Zeppelin:
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >
> >
> https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json
> > >>>>
> > >>>>
> > >>>
> > >
> >
> https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json
> > >>>> So my thoughts on next steps, which I'm positing only as a starting point
> > >>>> for discussion, and are in no particular order of importance:
> > >>>>
> > >>>> - Blog on HOWTO for everyman (assumes no familiarity with Mahout, and only
> > >>>> enough familiarity with Zeppelin to have Zeppelin + SparkR support)
> > >>>> - Some syntactic sugar somewhere in Mahout to convert a matrix into a tsv
> > >>>> string. (with some sanity, eg a sample of a matrix)
> > >>>> - Figure out with Zeppelin community what deeper integration feels like -
> > >>>> e.g. build-profile vs. tutorial
> > >>>>  - I think the case for making a build-profile is that Zeppelin is first
> > >>>> and foremost a datascience tool for non technical users.
> > >>>>  - If we go that route I'll need some more support finding out what is the
> > >>>> absolute minimum 'bare-bones' mahout we can include, e.g. does the user
> > >>>> have to have mahout installed? To be discussed.
> > >>>> - Add matplotlib (python) "support" -> paragraph showing how to do the same
> > >>>> thing in Python.
> > >>>>
> > >>>> The basic deal here is we are:
> > >>>> 1) Setting up a standard Zeppelin Spark Interpreter to act like a Mahout
> > >>>> interpreter
> > >>>>    - This is taken care of by setting some env. variables, adding some
> > >>>> dependencies, and importing relevant packages
> > >>>> 2) do mahout things as you do
> > >>>> 3) export table to tsv string, which is passed to a resource pool
> > >>>>   - This could be done to a disk if you didn't have zeppelin
> > >>>> 4) read the tsv from the resource pool (or disk if you didn't have
> > >>>> zeppelin) in R (python soon) and create a <plot package of your choice>
> > >>>>
> > >>>> To Pat's point- this is a kind of clumsy pipeline, however the Zeppelin
> > >>>> wrapper at least makes it *feel* less so.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tue, May 17, 2016 at 1:17 PM, Pat Ferrel <[email protected]>
> > >>> wrote:
> > >>>>> Seems like there is plenty to use in ggplot or python but the pipeline is
> > >>>>> a little convoluted (so maybe no need for Angular integration). To get
> > >>>>> graphics out of Mahout it would be nice to not require knowledge of R
> > >>>>> and/or python. Knowing Mahout is already bad enough but I guess the API
> > >>>>> from the Mahout side for plotting could be Scala syntactic sugar. What and
> > >>>>> how this all is installed and setup is the next question.
> > >>>>>
> > >>>>> BTW this is what I use elsewhere (Mahout as a lib to this code)
> > >>>>>
> > >>>>>    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
> > >>>>>    "spark.kryo.registrator":
> > >>>>> "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
> > >>>>>    "spark.kryo.referenceTracking": "false",
> > >>>>>    "spark.kryoserializer.buffer": "300m",
> > >>>>>
> > >>>>> afaik you will only see if Kryo is working when you have to serialize a
> > >>>>> mahout specific data type like vector or drm, something registered with
> > >>>>> Kryo.
> > >>>>>
> > >>>>>
> > >>>>> On May 16, 2016, at 6:18 PM, Trevor Grant <
> [email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>> As a quick recap- we're trying to leverage Zeppelin for charting.
> > >>>>>
> > >>>>> It seems as though this can be achieved by
> > >>>>> - Adding properties to the Spark Interpreter
> > >>>>> - Adding dependency jars to the spark interpreter
> > >>>>> - importing in a spark paragraph
> > >>>>>
> > >>>>> All seems to be working well, but I've fooled myself into thinking things
> > >>>>> were 'working' before because I wasn't actually integrating. Below I will
> > >>>>> outline the imports/properties, please look over and tell me if I'm
> > >>>>> theoretically missing anything.
> > >>>>>
> > >>>>> The next phase for me will be
> > >>>>> 1) Convert a matrix to some sort of serializable object that I can easily
> > >>>>> unpack from R
> > >>>>> 2) use Zeppelin's resource buffers to pass the object
> > >>>>> 3) collect the object in an R paragraph, convert it to a dataframe then map
> > >>>>> using ggplot
> > >>>>>
> > >>>>> Once I have a working prototype I will work to add some syntactic sugar to
> > >>>>> prepare the matrix from the scala side and pass to zeppelin (using resource
> > >>>>> pools so the same functionality can be reused in Flink) and an R library
> > >>>>> containing some functions which will pull the data out of the resource pool
> > >>>>> and spit out a dataframe.
> > >>>>>
> > >>>>> Once it's in a Dataframe in R- go nuts with any plotting package you like.
> > >>>>> Likewise, it should be possible to do the same thing with matplotlib and
> > >>>>> python (https://gist.github.com/andershammar/9070e0f6916a0fbda7a5)
> > >>>>>
> > >>>>> All of this doesn't necessarily require any changing of the Zeppelin source
> > >>>>> code, and isn't very intrusive or difficult to set up. I'll make a blog
> > >>>>> post but it's almost a textbook entry tutorial on using imports in
> > >>>>> Zeppelin. (e.g. a tutorial would be just as at home on the Zeppelin site as
> > >>>>> it would on the Mahout site).
> > >>>>>
> > >>>>> Now, there has been some talk of using Zeppelin's angularJS. Things get a
> > >>>>> little more hairy in that case, but we could make an optional build profile
> > >>>>> that would make zeppelin recognize matrices as tables and expose all of the
> > >>>>> built in charting features of Zeppelin.
> > >>>>>
> > >>>>> If you're not adding a bunch of custom charts to Zeppelin (which would be
> > >>>>> somewhat tedious), you're going to end up with a lot of examples where you
> > >>>>> create a table in Mahout/Spark, pass it to AngularJS, then some AngularJS
> > >>>>> code charts it for you.  At that point however, you're doing just as much
> > >>>>> work, if not more, than it would be to simply pass to R or Python and let
> > >>>>> ggplot or matplotlib do the work for you.
> > >>>>>
> > >>>>> Finally, I haven't run into any errors yet using Kryo (which in part is
> > >>>>> what makes me fear I'm not doing this right... it was too easy...) If
> > >>>>> anything seems redundant or missing, please call it out.
> > >>>>>
> > >>>>> Add Properties to Spark interp:
> > >>>>>
> > >>>>> spark.kryo.registrator
> > >>>>> org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
> > >>>>> spark.serializer org.apache.spark.serializer.KryoSerializer
> > >>>>>
> > >>>>> Add artifacts (need to change these to maven not local, also need to
> > >>>>> add/change one jar per below, however this does run):
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >
> >
> /home/trevor/.m2/repository/org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar
> > >>>>>
> > >>>
> > >
> >
> /home/trevor/.m2/repository/org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar
> > >>>>>
> > >>>
> > >
> >
> /home/trevor/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar
> > >>>>>
> > >>>
> > >
> >
> /home/trevor/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar
> > >>>>> Add following code to first paragraph of notebook:
> > >>>>> ```
> > >>>>> %spark
> > >>>>> import org.apache.mahout.math._
> > >>>>> import org.apache.mahout.math.scalabindings._
> > >>>>> import org.apache.mahout.math.drm._
> > >>>>> import org.apache.mahout.math.scalabindings.RLikeOps._
> > >>>>> import org.apache.mahout.math.drm.RLikeDrmOps._
> > >>>>> import org.apache.mahout.sparkbindings._
> > >>>>>
> > >>>>> implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext =
> > >>>>>   sc2sdc(sc)
> > >>>>> ```
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Mon, May 16, 2016 at 6:42 PM, Pat Ferrel <[email protected]
> >
> > >>>> wrote:
> > >>>>>> Creating an mc used to do some Kryo setup, like registering serializers or
> > >>>>>> serializer factories IIRC. Also there is the Spark conf for allocating
> > >>>>>> memory for the Kryo buffer. Look at the code in the mc creation code in the
> > >>>>>> Spark package helpers. All can be done in straight Spark and passed in to
> > >>>>>> create the mc when needed. Again from old weak brain cells but I think that
> > >>>>>> is part of what makes the Mahout shell different than the Spark shell plus
> > >>>>>> imports, it auto-creates the mc instead of or along with an sc.
> > >>>>>>
> > >>>>>> When I get back to my computer I can check.
> > >>>>>>
> > >>>>>> On May 16, 2016, at 3:40 PM, Andrew Palumbo <[email protected]>
> > >>>> wrote:
> > >>>>>> Trevor,
> > >>>>>>
> > >>>>>> Could you post any kryo errors that you may be having?
> > >>>>>>
> > >>>>>> ________________________________
> > >>>>>> From: Andrew Palumbo <[email protected]>
> > >>>>>> Sent: Monday, May 16, 2016 6:25:07 PM
> > >>>>>> To: mahout
> > >>>>>> Subject: Future Mahout - Zeppelin work
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> To Dmitriy's point, I agree ggplot is def the priority.  The mahout plots
> > >>>>>> are at this point really just a POC, but at some point we may want
> > >>>>>> to integrate some data transformation features into the mahout plots
> > >>>>>> classes so they're really more future work.
> > >>>>>>
> > >>>>>>
> > >>>>>> long story short:
> > >>>>>>
> > >>>>>>
> > >>>>>>> OK. I'll read through the examples and try to do something with some
> > >>>>>>> data, then do a ggplot and/or an angular plot on it (probably ggplot).
> > >>>>>>> I'll do a quick tutorial. Then I'll reopen discussion on that Zeppelin
> > >>>>>>> issue about whether we want to go ahead and add another interpreter.
> > >>>>>>
> > >>>>>>
> > >>>>>> Sounds Great.
> > >>>>>>
> > >>>>>>
> > >>>>>> Thank you.
> > >>>>>>
> > >>>>>> ________________________________
> > >>>>>> From: Trevor Grant <[email protected]>
> > >>>>>> Sent: Monday, May 16, 2016 5:49:17 PM
> > >>>>>> To: Dmitriy Lyubimov
> > >>>>>> Cc: Andrew Palumbo; Pat Ferrel; Suneel Marthi
> > >>>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >>>>>>
> > >>>>>> I just signed up for dev, should i just reply all and cc dev or
> > >>> start a
> > >>>>>> new thread?
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Mon, May 16, 2016 at 4:46 PM, Dmitriy Lyubimov <
> > [email protected]
> > >>>>>> <mailto:[email protected]>> wrote:
> > >>>>>> fwiw ggplot2 is pretty darn advanced:) i am a bit skeptical Smile would
> > >>>>>> have something that ggplot2 would not, the other way around is much more
> > >>>>>> expected by me:)
> > >>>>>>
> > >>>>>> anyhow if ggplot2 and matplotlib are available in Zeppelin without major
> > >>>>>> limitations, it sounds like Zeppelin should be an all around very nice
> > >>>>>> venue then.
> > >>>>>>
> > >>>>>> On Mon, May 16, 2016 at 2:42 PM, Andrew Palumbo <
> [email protected]
> > >>>>>> <mailto:[email protected]>> wrote:
> > >>>>>>
> > >>>>>> yeah we should probably move this over to dev@
> > >>>>>>
> > >>>>>>
> > >>>>>> sorry- answering a question from a couple emails back on the
> thread.
> > >>>>>>
> > >>>>>>
> > >>>>>> If possible,  I think it would be great to eventually have both (native
> > >>>>>> mahout/smile plots and ggplot), since in the future we're going to be
> > >>>>>> adding more visualization features rather than simple scatter plots etc
> > >>>>>> that may not be covered by ggplot.
> > >>>>>>
> > >>>>>>
> > >>>>>> That's why we were thinking about using angular and the pngs.
> > >>>>>>
> > >>>>>>
> > >>>>>> But what you're saying in your last email would be great!
> > >>>>>>
> > >>>>>>
> > >>>>>> Thank you!
> > >>>>>>
> > >>>>>>
> > >>>>>> ________________________________
> > >>>>>> From: Trevor Grant <[email protected]<mailto:
> > >>>>>> [email protected]>>
> > >>>>>> Sent: Monday, May 16, 2016 5:33:12 PM
> > >>>>>> To: Andrew Palumbo
> > >>>>>> Cc: Pat Ferrel; Suneel Marthi; Dmitriy Lyubimov
> > >>>>>>
> > >>>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >>>>>>
> > >>>>>> I somehow replied to your last email without seeing it...
> > >>>>>>
> > >>>>>> OK. I'll read through the examples and try to do something with some data,
> > >>>>>> then do a ggplot and/or an angular plot on it (probably ggplot).
> > >>>>>>
> > >>>>>> I'll do a quick tutorial. Then I'll reopen discussion on that Zeppelin
> > >>>>>> issue about whether we want to go ahead and add another interpreter.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Mon, May 16, 2016 at 4:26 PM, Trevor Grant <
> > >>>> [email protected]
> > >>>>>> <mailto:[email protected]>> wrote:
> > >>>>>> sorry for double email but are you thinking visualization should be a
> > >>>>>> library internal to mahout or should we leverage Zeppelin's visualization
> > >>>>>> capabilities?
> > >>>>>>
> > >>>>>> Also, should we move this discussion to dev?
> > >>>>>>
> > >>>>>> tg
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Mon, May 16, 2016 at 4:14 PM, Andrew Palumbo <
> [email protected]
> > >>>>>> <mailto:[email protected]>> wrote:
> > >>>>>>
> > >>>>>> Sorry- to be a little more clear,  Part of what we're trying to do is to get
> > >>>>>> the new plotting features integrated with Zeppelin. We plan on adding more
> > >>>>>> advanced plotting.
> > >>>>>>
> > >>>>>>
> > >>>>>> ________________________________
> > >>>>>> From: Andrew Palumbo <[email protected]<mailto:
> [email protected]
> > >>
> > >>>>>> Sent: Monday, May 16, 2016 5:04:49 PM
> > >>>>>> To: Pat Ferrel; Trevor Grant
> > >>>>>> Cc: Suneel Marthi; Dmitriy Lyubimov
> > >>>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >>>>>>
> > >>>>>>
> > >>>>>> Awesome!
> > >>>>>>
> > >>>>>>
> > >>>>>> most of the hard work was done by Dmitriy, I've just reworked it a
> > >>>>>> couple of times to keep up with spark's refactoring.
> > >>>>>>
> > >>>>>>
> > >>>>>> I think that you will also need to include:
> > >>>>>>
> > >>>>>>
> > >>>>>>  mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > >>>>>>
> > >>>>>>
> > >>>>>> For the new plotting features that we're working on.
> > >>>>>>
> > >>>>>>
> > >>>>>> the plotting is still a work in progress, and the grid and surface plots
> > >>>>>> are not working properly.  The plots are swing based and can currently be
> > >>>>>> exported as  PNGs.  There are a few examples on the closed PR:
> > >>>>>> https://github.com/apache/mahout/pull/230
> > >>>>>>
> > >>>>>>
> > >>>>>> There is an example script in examples/bin/spark-shell-plot.mscala
> > >>>>>> (committed to master):
> > >>>>>>
> > >>>
> > >
> >
> https://github.com/apache/mahout/blob/master/examples/bin/spark-shell-plot.mscala
> > >>>>>>
> > >>>>>> Thanks!
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> ________________________________
> > >>>>>> From: Pat Ferrel <[email protected]<mailto:
> > [email protected]
> > >>>>>> Sent: Monday, May 16, 2016 4:54:15 PM
> > >>>>>> To: Trevor Grant
> > >>>>>> Cc: Andrew Palumbo; Suneel Marthi; Dmitriy Lyubimov
> > >>>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >>>>>>
> > >>>>>> This is only the beginning. Andy has been using Smile as a visualization
> > >>>>>> lib since it is pretty rich in ML support. We are looking at integrating
> > >>>>>> some of that with Zeppelin then adding code to feed the new visualizations
> > >>>>>> in Mahout. I’m here because I’m fairly familiar with AngularJS if that’s
> > >>>>>> the way to go. Smile is swing based but can output pngs, maybe other image
> > >>>>>> formats (Andy?)
> > >>>>>>
> > >>>>>> BTW Dmitriy is still very involved but has trouble getting permission to
> > >>>>>> donate code.
> > >>>>>>
> > >>>>>>
> > >>>>>> On May 16, 2016, at 1:45 PM, Trevor Grant <
> [email protected]
> > >>>>>> <mailto:[email protected]>> wrote:
> > >>>>>>
> > >>>>>> Hey Andrew,
> > >>>>>>
> > >>>>>> thanks- you basically did all of the hard work for me!
> > >>>>>>
> > >>>>>> I've got the linear regression example working from:
> > >>>>>> http://mahout.apache.org/users/sparkbindings/play-with-shell.html
> > >>>>>>
> > >>>>>> my java is sketchy at best, i tend to over import. I pulled in the
> > >>>>>> following jars:
> > >>>>>>
> > >>>>>>
> > >>>
> > >
> >
> org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar
> > >>>>>>
> > >>>
> > >
> >
> org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar
> > >>>>>>
> > >>>
> > >
> >
> org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar
> > >>>>>>
> > >>>
> > >
> >
> org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar
> > >>>>>> I think those are all necessary...  should I be pulling in more?
> > >>>>>>
> > >>>>>> I hate to say it (but will do so bc this isn't public) this integration is
> > >>>>>> super easy from a user perspective, almost too easy- eg why not let the
> > >>>>>> user add it themselves...  Add the appropriate maven artifacts, restart the
> > >>>>>> interpreter and run the following in a notebook:
> > >>>>>> ```
> > >>>>>> import org.apache.mahout.math._
> > >>>>>> import org.apache.mahout.math.scalabindings._
> > >>>>>> import org.apache.mahout.math.drm._
> > >>>>>> import org.apache.mahout.math.scalabindings.RLikeOps._
> > >>>>>> import org.apache.mahout.math.drm.RLikeDrmOps._
> > >>>>>> import org.apache.mahout.sparkbindings._
> > >>>>>>
> > >>>>>> implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext =
> > >>>>>>   sc2sdc(sc)
> > >>>>>> ```
> > >>>>>> Then whatever code you want and you're off to the races...
> > >>>>>>
> > >>>>>> that said, adding a build profile like -PsparkMahout and creating an
> > >>>>>> interpreter like %spark.mahout should be fairly straightforward.
> > >>>>>>
> > >>>>>> Second question, do you have an example that would be more 'visualization
> > >>>>>> friendly'? I could pass the results to Angular or R just to show off how to
> > >>>>>> do it.
> > >>>>>>
> > >>>>>> Which leads back to the question, is this even worth building a full
> > >>>>>> interpreter for or just make a really nice blog post with examples on how
> > >>>>>> to integrate with R...?
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Mon, May 16, 2016 at 2:09 PM, Andrew Palumbo <
> [email protected]
> > >>>>>> <mailto:[email protected]>> wrote:
> > >>>>>> Hi Trevor, welcome!
> > >>>>>>
> > >>>>>> It's great to have you helping out, thanks very much.  I've done a good
> > >>>>>> amount of work on our mahout spark shell .. so let me know if you have any
> > >>>>>> questions about what we did there..
> > >>>>>>
> > >>>>>> Thanks a lot!
> > >>>>>>
> > >>>>>> Andy
> > >>>>>>
> > >>>>>>
> > >>>>>> -------- Original message --------
> > >>>>>> From: Suneel Marthi <[email protected]<mailto:[email protected]
> >>
> > >>>>>> Date: 05/16/2016 2:44 PM (GMT-05:00)
> > >>>>>> To: Trevor Grant <[email protected]<mailto:
> > >>>>> [email protected]
> > >>>>>> Cc: Suneel Marthi <[email protected]<mailto:[email protected]
> >>,
> > >>> Pat
> > >>>>>> Ferrel <[email protected]<mailto:[email protected]>>,
> > Andrew
> > >>>>>> Palumbo <[email protected]<mailto:[email protected]>>
> > >>>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >>>>>>
> > >>>>>> Oh yes, he's around. I see him online.
> > >>>>>>
> > >>>>>> On Mon, May 16, 2016 at 2:42 PM, Trevor Grant <
> > >>>> [email protected]
> > >>>>>> <mailto:[email protected]>> wrote:
> > >>>>>> Is Dmitriy Lyubimov still around?
> > >>>>>>
> > >>>>>> Looks like he created this issue for Zeppelin a while ago. (The old lost
> > >>>>>> code to which you were referring?)
> > >>>>>>
> > >>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-116
> > >>>>>>
> > >>>>>>
> > >>>>>> tg
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Mon, May 16, 2016 at 1:37 PM, Suneel Marthi <
> [email protected]
> > >>>>> <mailto:
> > >>>>>> [email protected]>> wrote:
> > >>>>>> Welcome to the party TG !!
> > >>>>>>
> > >>>>>> On Mon, May 16, 2016 at 2:28 PM, Trevor Grant <
> > >>>> [email protected]
> > >>>>>> <mailto:[email protected]>> wrote:
> > >>>>>> Hey all,
> > >>>>>>
> > >>>>>> I'm excited for a chance to help out.  I'm actually getting ready to
> > >>>>>> download now and start playing around.
> > >>>>>>
> > >>>>>> I had talked about this briefly but given a properly functioning
> > >>>>>> Zeppelin interpreter for Apache Mahout, one could leverage all of the
> > >>>>>> Zeppelin visualizations, anything in AngularJS, or anything in R (through
> > >>>>>> clever use of Zeppelin's Resource Pools).
> > >>>>>>
> > >>>>>> I'll work on getting logged in to the slack channel as well.
> > >>>>>>
> > >>>>>> Nice to meet you all, looking forward to helping out!
> > >>>>>>
> > >>>>>> tg
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sun, May 15, 2016 at 12:56 PM, Suneel Marthi <
> [email protected]
> > >>>>>> <mailto:[email protected]>> wrote:
> > >>>>>> FYi...
> > >>>>>> Trevor was there for my talk, so he has some idea of Mahout
> Samsara.
> > >>>>>>
> > >>>>>> On Sun, May 15, 2016 at 1:51 PM, Pat Ferrel <
> [email protected]
> > >>>>> <mailto:
> > >>>>>> [email protected]>> wrote:
> > >>>>>> Hey Trevor,
> > >>>>>>
> > >>>>>> Good to meet you. As you probably know Mahout-Samsara is a reincarnation
> > >>>>>> of the project in a new body, which is less a collection of algorithms than
> > >>>>>> a roll-your-own math/algorithm tool. The major benefit is that during
> > >>>>>> experimentation and later in production the code is by nature scalable on
> > >>>>>> Spark and Flink. Most of the Mahout DSL is R-like and supports tensor math
> > >>>>>> but we are now looking at streaming online algo support too.
> > >>>>>>
> > >>>>>> In any case you probably know we have a Mahout version of the Spark Shell,
> > >>>>>> which has been integrated with an old version of Zeppelin (code is lost).
> > >>>>>> Recently Andy has experimented with some very nice visualizations of ML
> > >>>>>> data (not just analytics data). We as a project are interested in Zeppelin
> > >>>>>> integration of our shell and graphics. From what I understand the graphics
> > >>>>>> extension mechanism of Zeppelin is based on AngularJS, which I have some
> > >>>>>> experience with.
> > >>>>>>
> > >>>>>> So, we’d like to start the conversation about how to proceed. We would
> > >>>>>> love some help but will move ahead in any case.
> > >>>>>>
> > >>>>>> Pat
> > >>>>>>
> > >>>>>>
> > >>>>>> On May 15, 2016, at 9:52 AM, Suneel Marthi <[email protected]
> > >>> <mailto:
> > >>>>>> [email protected]>> wrote:
> > >>>>>>
> > >>>>>> Hi Trevor,
> > >>>>>>
> > >>>>>> Nice meeting u last week in Vancouver.  Per our conversation, I wanted to
> > >>>>>> introduce u to Andrew Palumbo (Mahout Chair) and Pat Ferrel (Mahout PMC).
> > >>>>>> As I mentioned in my talk, we are actively looking at Zeppelin integration
> > >>>>>> with Mahout (primarily for spark) and would appreciate your help (as also
> > >>>>>> all things DL and ML).
> > >>>>>>
> > >>>>>> We definitely can use all your help as we r revamping the Mahout project
> > >>>>>> and shedding its legacy MapReduce image.
> > >>>>>>
> > >>>>>> I sent u an invite to the Mahout slack channel, mahout.apache.org
> > >>>>>> <http://mahout.apache.org/> - that's where we all hangout and not having
> > >>>>>> to worry about avoiding naughty words.
> > >>>>>>
> > >>>>>> Looking forward to working with you
> > >>>>>>
> > >>>>>> Suneel
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >
> > >
> > >
> >
> >
>
