Thx Trevor. Re: m-1854, it was something that we started when we were first discussing using the smile plots and trying to pipe them over to Zeppelin. As far as I know, no progress was made on it. I've unassigned it.
Feel free to assign any Jiras to yourself. I think that m-1854 is similar to the mahout-spark-shell, so I may be able to help out there.

________________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Saturday, May 28, 2016 11:21:44 PM
To: dev@mahout.apache.org
Subject: Re: Future Mahout - Zeppelin work

Created a subtask on 1855 for tsv strings.

Looking at 1854 assigned to Pat Ferrel, what's your progress to date? How can I help?

tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Thu, May 26, 2016 at 2:34 PM, Andrew Palumbo <ap....@outlook.com> wrote:

> Great!
>
> When you free up and have the time, could you create some Jiras for these?
>
> We actually have MAHOUT-1852 open for Histograms already, and MAHOUT-1854 and MAHOUT-1855 (early Zeppelin integration Jiras). I can close m-1854 and m-1855 out and we can start new ones if they're not relevant anymore or we can just go with those.
>
> Thanks
>
> ________________________________________
> From: Trevor Grant <trevor.d.gr...@gmail.com>
> Sent: Thursday, May 26, 2016 3:17:22 PM
> To: dev@mahout.apache.org
> Subject: Re: Future Mahout - Zeppelin work
>
> Short answer: it is high priority. I think it will be a Mahout interpreter in Zeppelin, and given that plans are on hold for a Flink-Mahout in the short term, I think it should be a piggy-back spark interpreter (e.g. exposed through something like %spark.mahout). So I have thoughts, but no plan. Been busy with a couple of other commitments.
>
> On the Mahout side we need:
> A function that will convert small matrices into TSV strings
> Convenience functions for sampling super-large matrices into things like histograms, etc., that one would want to plot, i.e. histogram bucketing? (less important for the moment)
>
> On the Zeppelin side we need:
> an interpreter.
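
The two Mahout-side helpers listed above could look roughly like the following. This is only a minimal sketch under stated assumptions: plain Scala collections stand in for a small in-core matrix, and `PlotPrep`, `toTsv` and `histogram` are hypothetical names, not existing Mahout functions.

```scala
// Hypothetical helpers, not part of Mahout: plain Scala collections
// stand in for a small in-core matrix that has already been sampled.
object PlotPrep {

  // Render a header plus numeric rows as tab-separated text, one row per line.
  def toTsv(header: Seq[String], rows: Seq[Seq[Double]]): String = {
    require(rows.forall(_.length == header.length), "row width must match header")
    (header.mkString("\t") +: rows.map(_.mkString("\t"))).mkString("\n")
  }

  // Equal-width bucketing: returns (bucket lower bound, count) pairs.
  def histogram(values: Seq[Double], buckets: Int): Seq[(Double, Int)] = {
    require(buckets > 0 && values.nonEmpty, "need at least one bucket and one value")
    val lo = values.min
    val hi = values.max
    // Fall back to a width of 1.0 when all values are identical.
    val width = if (hi > lo) (hi - lo) / buckets else 1.0
    val counts = new Array[Int](buckets)
    for (v <- values) {
      // The maximum value would land one past the end, so clamp it into the last bucket.
      val idx = math.min(((v - lo) / width).toInt, buckets - 1)
      counts(idx) += 1
    }
    counts.toSeq.zipWithIndex.map { case (c, i) => (lo + i * width, c) }
  }
}
```

The output of `histogram` can be fed straight into `toTsv`, e.g. `PlotPrep.toTsv(Seq("bucket", "count"), bins.map { case (b, c) => Seq(b, c.toDouble) })`. In a Zeppelin paragraph, printing such a string prefixed with `%table ` should make the built-in table display pick it up, though how that would be wired into a Mahout interpreter is exactly the open question in this thread.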
> On Thu, May 26, 2016 at 1:22 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > While on this subject, do we have a plan yet for integrating Zeppelin into Mahout, or conversely for having a Mahout-specific interpreter for Zeppelin? I think that should be high priority in the short term.
> >
> > On Thu, May 26, 2016 at 1:17 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
> >
> > > Ahh, like the "Sample From Matrix" paragraph in the notebook.
> > >
> > > Yea that seems like a good add. If not this afternoon, I'll include it Saturday.
> > >
> > > On Thu, May 26, 2016 at 11:52 AM, Andrew Palumbo <ap....@outlook.com> wrote:
> > >
> > > > Trevor, I was reading over your blog last night again, first time since you updated. It is great!
> > > >
> > > > I have one suggestion: adding a line of code showing how the sampling of the DRM -> in-core Matrix is done:
> > > >
> > > > https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala#L148
> > > >
> > > > e.g. something like:
> > > >
> > > > mxSin = drmSampleKRows(drmSin, 1000, replacement = false)
> > > >
> > > > Maybe you omitted this intentionally?
> > > > Andy
> > > >
> > > > ________________________________________
> > > > From: Trevor Grant <trevor.d.gr...@gmail.com>
> > > > Sent: Friday, May 20, 2016 7:56:20 PM
> > > > To: dev@mahout.apache.org
> > > > Subject: Re: Future Mahout - Zeppelin work
> > > >
> > > > Unfortunately Zeppelin dev has been so rapid that 0.6-SNAPSHOT as a version is uninformative to me. I'd say, if possible, your first troubleshooting measure would be to re-clone or do a "git fetch upstream" to get up to the very latest.
> > > >
> > > > Sorry for delayed reply
> > > > Tg
> > > >
> > > > On May 20, 2016 5:36 PM, "Andrew Musselman" <andrew.mussel...@gmail.com> wrote:
> > > >
> > > > > Trevor, my zeppelin source is at this version:
> > > > >
> > > > > <groupId>org.apache.zeppelin</groupId>
> > > > > <artifactId>zeppelin</artifactId>
> > > > > <packaging>pom</packaging>
> > > > > <version>0.6.0-incubating-SNAPSHOT</version>
> > > > > <name>Zeppelin</name>
> > > > > <description>Zeppelin project</description>
> > > > > <url>http://zeppelin.incubator.apache.org/</url>
> > > > >
> > > > > And yes, you're right, the artifacts weren't added to the dependencies; is that a feature in more modern zep?
> > > > >
> > > > > On Fri, May 20, 2016 at 3:02 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> > > > >
> > > > > > no parentheses.
> > > > > >
> > > > > > import o.a.m.sparkbindings._
> > > > > > ....
> > > > > > myRdd = myDrm.rdd
> > > > > >
> > > > > > On Fri, May 20, 2016 at 2:57 PM, Suneel Marthi <smar...@apache.org> wrote:
> > > > > >
> > > > > > > On Fri, May 20, 2016 at 3:18 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hey Pat,
> > > > > > > >
> > > > > > > > If you spit out a TSV, you can import it into pyspark / matplotlib from the resource pool in essentially the same way and use that plotting library if you prefer. In fact you could import the tsv into pandas and use all of the pandas plotting as well (though I think it is, for the most part, also matplotlib with some convenience functions).
> > > > > > > >
> > > > > > > > https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2ZlbGl4Y2hldW5nL3NwYXJrLW5vdGVib29rLWV4YW1wbGVzL21hc3Rlci9aZXBwZWxpbl9ub3RlYm9vay8yQU1YNUNWQ1Uvbm90ZS5qc29u
> > > > > > > >
> > > > > > > > In Zeppelin, unless you specify otherwise, pyspark, sparkr, spark-sql, and scala-spark all share the same spark context, so you can create RDDs in one language and access them / work on them in another (so I understand).
> > > > > > > >
> > > > > > > > So in Mahout can you "save" a matrix as an RDD? e.g. something like
> > > > > > > >
> > > > > > > > val myRDD = myDRM.asRDD()
> > > > > > >
> > > > > > > val myRDD = myDRM.rdd()
> > > > > > >
> > > > > > > > And would 'myRDD' then exist in the spark context?
> > > > > > > yes it will be in sparkContext
> > > > > > >
> > > > > > > > On Fri, May 20, 2016 at 12:21 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> > > > > > > >
> > > > > > > > > Agreed.
> > > > > > > > >
> > > > > > > > > BTW I don’t want to stall progress but, being the most ignorant of plot libs, I’ll ask if we should consider python and matplotlib. In another project we use python because of the RDD support on Spark, though the visualizations are extremely limited in our case. If we can pass an RDD to pyspark it would allow custom reductions in python before plotting, even though we will support many natively in Mahout. I’m guessing that this would cross a context boundary and require a write to disk?
> > > > > > > > >
> > > > > > > > > So 2 questions:
> > > > > > > > > 1) what does the inter-language support look like with Spark python vs SparkR, can we transfer RDDs?
> > > > > > > > > 2) are the plot libs significantly different?
> > > > > > > > >
> > > > > > > > > On May 20, 2016, at 9:54 AM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Dmitriy really nailed it on the head in his reply to the post, which I'll rebroadcast below.
> > > > > > > > > In essence the whole reason you are (theoretically) using Mahout is that the data is too big to fit in memory. If it's too big to fit in memory, well then it's probably too big to plot each point (e.g. trillions of rows, and you only have so many pixels). For the example I randomly sampled a matrix.
> > > > > > > > >
> > > > > > > > > So as Dmitriy says, in Mahout we need to have functions that will 'preprocess' the data into something plottable.
> > > > > > > > >
> > > > > > > > > For the Zeppelin-plotting thing, we need to have a function that will spit out a tsv-like string of the data we want plotted.
> > > > > > > > >
> > > > > > > > > I agree an honest Mahout interpreter in Zeppelin is probably worth doing. There are a couple of ways to go about it. I opened up the discussion on dev@Zeppelin and didn't get any replies. I'm going to take that to mean we can do it in a way that makes the most sense to Mahout users...
> > > > > > > > >
> > > > > > > > > First steps are to include some methods in Mahout that will do that preprocessing, and one that will turn something into a tsv string.
> > > > > > > > >
> > > > > > > > > I have some general ideas on possible approaches to making an honest-mahout interpreter but I want to play in the code and look at the Flink-Mahout shell a bit before I try to organize my thoughts and present them.
> > > > > > > > >
> > > > > > > > > ...(2) not sure what is the point of supporting distributed anything.
> > > > > > > > > It is distributed presumably because it is hard to keep it in memory. Therefore, plotting anything distributed potentially presents 2 problems: storage space and overplotting due to number of points. The idea is that we have to work out algorithms that condense big data information into small plottable information (like density grids, for example, or histograms)....
> > > > > > > > >
> > > > > > > > > On Fri, May 20, 2016 at 10:22 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> > > > > > > > >
> > > > > > > > > > Great job Trevor, we’ll need this detail to smooth out the sharp edges and any guidance from you or the Zeppelin community will be a big help.
> > > > > > > > > >
> > > > > > > > > > On May 20, 2016, at 8:13 AM, Shannon Quinn <squ...@gatech.edu> wrote:
> > > > > > > > > >
> > > > > > > > > > Agreed, thoroughly enjoying the blog post.
> > > > > > > > > >
> > > > > > > > > > On 5/19/16 12:01 AM, Andrew Palumbo wrote:
> > > > > > > > > >> Well done, Trevor! I've not yet had a chance to try this in zeppelin but I just read the blog which is great!
> > > > > > > > > >>
> > > > > > > > > >> -------- Original message --------
> > > > > > > > > >> From: Trevor Grant <trevor.d.gr...@gmail.com>
> > > > > > > > > >> Date: 05/18/2016 2:44 PM (GMT-05:00)
> > > > > > > > > >> To: dev@mahout.apache.org
> > > > > > > > > >> Subject: Re: Future Mahout - Zeppelin work
> > > > > > > > > >>
> > > > > > > > > >> Ah thank you.
> > > > > > > > > >>
> > > > > > > > > >> Fixing now.
> > > > > > > > > >>
> > > > > > > > > >> On Wed, May 18, 2016 at 1:04 PM, Andrew Palumbo <ap....@outlook.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >>> Hey Trevor- Just refreshed your readme.
> > > > > > > > > >>> The jar that I mentioned is actually:
> > > > > > > > > >>>
> > > > > > > > > >>> /home/username/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > > > > > > > > >>>
> > > > > > > > > >>> rather than:
> > > > > > > > > >>>
> > > > > > > > > >>> /home/username/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > > > > > > > > >>>
> > > > > > > > > >>> (In the spark module, that is)
> > > > > > > > > >>> ________________________________________
> > > > > > > > > >>> From: Trevor Grant <trevor.d.gr...@gmail.com>
> > > > > > > > > >>> Sent: Wednesday, May 18, 2016 11:02:43 AM
> > > > > > > > > >>> To: dev@mahout.apache.org
> > > > > > > > > >>> Subject: Re: Future Mahout - Zeppelin work
> > > > > > > > > >>>
> > > > > > > > > >>> ah yes- I remember you pointing that out to me too.
> > > > > > > > > >>>
> > > > > > > > > >>> I got sidetracked yesterday for most of the day on an adventure in getting Zeppelin to work right after I accidentally updated to the new snapshot (free hint: the secret was to clear my cache *face-palm*)
> > > > > > > > > >>>
> > > > > > > > > >>> I'm going to add that dependency to the readme.md now.
> > > > > > > > > >>>
> > > > > > > > > >>> thanks,
> > > > > > > > > >>> tg
> > > > > > > > > >>>
> > > > > > > > > >>> On Wed, May 18, 2016 at 9:59 AM, Andrew Palumbo <ap....@outlook.com> wrote:
> > > > > > > > > >>>
> > > > > > > > > >>>> Trevor this is very cool- I have not been able to look at it closely yet but just a small point: I believe that you'll also need to add the
> > > > > > > > > >>>>
> > > > > > > > > >>>> mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > > > > > > > > >>>>
> > > > > > > > > >>>> for things like the classification stats, confusion matrix, and t-digest.
> > > > > > > > > >>>>
> > > > > > > > > >>>> Andy
> > > > > > > > > >>>>
> > > > > > > > > >>>> ________________________________________
> > > > > > > > > >>>> From: Trevor Grant <trevor.d.gr...@gmail.com>
> > > > > > > > > >>>> Sent: Wednesday, May 18, 2016 10:47:21 AM
> > > > > > > > > >>>> To: dev@mahout.apache.org
> > > > > > > > > >>>> Subject: Re: Future Mahout - Zeppelin work
> > > > > > > > > >>>>
> > > > > > > > > >>>> I still need to update my readme/env per Pat's comments below; however, without further ado, I present two notebooks that integrate Mahout + Spark + Zeppelin + ggplot2
> > > > > > > > > >>>>
> > > > > > > > > >>>> https://github.com/rawkintrevo/mahout-zeppelin
> > > > > > > > > >>>>
> > > > > > > > > >>>> Supposing you have a somewhat recent version of Zeppelin 0.6 with sparkr support running already, you may import the following raw notes directly into Zeppelin:
> > > > > > > > > >>>>
> > > > > > > > > >>>> https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json
> > > > > > > > > >>>>
> > > > > > > > > >>>> https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json
> > > > > > > > > >>>>
> > > > > > > > > >>>> So my thoughts on next steps, which I'm positing only as a starting point for discussion, and which are in no particular order of importance:
> > > > > > > > > >>>>
> > > > > > > > > >>>> - Blog on HOWTO for everyman (assumes no familiarity with Mahout, and only enough familiarity with Zeppelin to have Zeppelin + SparkR support)
> > > > > > > > > >>>> - Some syntactic sugar somewhere in Mahout to convert a matrix into a tsv string (with some sanity, e.g. a sample of a matrix)
> > > > > > > > > >>>> - Figure out with Zeppelin community what deeper integration feels like - e.g. build-profile vs. tutorial
> > > > > > > > > >>>>   - I think the case for making a build-profile is that Zeppelin is first and foremost a data science tool for non-technical users.
> > > > > > > > > >>>>   - If we go that route I'll need some more support finding out what is the absolute minimum 'bare-bones' mahout we can include, e.g. does the user have to have mahout installed? To be discussed.
> > > > > > > > > >>>> - Add matplotlib (python) "support" -> paragraph showing how to do the same thing in Python.
> > > > > > > > > >>>>
> > > > > > > > > >>>> The basic deal here is we are:
> > > > > > > > > >>>> 1) Setting up a standard Zeppelin Spark Interpreter to act like a Mahout interpreter
> > > > > > > > > >>>>   - This is taken care of by setting some env. variables, adding some dependencies, and importing relevant packages
> > > > > > > > > >>>> 2) do mahout things as you do
> > > > > > > > > >>>> 3) export table to tsv string, which is passed to a resource pool
> > > > > > > > > >>>>   - This could be done to a disk if you didn't have zeppelin
> > > > > > > > > >>>> 4) read the tsv from the resource pool (or disk if you didn't have zeppelin) in R (python soon) and create a <plot package of your choice>
> > > > > > > > > >>>>
> > > > > > > > > >>>> To Pat's point- this is a kind of clumsy pipeline, however the Zeppelin wrapper at least makes it *feel* less so.
> > > > > > > > > >>>>
> > > > > > > > > >>>> On Tue, May 17, 2016 at 1:17 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> > > > > > > > > >>>>> Seems like there is plenty to use in ggplot or python but the pipeline is a little convoluted (so maybe no need for Angular integration). To get graphics out of Mahout it would be nice to not require knowledge of R and/or python.
> > > > > > > > > >>>>> Knowing Mahout is already bad enough but I guess the API from the Mahout side for plotting could be Scala syntactic sugar. What is installed and how this all is set up is the next question.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> BTW this is what I use elsewhere (Mahout as a lib to this code)
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
> > > > > > > > > >>>>> "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
> > > > > > > > > >>>>> "spark.kryo.referenceTracking": "false",
> > > > > > > > > >>>>> "spark.kryoserializer.buffer": "300m",
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> afaik you will only see if Kryo is working when you have to serialize a mahout-specific data type like a vector or drm, something registered with Kryo.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> On May 16, 2016, at 6:18 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> As a quick recap- we're trying to leverage Zeppelin for charting.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> It seems as though this can be achieved by
> > > > > > > > > >>>>> - Adding properties to the Spark interpreter
> > > > > > > > > >>>>> - Adding dependency jars to the Spark interpreter
> > > > > > > > > >>>>> - importing in a spark paragraph
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> All seems to be working well, but I've fooled myself into thinking things were 'working' before because I wasn't actually integrating. Below I will outline the imports/properties; please look them over and tell me if I'm theoretically missing anything.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> The next phase for me will be
> > > > > > > > > >>>>> 1) Convert a matrix to some sort of serializable object that I can easily unpack from R
> > > > > > > > > >>>>> 2) use Zeppelin's resource buffers to pass the object
> > > > > > > > > >>>>> 3) collect the object in an R paragraph, convert it to a dataframe, then map using ggplot
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Once I have a working prototype I will work to add some syntactic sugar to prepare the matrix from the scala side and pass it to zeppelin (using resource pools so the same functionality can be reused in Flink), and an R library containing some functions which will pull the data out of the resource pool and spit out a dataframe.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Once it's in a Dataframe in R- go nuts with any plotting package you like. Likewise, it should be possible to do the same thing with matplotlib and python (https://gist.github.com/andershammar/9070e0f6916a0fbda7a5)
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> All of this doesn't necessarily require any changing of the Zeppelin source code, and isn't very intrusive or difficult to set up. I'll make a blog post, but it's almost a textbook entry-level tutorial on using imports in Zeppelin (e.g. a tutorial would be just as at home on the Zeppelin site as it would on the Mahout site).
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Now, there has been some talk of using Zeppelin's angularJS. Things get a little more hairy in that case, but we could make an optional build profile that would make zeppelin recognize matrices as tables and expose all of the built-in charting features of Zeppelin.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> If you're not adding a bunch of custom charts to Zeppelin (which would be somewhat tedious), you're going to end up with a lot of examples where you create a table in Mahout/Spark, pass it to AngularJS, then some AngularJS code charts it for you. At that point, however, you're doing just as much work, if not more, than it would be to simply pass to R or Python and let ggplot or matplotlib do the work for you.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Finally, I haven't run into any errors yet using Kryo (which in part is what makes me fear I'm not doing this right... it was too easy...) If anything seems redundant or missing, please call it out.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Add Properties to Spark interp:
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> spark.kryo.registrator org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
> > > > > > > > > >>>>> spark.serializer org.apache.spark.serializer.KryoSerializer
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Add artifacts (need to change these to maven not local, also need to add/change one jar per below, however this does run):
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> /home/trevor/.m2/repository/org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar
> > > > > > > > > >>>>> /home/trevor/.m2/repository/org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar
> > > > > > > > > >>>>> /home/trevor/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar
> > > > > > > > > >>>>> /home/trevor/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Add following code to first paragraph of notebook:
> > > > > > > > > >>>>> ```
> > > > > > > > > >>>>> %spark
> > > > > > > > > >>>>> import org.apache.mahout.math._
> > > > > > > > > >>>>> import org.apache.mahout.math.scalabindings._
> > > > > > > > > >>>>> import org.apache.mahout.math.drm._
> > > > > > > > > >>>>> import org.apache.mahout.math.scalabindings.RLikeOps._
> > > > > > > > > >>>>> import org.apache.mahout.math.drm.RLikeDrmOps._
> > > > > > > > > >>>>> import org.apache.mahout.sparkbindings._
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
> > > > > > > > > >>>>> ```
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> On Mon, May 16, 2016 at 6:42 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> > > > > > > > > >>>>>> Creating an mc used to do some Kryo setup, like registering serializers or serializer factories IIRC. Also there is the Spark conf for allocating memory for the Kryo buffer. Look at the code in the mc creation code in the Spark package helpers. All can be done in straight Spark and passed in to create the mc when needed.
Again from old weak brain cells, but I think that is part of what makes the Mahout shell different than the Spark shell plus imports: it auto-creates the mc instead of or along with an sc.

When I get back to my computer I can check.

On May 16, 2016, at 3:40 PM, Andrew Palumbo <ap....@outlook.com> wrote:

Trevor,

Could you post any kryo errors that you may be having?

________________________________
From: Andrew Palumbo <ap....@outlook.com>
Sent: Monday, May 16, 2016 6:25:07 PM
To: mahout
Subject: Future Mahout - Zeppelin work

To Dmitriy's point, I agree ggplot is def the priority. The mahout plots are at this point really just a POC, but at some point we may want to integrate some data transformation features into the mahout plots classes, so they're really more future work.

long story short:

> OK.
> I'll read through the examples and try to do something with some data, then do a ggplot and/or an angular plot on it (probably ggplot).
> I'll do a quick tutorial. Then I'll reopen discussion on that Zeppelin issue about whether we want to go ahead and add another interpreter.

Sounds great.

Thank you.

________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Monday, May 16, 2016 5:49:17 PM
To: Dmitriy Lyubimov
Cc: Andrew Palumbo; Pat Ferrel; Suneel Marthi
Subject: Re: Intro - Future Mahout - Zeppelin work

I just signed up for dev, should I just reply all and cc dev or start a new thread?

On Mon, May 16, 2016 at 4:46 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

fwiw ggplot2 is pretty darn advanced :) i am a bit skeptical smile would have something that ggplot2 would not, the other way around is much more expected by me :)

anyhow if ggplot2 and matplotlib are available in Zeppelin without major limitations, it sounds like Zeppelin should be an all around very nice venue then.

On Mon, May 16, 2016 at 2:42 PM, Andrew Palumbo <ap....@outlook.com> wrote:

yeah we should probably move this over to dev@

sorry- answering a question from a couple emails back on the thread.

If possible, I think it would be great to eventually have both (native mahout/smile plots and ggplot), since in the future we're going to be adding more visualization features rather than simple scatter plots etc. that may not be covered by ggplot.

That's why we were thinking about using angular and the pngs.

But what you're saying in your last email would be great!

Thank you!

________________________________
From: Trevor Grant <trevor.d.gr...@gmail.com>
Sent: Monday, May 16, 2016 5:33:12 PM
To: Andrew Palumbo
Cc: Pat Ferrel; Suneel Marthi; Dmitriy Lyubimov
Subject: Re: Intro - Future Mahout - Zeppelin work

I somehow replied to your last email without seeing it...

OK. I'll read through the examples and try to do something with some data, then do a ggplot and/or an angular plot on it (probably ggplot).

I'll do a quick tutorial.
Then I'll reopen discussion on that Zeppelin issue about whether we want to go ahead and add another interpreter.

On Mon, May 16, 2016 at 4:26 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

sorry for double email but are you thinking visualization should be a library internal to mahout or should we leverage Zeppelin's visualization capabilities?

Also, should we move this discussion to dev?

tg

On Mon, May 16, 2016 at 4:14 PM, Andrew Palumbo <ap....@outlook.com> wrote:

Sorry- to be a little more clear, part of what we're trying to do is to get the new plotting features integrated with Zeppelin. We plan on adding more advanced plotting.

________________________________
From: Andrew Palumbo <ap....@outlook.com>
Sent: Monday, May 16, 2016 5:04:49 PM
To: Pat Ferrel; Trevor Grant
Cc: Suneel Marthi; Dmitriy Lyubimov
Subject: Re: Intro - Future Mahout - Zeppelin work

Awesome!

most of the hard work was done by Dmitriy, I've just reworked it a couple of times to keep up with spark's refactoring.

I think that you will also need to include:

mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar

For the new plotting features that we're working on.

the plotting is still a work in progress, and the grid and surface plots are not working properly. The plots are swing based and can currently be exported as PNGs. There are a few examples on the closed PR:
https://github.com/apache/mahout/pull/230

There is an example script in examples/bin/spark-shell-plot.mscala (committed to master):
https://github.com/apache/mahout/blob/master/examples/bin/spark-shell-plot.mscala

Thanks!

________________________________
From: Pat Ferrel <p...@occamsmachete.com>
Sent: Monday, May 16, 2016 4:54:15 PM
To: Trevor Grant
Cc: Andrew Palumbo; Suneel Marthi; Dmitriy Lyubimov
Subject: Re: Intro - Future Mahout - Zeppelin work

This is only the beginning. Andy has been using Smile as a visualization lib since it is pretty rich in ML support. We are looking at integrating some of that with Zeppelin then adding code to feed the new visualizations in Mahout.
I’m here because I’m fairly familiar with AngularJS if that’s the way to go. Smile is swing based but can output pngs, maybe other image formats. Andy?

BTW Dmitriy is still very involved but has trouble getting permission to donate code.

On May 16, 2016, at 1:45 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Hey Andrew,

thanks- you basically did all of the hard work for me!

I've got the linear regression example working from:
http://mahout.apache.org/users/sparkbindings/play-with-shell.html

my java is sketchy at best, i tend to over import.
I pulled in the following jars:

org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar
org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar
org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar
org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar

I think those are all necessary... should I be pulling in more?

I hate to say it (but will do so bc this isn't public) this integration is super easy from a user perspective, almost too easy- eg why not let the user add it themselves...
Add the appropriate maven artifacts, restart the interpreter and run the following in a notebook:
```
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
```
Then whatever code you want and you're off to the races...

that said, adding a build profile like -PsparkMahout and creating an interpreter like %spark.mahout should be fairly straightforward.

Second question, do you have an example that would be more 'visualization friendly'? I could pass the results to Angular or R just to show off how to do it.

Which leads back to the question, is this even worth building a full interpreter for or just make a really nice blog post with examples on how to integrate with R...?
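
For illustration only, here is a minimal sketch of what the "whatever code you want" step could look like once the imports and the implicit context are in place. It follows the general shape of the linear regression walkthrough linked earlier in the thread, but the data values are made up, and it assumes a Zeppelin Spark paragraph where `sc` already exists and the Mahout jars listed above are on the interpreter classpath:

```scala
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

// Assumption: `sc` is the SparkContext that Zeppelin provides to the paragraph.
implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)

// Toy data invented for this sketch: two feature columns and a target column
// constructed as target = x1 + 2 * x2.
val drmData = drmParallelize(dense(
  (1.0, 2.0, 5.0),
  (2.0, 1.0, 4.0),
  (3.0, 4.0, 11.0),
  (4.0, 3.0, 10.0)), numPartitions = 2)

val drmX = drmData(::, 0 until 2)  // distributed feature matrix
val y = drmData.collect(::, 2)     // target column as an in-core vector

val drmXtX = drmX.t %*% drmX       // distributed X'X
val drmXty = drmX.t %*% y          // distributed X'y

val XtX = drmXtX.collect           // 2 x 2, small enough to bring in-core
val Xty = drmXty.collect(::, 0)
val beta = solve(XtX, Xty)         // least-squares coefficients
```

The in-core `beta` vector is the kind of small result that could then be handed to Angular or R for plotting, as discussed above.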

On Mon, May 16, 2016 at 2:09 PM, Andrew Palumbo <ap....@outlook.com> wrote:

Hi Trevor, welcome!

It's great to have you helping out, thanks very much. I've done a good amount of work on our mahout spark shell .. so let me know if you have any questions about what we did there..

Thanks a lot!

Andy

-------- Original message --------
From: Suneel Marthi <smar...@apache.org>
Date: 05/16/2016 2:44 PM (GMT-05:00)
To: Trevor Grant <trevor.d.gr...@gmail.com>
Cc: Suneel Marthi <smar...@apache.org>, Pat Ferrel <p...@occamsmachete.com>, Andrew Palumbo <ap....@outlook.com>
Subject: Re: Intro - Future Mahout - Zeppelin work

Oh yes, he's around. I see him online.

On Mon, May 16, 2016 at 2:42 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Is Dmitriy Lyubimov still around?

Looks like he created this issue for Zeppelin a while ago. (The old lost code to which you were referring?)

https://issues.apache.org/jira/browse/ZEPPELIN-116

tg

On Mon, May 16, 2016 at 1:37 PM, Suneel Marthi <smar...@apache.org> wrote:

Welcome to the party TG !!

On Mon, May 16, 2016 at 2:28 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Hey all,

I'm excited for a chance to help out. I'm actually getting ready to download now and start playing around.

I had talked about this briefly, but given a properly functioning Zeppelin interpreter for Apache Mahout, one could leverage all of the Zeppelin visualizations, anything in AngularJS, or anything in R (through clever use of Zeppelin's Resource Pools).

I'll work on getting logged in to the slack channel as well.

Nice to meet you all, looking forward to helping out!

tg

On Sun, May 15, 2016 at 12:56 PM, Suneel Marthi <smar...@apache.org> wrote:

FYI...
Trevor was there for my talk, so he has some idea of Mahout Samsara.

On Sun, May 15, 2016 at 1:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

Hey Trevor,

Good to meet you. As you probably know Mahout-Samsara is a reincarnation of the project in a new body, which is less a collection of algorithms than a roll-your-own math/algorithm tool.
The major benefit is that during experimentation and later in production the code is by nature scalable on Spark and Flink. Most of the Mahout DSL is R-like and supports tensor math but we are now looking at streaming online algo support too.

In any case you probably know we have a Mahout version of the Spark Shell, which has been integrated with an old version of Zeppelin (code is lost). Recently Andy has experimented with some very nice visualizations of ML data (not just analytics data). We as a project are interested in Zeppelin integration of our shell and graphics. From what I understand the graphics extension mechanism of Zeppelin is based on AngularJS, which I have some experience with.

So, we’d like to start the conversation about how to proceed. We would love some help but will move ahead in any case.

Pat

On May 15, 2016, at 9:52 AM, Suneel Marthi <smar...@apache.org> wrote:

Hi Trevor,

Nice meeting u last week in Vancouver. Per our conversation, I wanted to introduce u to Andrew Palumbo (Mahout Chair) and Pat Ferrel (Mahout PMC). As I mentioned in my talk, we are actively looking at Zeppelin integration with Mahout (primarily for spark) and would appreciate your help (as also all things DL and ML).

We definitely can use all your help as we r revamping the Mahout project and shedding its legacy MapReduce image.

I sent u an invite to the Mahout slack channel, mahout.apache.org <http://mahout.apache.org/> - that's where we all hangout and not having to worry about avoiding naughty words.

Looking forward to working with you

Suneel