I concur. It seems to work for me with the present Zeppelin snapshot version.

On Fri, May 20, 2016 at 5:54 PM, Trevor Grant <[email protected]> wrote:
> It appears the jars aren't loading.
>
> Did you add those artifacts?
>
> If your version is the one cloned from Till's, that's fairly ancient.
>
> I need to update that post badly.
>
> Do a fresh git clone from apache/incubator-zeppelin. The point of my last
> post was to get Flink 0.10 working with Zeppelin pre-release. Zeppelin
> snapshot is now on 1.0
> On May 20, 2016 4:35 PM, "Andrew Musselman" <[email protected]>
> wrote:
>
> > In any case, still getting this error in the console when I run this
> > block:
> >
> > "import org.apache.mahout.math._
> > import org.apache.mahout.math.scalabindings._
> > import org.apache.mahout.math.drm._
> > import org.apache.mahout.math.scalabindings.RLikeOps._
> > import org.apache.mahout.math.drm.RLikeDrmOps._
> > import org.apache.mahout.sparkbindings._
> >
> > implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext =
> >   sc2sdc(sc)"
> >
> > "<console>:21: error: object mahout is not a member of package org.apache
> > import org.apache.mahout.math._"
> >
> > On Fri, May 20, 2016 at 2:31 PM, Andrew Musselman <[email protected]>
> > wrote:
> >
> > > Ah, well I cloned the Till branch per your Nov 3 article..
> > >
> > > git clone https://github.com/tillrohrmann/incubator-zeppelin.git
> > >
> > > On Fri, May 20, 2016 at 2:28 PM, Trevor Grant <[email protected]>
> > > wrote:
> > >
> > >> That's a "new" feature in the 0.6-snapshot... Say within the last month or
> > >> two, how long has it been since you did a git pull?
> > >>
> > >> I'll update soon with a note on that.
> > >>
> > >> I can also create a gist with the code.
> > >> On May 20, 2016 4:24 PM, "Andrew Musselman" <[email protected]>
> > >> wrote:
> > >>
> > >> > At this step of the tutorial I'm stuck because I don't have an "Import
> > >> > Note" link in my Zeppelin home:
> > >> >
> > >> > "I’m going to do you another favor. Go to the Zeppelin home page and click
> > >> > on ‘Import Note’.
> > >> > When given the option between URL and json, click on URL
> > >> > and enter the following link:
> > >> >
> > >> > https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json
> > >> > "
> > >> >
> > >> > On Fri, May 20, 2016 at 12:35 PM, Trevor Grant <[email protected]>
> > >> > wrote:
> > >> >
> > >> > > FYI:
> > >> > >
> > >> > > Looks like Flink shell is fixed :D
> > >> > >
> > >> > > https://github.com/apache/flink/pull/1913
> > >> > >
> > >> > > (I tested, is working good).
> > >> > >
> > >> > > Trevor Grant
> > >> > > Data Scientist
> > >> > > https://github.com/rawkintrevo
> > >> > > http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > http://trevorgrant.org
> > >> > >
> > >> > > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > >> > >
> > >> > > On Fri, May 20, 2016 at 1:46 PM, Suneel Marthi <[email protected]>
> > >> > > wrote:
> > >> > >
> > >> > > > On Fri, May 20, 2016 at 12:54 PM, Trevor Grant <[email protected]>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Dmitriy really nailed it on the head in his reply to the post, which I'll
> > >> > > > > rebroadcast below. In essence the whole reason you are (theoretically)
> > >> > > > > using Mahout is the data is too big to fit in memory. If it's too big to
> > >> > > > > fit in memory, well then it's probably too big to plot each point (e.g.
> > >> > > > > trillions of rows, you only have so many pixels). For the example I
> > >> > > > > randomly sampled a matrix.
> > >> > > > >
> > >> > > > > So as Dmitriy says, in Mahout we need to have functions that will
> > >> > > > > 'preprocess' the data into something plottable.
> > >> > > > >
> > >> > > > > For the Zeppelin-Plotting thing, we need to have a function that will spit
> > >> > > > > out a tsv-like string of the data we wanted plotted.
> > >> > > > >
> > >> > > > > I agree an honest Mahout interpreter in Zeppelin is probably worth doing.
> > >> > > > > There are a couple of ways to go about it. I opened up the discussion on
> > >> > > > > dev@Zeppelin and didn't get any replies. I'm going to take that to mean we
> > >> > > > > can do it in a way that makes the most sense to Mahout users...
> > >> > > > >
> > >> > > > > First steps are to include some methods in Mahout that will do that
> > >> > > > > preprocessing, and one that will turn something into a tsv string.
> > >> > > > >
> > >> > > > > I have some general ideas on possible approaches to making an honest-mahout
> > >> > > > > interpreter but I want to play in the code and look at the Flink-Mahout
> > >> > > > > shell a bit before I try to organize my thoughts and present them.
> > >> > > > >
> > >> > > >
> > >> > > > FYI Trevor, there's no Flink-Mahout shell today; in large part because the
> > >> > > > Flink Shell is still busted on their end and we on the Mahout end have not
> > >> > > > had time to muck with it. What exists today is the Mahout-Spark shell.
> > >> > > >
> > >> > > > >
> > >> > > > > ...(2) not sure what is the point of supporting distributed anything. It is
> > >> > > > > distributed presumably because it is hard to keep it in memory. Therefore,
> > >> > > > > plotting anything distributed potentially presents 2 problems: storage
> > >> > > > > space and overplotting due to number of points.
> > >> > > > > The idea is that we have to
> > >> > > > > work out algorithms that condense big data information into small plottable
> > >> > > > > information (like density grids, for example, or histograms)....
> > >> > > > >
> > >> > > >
> > >> > > > Agreed, something like sampling x% of points from a DRM (like the visuals I
> > >> > > > had from Palumbo for the talk in Vancouver that demonstrated the concept)
> > >> > > >
> > >> > > > >
> > >> > > > > Trevor Grant
> > >> > > > > Data Scientist
> > >> > > > > https://github.com/rawkintrevo
> > >> > > > > http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > http://trevorgrant.org
> > >> > > > >
> > >> > > > > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > >> > > > >
> > >> > > > > On Fri, May 20, 2016 at 10:22 AM, Pat Ferrel <[email protected]>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Great job Trevor, we’ll need this detail to smooth out the sharp edges and
> > >> > > > > > any guidance from you or the Zeppelin community will be a big help.
> > >> > > > > >
> > >> > > > > > On May 20, 2016, at 8:13 AM, Shannon Quinn <[email protected]> wrote:
> > >> > > > > >
> > >> > > > > > Agreed, thoroughly enjoying the blog post.
> > >> > > > > >
> > >> > > > > > On 5/19/16 12:01 AM, Andrew Palumbo wrote:
> > >> > > > > > > Well done, Trevor! I've not yet had a chance to try this in zeppelin
> > >> > > > > > > but I just read the blog which is great!
> > >> > > > > > >
> > >> > > > > > > -------- Original message --------
> > >> > > > > > > From: Trevor Grant <[email protected]>
> > >> > > > > > > Date: 05/18/2016 2:44 PM (GMT-05:00)
> > >> > > > > > > To: [email protected]
> > >> > > > > > > Subject: Re: Future Mahout - Zeppelin work
> > >> > > > > > >
> > >> > > > > > > Ah thank you.
> > >> > > > > > >
> > >> > > > > > > Fixing now.
> > >> > > > > > >
> > >> > > > > > > Trevor Grant
> > >> > > > > > > Data Scientist
> > >> > > > > > > https://github.com/rawkintrevo
> > >> > > > > > > http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > > http://trevorgrant.org
> > >> > > > > > >
> > >> > > > > > > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > >> > > > > > >
> > >> > > > > > > On Wed, May 18, 2016 at 1:04 PM, Andrew Palumbo <[email protected]>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > >> Hey Trevor- Just refreshed your readme.
> > >> > > > > > >> The jar that I mentioned is
> > >> > > > > > >> actually:
> > >> > > > > > >>
> > >> > > > > > >> /home/username/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > >> > > > > > >>
> > >> > > > > > >> rather than:
> > >> > > > > > >>
> > >> > > > > > >> /home/username/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > >> > > > > > >>
> > >> > > > > > >> (In the spark module that is)
> > >> > > > > > >> ________________________________________
> > >> > > > > > >> From: Trevor Grant <[email protected]>
> > >> > > > > > >> Sent: Wednesday, May 18, 2016 11:02:43 AM
> > >> > > > > > >> To: [email protected]
> > >> > > > > > >> Subject: Re: Future Mahout - Zeppelin work
> > >> > > > > > >>
> > >> > > > > > >> ah yes- I remember you pointing that out to me too.
> > >> > > > > > >>
> > >> > > > > > >> I got sidetracked yesterday for most of the day on an adventure in getting
> > >> > > > > > >> Zeppelin to work right after I accidentally updated to the new snapshot (free
> > >> > > > > > >> hint: the secret was to clear my cache *face-palm*)
> > >> > > > > > >>
> > >> > > > > > >> I'm going to add that dependency to the readme.md now.
> > >> > > > > > >>
> > >> > > > > > >> thanks,
> > >> > > > > > >> tg
> > >> > > > > > >>
> > >> > > > > > >> Trevor Grant
> > >> > > > > > >> Data Scientist
> > >> > > > > > >> https://github.com/rawkintrevo
> > >> > > > > > >> http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > >> http://trevorgrant.org
> > >> > > > > > >>
> > >> > > > > > >> *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > >> > > > > > >>
> > >> > > > > > >> On Wed, May 18, 2016 at 9:59 AM, Andrew Palumbo <[email protected]>
> > >> > > > > > >> wrote:
> > >> > > > > > >>
> > >> > > > > > >>> Trevor this is very cool- I have not been able to look at it closely yet
> > >> > > > > > >>> but just a small point: I believe that you'll also need to add the
> > >> > > > > > >>>
> > >> > > > > > >>> mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > >> > > > > > >>>
> > >> > > > > > >>> For things like the classification stats, confusion matrix, and t-digest.
> > >> > > > > > >>>
> > >> > > > > > >>> Andy
> > >> > > > > > >>>
> > >> > > > > > >>> ________________________________________
> > >> > > > > > >>> From: Trevor Grant <[email protected]>
> > >> > > > > > >>> Sent: Wednesday, May 18, 2016 10:47:21 AM
> > >> > > > > > >>> To: [email protected]
> > >> > > > > > >>> Subject: Re: Future Mahout - Zeppelin work
> > >> > > > > > >>>
> > >> > > > > > >>> I still need to update my readme/env per Pat's comments below, however
> > >> > > > > > >>> without further ado, I present two notebooks that integrate Mahout + Spark +
> > >> > > > > > >>> Zeppelin + ggplot2
> > >> > > > > > >>>
> > >> > > > > > >>> https://github.com/rawkintrevo/mahout-zeppelin
> > >> > > > > > >>>
> > >> > > > > > >>> Supposing you have a somewhat recent version of Zeppelin 0.6 with sparkr
> > >> > > > > > >>> support running already, you may import the following raw notes directly
> > >> > > > > > >>> into Zeppelin:
> > >> > > > > > >>>
> > >> > > > > > >>> https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DLinear%20Regression%20in%20Spark.json
> > >> > > > > > >>>
> > >> > > > > > >>> https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json
> > >> > > > > > >>>
> > >> > > > > > >>> So my thoughts on next steps, which I'm positing only as a starting point
> > >> > > > > > >>> for discussion, and are in no particular order of importance:
> > >> > > > > > >>>
> > >> > > > > > >>> - Blog on HOWTO for everyman (assumes no familiarity with Mahout,
> > >> > > > > > >>> and only
> > >> > > > > > >>> enough familiarity with Zeppelin to have Zeppelin + SparkR support)
> > >> > > > > > >>> - Some syntactic sugar somewhere in Mahout to convert a matrix into a tsv
> > >> > > > > > >>> string. (with some sanity, e.g. a sample of a matrix)
> > >> > > > > > >>> - Figure out with Zeppelin community what deeper integration feels like -
> > >> > > > > > >>> e.g. build-profile vs. tutorial
> > >> > > > > > >>>   - I think the case for making a build-profile is that Zeppelin is first
> > >> > > > > > >>> and foremost a datascience tool for non-technical users.
> > >> > > > > > >>>   - If we go that route I'll need some more support finding out what is the
> > >> > > > > > >>> absolute minimum 'bare-bones' mahout we can include, e.g. does the user
> > >> > > > > > >>> have to have mahout installed? To be discussed.
> > >> > > > > > >>> - Add matplotlib (python) "support" -> paragraph showing how to do the same
> > >> > > > > > >>> thing in Python.
> > >> > > > > > >>>
> > >> > > > > > >>> The basic deal here is we are:
> > >> > > > > > >>> 1) Setting up a standard Zeppelin Spark interpreter to act like a Mahout
> > >> > > > > > >>> interpreter
> > >> > > > > > >>>    - This is taken care of by setting some env.
> > >> > > > > > >>> variables, adding some
> > >> > > > > > >>> dependencies, and importing relevant packages
> > >> > > > > > >>> 2) do mahout things as you do
> > >> > > > > > >>> 3) export table to tsv string, which is passed to a resource pool
> > >> > > > > > >>>    - This could be done to a disk if you didn't have zeppelin
> > >> > > > > > >>> 4) read the tsv from the resource pool (or disk if you didn't have
> > >> > > > > > >>> zeppelin) in R (python soon) and create a <plot package of your choice>
> > >> > > > > > >>>
> > >> > > > > > >>> To Pat's point- this is a kind of clumsy pipeline, however the Zeppelin
> > >> > > > > > >>> wrapper at least makes it *feel* less so.
> > >> > > > > > >>>
> > >> > > > > > >>> Trevor Grant
> > >> > > > > > >>> Data Scientist
> > >> > > > > > >>> https://github.com/rawkintrevo
> > >> > > > > > >>> http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > >>> http://trevorgrant.org
> > >> > > > > > >>>
> > >> > > > > > >>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > >> > > > > > >>>
> > >> > > > > > >>> On Tue, May 17, 2016 at 1:17 PM, Pat Ferrel <[email protected]>
> > >> > > > > > >>> wrote:
> > >> > > > > > >>>> Seems like there is plenty to use in ggplot or python but the pipeline is
> > >> > > > > > >>>> a little convoluted (so maybe no need for Angular integration). To get
> > >> > > > > > >>>> graphics out of Mahout it would be nice to not require knowledge of R
> > >> > > > > > >>>> and/or python.
> > >> > > > > > >>>> Knowing Mahout is already bad enough but I guess the API
> > >> > > > > > >>>> from the Mahout side for plotting could be Scala syntactic sugar. What and
> > >> > > > > > >>>> how this all is installed and set up is the next question.
> > >> > > > > > >>>>
> > >> > > > > > >>>> BTW this is what I use elsewhere (Mahout as a lib to this code)
> > >> > > > > > >>>>
> > >> > > > > > >>>> "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
> > >> > > > > > >>>> "spark.kryo.registrator":
> > >> > > > > > >>>> "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
> > >> > > > > > >>>> "spark.kryo.referenceTracking": "false",
> > >> > > > > > >>>> "spark.kryoserializer.buffer": "300m",
> > >> > > > > > >>>>
> > >> > > > > > >>>> afaik you will only see if Kryo is working when you have to serialize a
> > >> > > > > > >>>> mahout specific data type like vector of drm, something registered with
> > >> > > > > > >>>> Kryo.
> > >> > > > > > >>>>
> > >> > > > > > >>>> On May 16, 2016, at 6:18 PM, Trevor Grant <[email protected]>
> > >> > > > > > >>>> wrote:
> > >> > > > > > >>>>
> > >> > > > > > >>>> As a quick recap- we're trying to leverage Zeppelin for charting.
> > >> > > > > > >>>>
> > >> > > > > > >>>> It seems as though this can be achieved by
> > >> > > > > > >>>> - Adding properties to the Spark Interpreter
> > >> > > > > > >>>> - Adding dependency jars to the spark interpreter
> > >> > > > > > >>>> - importing in a spark paragraph
> > >> > > > > > >>>>
> > >> > > > > > >>>> All seems to be working well, but I've fooled myself into thinking things
> > >> > > > > > >>>> were 'working' before because I wasn't actually integrating.
> > >> > > > > > >>>> Below I will
> > >> > > > > > >>>> outline the imports/properties, please look over and tell me if I'm
> > >> > > > > > >>>> theoretically missing anything.
> > >> > > > > > >>>>
> > >> > > > > > >>>> The next phase for me will be
> > >> > > > > > >>>> 1) Convert a matrix to some sort of serializable object that I can easily
> > >> > > > > > >>>> unpack from R
> > >> > > > > > >>>> 2) use Zeppelin's resource buffers to pass the object
> > >> > > > > > >>>> 3) collect the object in an R paragraph, convert it to a dataframe then map
> > >> > > > > > >>>> using ggplot
> > >> > > > > > >>>>
> > >> > > > > > >>>> Once I have a working prototype I will work to add some syntactic sugar to
> > >> > > > > > >>>> prepare the matrix from the scala side and pass to zeppelin (using resource
> > >> > > > > > >>>> pools so the same functionality can be reused in Flink) and an R library
> > >> > > > > > >>>> containing some functions which will pull the data out of the resource pool
> > >> > > > > > >>>> and spit out a dataframe.
> > >> > > > > > >>>>
> > >> > > > > > >>>> Once it's in a Dataframe in R- go nuts with any plotting package you like.
> > >> > > > > > >>>> Likewise, it should be possible to do the same thing with matplotlib and
> > >> > > > > > >>>> python ( https://gist.github.com/andershammar/9070e0f6916a0fbda7a5 )
> > >> > > > > > >>>>
> > >> > > > > > >>>> All of this doesn't necessarily require any changing of the Zeppelin source
> > >> > > > > > >>>> code, and isn't very intrusive or difficult to set up. I'll make a blog
> > >> > > > > > >>>> post but it's almost a textbook entry tutorial on using imports in
> > >> > > > > > >>>> Zeppelin. (e.g. a tutorial would be just as at home on the Zeppelin site as
> > >> > > > > > >>>> it would on the Mahout site).
> > >> > > > > > >>>>
> > >> > > > > > >>>> Now, there has been some talk of using Zeppelin's angularJS. Things get a
> > >> > > > > > >>>> little more hairy in that case, but we could make an optional build profile
> > >> > > > > > >>>> that would make zeppelin recognize matrices as tables and expose all of the
> > >> > > > > > >>>> built-in charting features of Zeppelin.
> > >> > > > > > >>>>
> > >> > > > > > >>>> If you're not adding a bunch of custom charts to Zeppelin (which would be
> > >> > > > > > >>>> somewhat tedious), you're going to end up with a lot of examples where you
> > >> > > > > > >>>> create a table in Mahout/Spark, pass it to AngularJS, then some AngularJS
> > >> > > > > > >>>> code charts it for you.
> > >> > > > > > >>>> At that point however, you're doing just as much
> > >> > > > > > >>>> work, if not more, than it would be to simply pass to R or Python and let
> > >> > > > > > >>>> ggplot or matplotlib do the work for you.
> > >> > > > > > >>>>
> > >> > > > > > >>>> Finally, I haven't run into any errors yet using Kryo (which in part is
> > >> > > > > > >>>> what makes me fear I'm not doing this right... it was too easy...) If
> > >> > > > > > >>>> anything seems redundant or missing, please call it out.
> > >> > > > > > >>>>
> > >> > > > > > >>>> Add Properties to Spark interp:
> > >> > > > > > >>>>
> > >> > > > > > >>>> spark.kryo.registrator
> > >> > > > > > >>>> org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
> > >> > > > > > >>>> spark.serializer org.apache.spark.serializer.KryoSerializer
> > >> > > > > > >>>>
> > >> > > > > > >>>> Add artifacts (need to change these to maven not local, also need to
> > >> > > > > > >>>> add/change one jar per below, however this does run):
> > >> > > > > > >>>>
> > >> > > > > > >>>> /home/trevor/.m2/repository/org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar
> > >> > > > > > >>>>
> > >> > > > > > >>>> /home/trevor/.m2/repository/org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar
> > >> > > > > > >>>>
> > >> > > > > > >>>> /home/trevor/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar
> > >> > > > > > >>>>
> > >> > > > > > >>>> /home/trevor/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar
> > >> > > > > > >>>>
> > >> > > > > > >>>> Add following code to first paragraph of notebook:
> > >> > > > > > >>>> ```
> > >> > > > > > >>>> %spark
> > >> > > > > > >>>> import org.apache.mahout.math._
> > >> > > > > > >>>> import org.apache.mahout.math.scalabindings._
> > >> > > > > > >>>> import org.apache.mahout.math.drm._
> > >> > > > > > >>>> import org.apache.mahout.math.scalabindings.RLikeOps._
> > >> > > > > > >>>> import org.apache.mahout.math.drm.RLikeDrmOps._
> > >> > > > > > >>>> import org.apache.mahout.sparkbindings._
> > >> > > > > > >>>>
> > >> > > > > > >>>> implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext =
> > >> > > > > > >>>>   sc2sdc(sc)
> > >> > > > > > >>>> ```
> > >> > > > > > >>>>
> > >> > > > > > >>>> Trevor Grant
> > >> > > > > > >>>> Data Scientist
> > >> > > > > > >>>> https://github.com/rawkintrevo
> > >> > > > > > >>>> http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > >>>> http://trevorgrant.org
> > >> > > > > > >>>>
> > >> > > > > > >>>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > >> > > > > > >>>>
> > >> > > > > > >>>> On Mon, May 16, 2016 at 6:42 PM, Pat Ferrel <[email protected]>
> > >> > > > > > >>>> wrote:
> > >> > > > > > >>>>> Creating an mc used to do some Kryo setup, like registering serializers or
> > >> > > > > > >>>>> serializer factories IIRC. Also there is the Spark conf for allocating
> > >> > > > > > >>>>> memory for the Kryo buffer.
> > >> > > > > > >>>>> Look at the code in the mc creation code in the
> > >> > > > > > >>>>> Spark package helpers. All can be done in straight Spark and passed in to
> > >> > > > > > >>>>> create the mc when needed. Again from old weak brain cells but I think that
> > >> > > > > > >>>>> is part of what makes the Mahout shell different than the Spark shell plus
> > >> > > > > > >>>>> imports, it auto-creates the mc instead of or along with an sc.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> When I get back to my computer I can check.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On May 16, 2016, at 3:40 PM, Andrew Palumbo <[email protected]>
> > >> > > > > > >>>>> wrote:
> > >> > > > > > >>>>> Trevor,
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Could you post any kryo errors that you may be having?
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> ________________________________
> > >> > > > > > >>>>> From: Andrew Palumbo <[email protected]>
> > >> > > > > > >>>>> Sent: Monday, May 16, 2016 6:25:07 PM
> > >> > > > > > >>>>> To: mahout
> > >> > > > > > >>>>> Subject: Future Mahout - Zeppelin work
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> To Dmitriy's point, I agree ggplot is def the priority. The mahout plots
> > >> > > > > > >>>>> at this point are really just a POC, but at some point we may want
> > >> > > > > > >>>>> to integrate some data transformation features into the mahout plots
> > >> > > > > > >>>>> classes so they're really more future work.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> long story short:
> > >> > > > > > >>>>>
> > >> > > > > > >>>>>> OK. I'll read through the examples and try to do something with some
> > >> > > > > > >>>>>> data, then do a ggplot and/or an angular plot on it (probably ggplot).
> > >> > > > > > >>>>>> I'll do a quick tutorial. Then I'll reopen discussion on that Zeppelin
> > >> > > > > > >>>>>> issue about whether we want to go ahead and add another interpreter.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Sounds great.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Thank you.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> ________________________________
> > >> > > > > > >>>>> From: Trevor Grant <[email protected]>
> > >> > > > > > >>>>> Sent: Monday, May 16, 2016 5:49:17 PM
> > >> > > > > > >>>>> To: Dmitriy Lyubimov
> > >> > > > > > >>>>> Cc: Andrew Palumbo; Pat Ferrel; Suneel Marthi
> > >> > > > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I just signed up for dev, should I just reply all and cc dev or start a
> > >> > > > > > >>>>> new thread?
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Trevor Grant
> > >> > > > > > >>>>> Data Scientist
> > >> > > > > > >>>>> https://github.com/rawkintrevo
> > >> > > > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > >>>>> http://trevorgrant.org
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> "Fortunate is he, who is able to know the causes of things." -Virgil
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Mon, May 16, 2016 at 4:46 PM, Dmitriy Lyubimov <[email protected]>
> > >> > > > > > >>>>> wrote:
> > >> > > > > > >>>>> fwiw ggplot2 is pretty darn advanced:) i am a bit skeptical smile would
> > >> > > > > > >>>>> have something that ggplot2 would not, the other way around is much more
> > >> > > > > > >>>>> expected by me:)
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> anyhow if ggplot2 and matplotlib are available in Zeppelin without major
> > >> > > > > > >>>>> limitations, it sounds like Zeppelin should be an all around very nice
> > >> > > > > > >>>>> venue then.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Mon, May 16, 2016 at 2:42 PM, Andrew Palumbo <[email protected]>
> > >> > > > > > >>>>> wrote:
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> yeah we should probably move this over to dev@
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> sorry- answering a question from a couple emails back on the thread.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> If possible, I think it would be great to eventually have both (native
> > >> > > > > > >>>>> mahout/smile plots and ggplot), since in the future we're going to be
> > >> > > > > > >>>>> adding more visualization features rather than simple scatter plots etc
> > >> > > > > > >>>>> that may not be covered by ggplot.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> That's why we were thinking about using angular and the pngs.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> But what you're saying in your last email would be great!
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Thank you!
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> ________________________________
> > >> > > > > > >>>>> From: Trevor Grant <[email protected]>
> > >> > > > > > >>>>> Sent: Monday, May 16, 2016 5:33:12 PM
> > >> > > > > > >>>>> To: Andrew Palumbo
> > >> > > > > > >>>>> Cc: Pat Ferrel; Suneel Marthi; Dmitriy Lyubimov
> > >> > > > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I somehow replied to your last email without seeing it...
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> OK. I'll read through the examples and try to do something with some data,
> > >> > > > > > >>>>> then do a ggplot and/or an angular plot on it (probably ggplot).
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I'll do a quick tutorial. Then I'll reopen discussion on that Zeppelin
> > >> > > > > > >>>>> issue about whether we want to go ahead and add another interpreter.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Trevor Grant
> > >> > > > > > >>>>> Data Scientist
> > >> > > > > > >>>>> https://github.com/rawkintrevo
> > >> > > > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > >>>>> http://trevorgrant.org
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> "Fortunate is he, who is able to know the causes of things." -Virgil
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Mon, May 16, 2016 at 4:26 PM, Trevor Grant <[email protected]> wrote:
> > >> > > > > > >>>>> sorry for double email but are you thinking visualization should be a
> > >> > > > > > >>>>> library internal to mahout or should we leverage Zeppelin's visualization
> > >> > > > > > >>>>> capabilities?
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Also, should we move this discussion to dev?
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> tg
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Trevor Grant
> > >> > > > > > >>>>> Data Scientist
> > >> > > > > > >>>>> https://github.com/rawkintrevo
> > >> > > > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > >>>>> http://trevorgrant.org
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> "Fortunate is he, who is able to know the causes of things." -Virgil
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Mon, May 16, 2016 at 4:14 PM, Andrew Palumbo <[email protected]> wrote:
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Sorry- to be a little more clear, part of what we're trying to do is to get
> > >> > > > > > >>>>> the new plotting features integrated with Zeppelin.
> > >> > > > > > >>>>> We plan on adding more advanced plotting.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> ________________________________
> > >> > > > > > >>>>> From: Andrew Palumbo <[email protected]>
> > >> > > > > > >>>>> Sent: Monday, May 16, 2016 5:04:49 PM
> > >> > > > > > >>>>> To: Pat Ferrel; Trevor Grant
> > >> > > > > > >>>>> Cc: Suneel Marthi; Dmitriy Lyubimov
> > >> > > > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Awesome!
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> most of the hard work was done by Dmitriy, I've just reworked it a
> > >> > > > > > >>>>> couple of times to keep up with spark's refactoring.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I think that you will also need to include:
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> For the new plotting features that we're working on.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> the plotting is still a work in progress, and the grid and surface plots
> > >> > > > > > >>>>> are not working properly. The plots are swing based and can currently be
> > >> > > > > > >>>>> exported as PNGs.
> > >> > > > > > >>>>> There are a few examples on the closed PR:
> > >> > > > > > >>>>> https://github.com/apache/mahout/pull/230
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> There is an example script in examples/bin/spark-shell-plot.mscala
> > >> > > > > > >>>>> (committed to master):
> > >> > > > > > >>>>> https://github.com/apache/mahout/blob/master/examples/bin/spark-shell-plot.mscala
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Thanks!
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> ________________________________
> > >> > > > > > >>>>> From: Pat Ferrel <[email protected]>
> > >> > > > > > >>>>> Sent: Monday, May 16, 2016 4:54:15 PM
> > >> > > > > > >>>>> To: Trevor Grant
> > >> > > > > > >>>>> Cc: Andrew Palumbo; Suneel Marthi; Dmitriy Lyubimov
> > >> > > > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> This is only the beginning. Andy has been using Smile as a visualization
> > >> > > > > > >>>>> lib since it is pretty rich in ML support. We are looking at integrating
> > >> > > > > > >>>>> some of that with Zeppelin then adding code to feed the new visualizations
> > >> > > > > > >>>>> in Mahout. I’m here because I’m fairly familiar with AngularJS if that’s
> > >> > > > > > >>>>> the way to go. Smile is swing based but can output pngs, maybe other image
> > >> > > > > > >>>>> formats, Andy?
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> BTW Dmitriy is still very involved but has trouble getting permission to
> > >> > > > > > >>>>> donate code.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On May 16, 2016, at 1:45 PM, Trevor Grant <[email protected]> wrote:
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Hey Andrew,
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> thanks- you basically did all of the hard work for me!
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I've got the linear regression example working from:
> > >> > > > > > >>>>> http://mahout.apache.org/users/sparkbindings/play-with-shell.html
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> my java is sketchy at best, i tend to over import. I pulled in the
> > >> > > > > > >>>>> following jars:
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> org/apache/mahout/mahout-math/0.12.1-SNAPSHOT/mahout-math-0.12.1-SNAPSHOT.jar
> > >> > > > > > >>>>> org/apache/mahout/mahout-math-scala_2.10/0.12.1-SNAPSHOT/mahout-math-scala_2.10-0.12.1-SNAPSHOT.jar
> > >> > > > > > >>>>> org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT.jar
> > >> > > > > > >>>>> org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark-shell_2.10-0.12.1-SNAPSHOT.jar
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I think those are all necessary... should I be pulling in more?
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I hate to say it (but will do so bc this isn't public) this integration is
> > >> > > > > > >>>>> super easy from a user perspective, almost too easy- eg why not let the
> > >> > > > > > >>>>> user add it themselves... Add the appropriate maven artifacts, restart the
> > >> > > > > > >>>>> interpreter and run the following in a notebook:
> > >> > > > > > >>>>> ```
> > >> > > > > > >>>>> import org.apache.mahout.math._
> > >> > > > > > >>>>> import org.apache.mahout.math.scalabindings._
> > >> > > > > > >>>>> import org.apache.mahout.math.drm._
> > >> > > > > > >>>>> import org.apache.mahout.math.scalabindings.RLikeOps._
> > >> > > > > > >>>>> import org.apache.mahout.math.drm.RLikeDrmOps._
> > >> > > > > > >>>>> import org.apache.mahout.sparkbindings._
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
> > >> > > > > > >>>>> ```
> > >> > > > > > >>>>> Then whatever code you want and you're off to the races...
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> that said, adding a build profile like -PsparkMahout and creating an
> > >> > > > > > >>>>> interpreter like %spark.mahout should be fairly straightforward.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Second question, do you have an example that would be more 'visualization
> > >> > > > > > >>>>> friendly'? I could pass the results to Angular or R just to show off how to
> > >> > > > > > >>>>> do it.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Which leads back to the question, is this even worth building a full
> > >> > > > > > >>>>> interpreter for or just make a really nice blog post with examples on how
> > >> > > > > > >>>>> to integrate with R...?
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Trevor Grant
> > >> > > > > > >>>>> Data Scientist
> > >> > > > > > >>>>> https://github.com/rawkintrevo
> > >> > > > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > >>>>> http://trevorgrant.org
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> "Fortunate is he, who is able to know the causes of things." -Virgil
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Mon, May 16, 2016 at 2:09 PM, Andrew Palumbo <[email protected]> wrote:
> > >> > > > > > >>>>> Hi Trevor, welcome!
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> It's great to have you helping out, thanks very much. I've done a good
> > >> > > > > > >>>>> amount of work on our mahout spark shell .. so let me know if you have any
> > >> > > > > > >>>>> questions there about what we did there..
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Thanks a lot!
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Andy
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> -------- Original message --------
> > >> > > > > > >>>>> From: Suneel Marthi <[email protected]>
> > >> > > > > > >>>>> Date: 05/16/2016 2:44 PM (GMT-05:00)
> > >> > > > > > >>>>> To: Trevor Grant <[email protected]>
> > >> > > > > > >>>>> Cc: Suneel Marthi <[email protected]>, Pat Ferrel <[email protected]>,
> > >> > > > > > >>>>> Andrew Palumbo <[email protected]>
> > >> > > > > > >>>>> Subject: Re: Intro - Future Mahout - Zeppelin work
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Oh yes, he's around. I see him online.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Mon, May 16, 2016 at 2:42 PM, Trevor Grant <[email protected]> wrote:
> > >> > > > > > >>>>> Is Dmitriy Lyubimov still around?
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Looks like he created this issue for Zeppelin a while ago. (The old lost
> > >> > > > > > >>>>> code to which you were referring?)
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> https://issues.apache.org/jira/browse/ZEPPELIN-116
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> tg
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Trevor Grant
> > >> > > > > > >>>>> Data Scientist
> > >> > > > > > >>>>> https://github.com/rawkintrevo
> > >> > > > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > >>>>> http://trevorgrant.org
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> "Fortunate is he, who is able to know the causes of things." -Virgil
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Mon, May 16, 2016 at 1:37 PM, Suneel Marthi <[email protected]> wrote:
> > >> > > > > > >>>>> Welcome to the party TG !!
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Mon, May 16, 2016 at 2:28 PM, Trevor Grant <[email protected]> wrote:
> > >> > > > > > >>>>> Hey all,
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I'm excited for a chance to help out. I'm actually getting ready to
> > >> > > > > > >>>>> download now and start playing around.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I had talked about this briefly but given a properly functioning
> > >> > > > > > >>>>> Zeppelin interpreter for Apache Mahout, one could leverage all of the
> > >> > > > > > >>>>> Zeppelin visualizations, anything in AngularJS, or anything in R (through
> > >> > > > > > >>>>> clever use of Zeppelin's Resource Pools).
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I'll work on getting logged in to the slack channel as well.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Nice to meet you all, looking forward to helping out!
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> tg
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Trevor Grant
> > >> > > > > > >>>>> Data Scientist
> > >> > > > > > >>>>> https://github.com/rawkintrevo
> > >> > > > > > >>>>> http://stackexchange.com/users/3002022/rawkintrevo
> > >> > > > > > >>>>> http://trevorgrant.org
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> "Fortunate is he, who is able to know the causes of things." -Virgil
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Sun, May 15, 2016 at 12:56 PM, Suneel Marthi <[email protected]> wrote:
> > >> > > > > > >>>>> FYi...
> > >> > > > > > >>>>> Trevor was there for my talk, so he has some idea of Mahout Samsara.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Sun, May 15, 2016 at 1:51 PM, Pat Ferrel <[email protected]> wrote:
> > >> > > > > > >>>>> Hey Trevor,
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Good to meet you. As you probably know Mahout-Samsara is a reincarnation
> > >> > > > > > >>>>> of the project in a new body, which is less a collection of algorithms than
> > >> > > > > > >>>>> a roll-your-own math/algorithm tool. The major benefit is that during
> > >> > > > > > >>>>> experimentation and later in production the code is by nature scalable on
> > >> > > > > > >>>>> Spark and Flink. Most of the Mahout DSL is R-like and supports tensor math
> > >> > > > > > >>>>> but we are now looking at streaming online algo support too.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> In any case you probably know we have a Mahout version of the Spark Shell,
> > >> > > > > > >>>>> which has been integrated with an old version of Zeppelin (code is lost).
> > >> > > > > > >>>>> Recently Andy has experimented with some very nice visualizations of ML
> > >> > > > > > >>>>> data (not just analytics data). We as a project are interested in Zeppelin
> > >> > > > > > >>>>> integration of our shell and graphics. From what I understand the graphics
> > >> > > > > > >>>>> extension mechanism of Zeppelin is based on AngularJS, which I have some
> > >> > > > > > >>>>> experience with.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> So, we’d like to start the conversation about how to proceed. We would
> > >> > > > > > >>>>> love some help but will move ahead in any case.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Pat
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On May 15, 2016, at 9:52 AM, Suneel Marthi <[email protected]> wrote:
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Hi Trevor,
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Nice meeting u last week in Vancouver. Per our conversation, I wanted to
> > >> > > > > > >>>>> introduce u to Andrew Palumbo (Mahout Chair) and Pat Ferrel (Mahout PMC).
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> As I mentioned in my talk, we are actively looking at Zeppelin integration
> > >> > > > > > >>>>> with Mahout (primarily for spark) and would appreciate your help (as also
> > >> > > > > > >>>>> all things DL and ML).
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> We definitely can use all your help as we r revamping the Mahout project
> > >> > > > > > >>>>> and shedding its legacy MapReduce image.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> I sent u an invite to the Mahout slack channel, mahout.apache.org
> > >> > > > > > >>>>> <http://mahout.apache.org/> - that's where we all hangout and not having
> > >> > > > > > >>>>> to worry about avoiding naughty words.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Looking forward to working with you
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Suneel
