Hi Eric,

We are talking about the same PR, which is a tweak of the existing Spark-Zeppelin interpreter. What we are looking at is a specific Mahout-Spark-Zeppelin interpreter that is independent of the above?
BTW Eric, nice to see you on the Mahout mailing lists. You didn't make it to Vancouver this time?

On Sun, May 29, 2016 at 10:57 PM, Eric Charles <[email protected]> wrote:

> Have you seen [ZEPPELIN-116] Add Mahout Support for Spark Interpreter?
>
> https://github.com/apache/incubator-zeppelin/pull/928
>
> It declares in the spark interpreter the mahout deps, and creates the sdc
> (spark distributed context).
>
> On 29/05/16 19:16, Suneel Marthi wrote:
>
>> On Sun, May 29, 2016 at 12:07 PM, Trevor Grant <[email protected]>
>> wrote:
>>
>>> OK cool. Just wanted to make sure I wasn't stealing anyone's baby or
>>> duplicating efforts.
>>>
>>> Two things:
>>>
>>> 1- The blog post referenced the linear-regression example notebook twice-
>>> I've updated it to reference the ggplot integration. E.g. import this
>>> note:
>>>
>>> https://raw.githubusercontent.com/rawkintrevo/mahout-zeppelin/master/%5BMAHOUT%5D%5BPROVING-GROUNDS%5DSpark-Mahout%2Bggplot2.json
>>> (I still need to update with a blurb about sampling, however it is done in
>>> that note...) So to any who tried the blog, a huge apology because that
>>> notebook is where all of the 'magic happened' (all of the screen shots /
>>> gg-plots / etc happened there).
>>>
>>> 2- I have a working prototype of the Zeppelin integration:
>>> 'mahout-terp' branch of:
>>> https://github.com/rawkintrevo/incubator-zeppelin
>>> If you build, and set 'spark.mahout' to 'true' in the Spark Interpreter
>>> properties, you have a Mahout interpreter. This is the minimally invasive
>>> way to do it. I'll be opening a PR soon, we'll see what the gang over at
>>> Zeppelin say.
>>> I'll still need docs and an example notebook, but I'm waiting to make sure
>>> I don't need to do a major refactor before I get carried away with those
>>> activities.
>>>
>>> In essence when 'spark-mahout' is 'true' you jump right in on r-like dsl
>>> and you have a sdc declared based on the underlying sc.
>>>
>> I am not sure if messing with the very "sacrosanct" Zeppelin-Spark
>> interpreter is gonna go down well with the Spark insanity. I would prefer
>> having a separate Mahout-Spark-Zeppelin interpreter under the Zeppelin project
>> if that's acceptable to the Zeppelin folks, even though most of it might be
>> repeated.
>>
>> What do others have to say?
>>
>>> have a good holiday weekend,
>>>
>>> tg
>>>
>>> Trevor Grant
>>> Data Scientist
>>> https://github.com/rawkintrevo
>>> http://stackexchange.com/users/3002022/rawkintrevo
>>> http://trevorgrant.org
>>>
>>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>>>
>>> On Sun, May 29, 2016 at 10:49 AM, Andrew Palumbo <[email protected]>
>>> wrote:
>>>
>>>> Thx Trevor,
>>>> Re: m-1854, it was something that we started when we were first discussing
>>>> using the smile plots and trying to pipe them over to Zeppelin. As
>>>> far as I know there was no progress started on it. I've unassigned it.
>>>>
>>>> Feel free to assign any Jiras to yourself. I think that m-1854 is similar
>>>> to the mahout-spark-shell, so I may be able to help out there.
>>>>
>>>> ________________________________________
>>>> From: Trevor Grant <[email protected]>
>>>> Sent: Saturday, May 28, 2016 11:21:44 PM
>>>> To: [email protected]
>>>> Subject: Re: Future Mahout - Zeppelin work
>>>>
>>>> Created a subtask on 1855 for tsv strings.
>>>>
>>>> Looking at 1854 assigned to Pat Ferrel, what's your progress to date? How
>>>> can I help?
>>>>
>>>> tg
>>>>
>>>> Trevor Grant
>>>> Data Scientist
>>>> https://github.com/rawkintrevo
>>>> http://stackexchange.com/users/3002022/rawkintrevo
>>>> http://trevorgrant.org
>>>>
>>>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>>>>
>>>> On Thu, May 26, 2016 at 2:34 PM, Andrew Palumbo <[email protected]>
>>>> wrote:
>>>>
>>>>> Great!
>>>>>
>>>>> When you free up and have the time, could you create some Jiras for these?
>>>>>
>>>>> We actually have MAHOUT-1852 open for Histograms already, and MAHOUT-1854
>>>>> and MAHOUT-1855 (early Zeppelin integration Jiras). I can close m-1854 and
>>>>> m-1855 out and we can start new ones if they're not relevant anymore, or we
>>>>> can just go with those.
>>>>>
>>>>> Thanks
>>>>>
>>>>> ________________________________________
>>>>> From: Trevor Grant <[email protected]>
>>>>> Sent: Thursday, May 26, 2016 3:17:22 PM
>>>>> To: [email protected]
>>>>> Subject: Re: Future Mahout - Zeppelin work
>>>>>
>>>>> Short answer: it is high priority. I think it will be a Mahout interpreter
>>>>> into Zeppelin, and given that plans are on hold for a Flink-Mahout in the
>>>>> short term, I think it should be a piggy-back spark interpreter (e.g.
>>>>> exposed through something like %spark.mahout). So I have thoughts, but no
>>>>> plan. Been busy with a couple of other commitments.
>>>>>
>>>>> On the Mahout side we need:
>>>>> A function that will convert small matrices into TSV strings
>>>>> Convenience functions for sampling super-large matrices into things like
>>>>> histograms, etc, that one would want to plot. I.e. histogram bucketing?
>>>>> (less important for the moment)
>>>>>
>>>>> On the Zeppelin side we need:
>>>>> an interpreter.
>>>>>
>>>>> Trevor Grant
>>>>> Data Scientist
>>>>> https://github.com/rawkintrevo
>>>>> http://stackexchange.com/users/3002022/rawkintrevo
>>>>> http://trevorgrant.org
>>>>>
>>>>> *"Fortunate is he, who is able to know the causes of things."
>>>>> -Virgil*
>>>>>
>>>>> On Thu, May 26, 2016 at 1:22 PM, Suneel Marthi <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> While on this subject, do we have a plan yet for integrating Zeppelin into
>>>>>> Mahout, or the converse of having a Mahout-specific interpreter for
>>>>>> Zeppelin? I think that should be high priority in the short term.
>>>>>>
>>>>>> On Thu, May 26, 2016 at 1:17 PM, Trevor Grant <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Ahh, like the "Sample From Matrix" paragraph in the notebook.
>>>>>>>
>>>>>>> Yea that seems like a good add. If not this afternoon, I'll include it
>>>>>>> Saturday.
>>>>>>>
>>>>>>> Trevor Grant
>>>>>>> Data Scientist
>>>>>>> https://github.com/rawkintrevo
>>>>>>> http://stackexchange.com/users/3002022/rawkintrevo
>>>>>>> http://trevorgrant.org
>>>>>>>
>>>>>>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>>>>>>>
>>>>>>> On Thu, May 26, 2016 at 11:52 AM, Andrew Palumbo <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Trevor, I was reading over your blog last night again- first time since
>>>>>>>> you updated. It is great!
>>>>>>>>
>>>>>>>> I have one suggestion: adding in a code line on how the sampling
>>>>>>>> of the DRM -> in-core Matrix is done:
>>>>>>>>
>>>>>>>> https://github.com/apache/mahout/blob/master/math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala#L148
>>>>>>>>
>>>>>>>> eg something like:
>>>>>>>>
>>>>>>>> mxSin = drmSampleKRows(drmSin, 1000, replacement = false)
>>>>>>>>
>>>>>>>> Maybe you omitted this intentionally?
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Trevor Grant <[email protected]>
>>>>>>>> Sent: Friday, May 20, 2016 7:56:20 PM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: Re: Future Mahout - Zeppelin work
>>>>>>>>
>>>>>>>> Unfortunately Zeppelin dev has been so rapid, 0.6-SNAPSHOT as a version is
>>>>>>>> uninformative to me. I'd say if possible, your first troubleshooting
>>>>>>>> measure would be to re-clone or do a "git fetch upstream" to get up to the
>>>>>>>> very latest.
>>>>>>>>
>>>>>>>> Sorry for delayed reply
>>>>>>>> Tg
>>>>>>>> On May 20, 2016 5:36 PM, "Andrew Musselman" <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Trevor, my zeppelin source is at this version:
>>>>>>>>>
>>>>>>>>> <groupId>org.apache.zeppelin</groupId>
>>>>>>>>> <artifactId>zeppelin</artifactId>
>>>>>>>>> <packaging>pom</packaging>
>>>>>>>>> <version>0.6.0-incubating-SNAPSHOT</version>
>>>>>>>>> <name>Zeppelin</name>
>>>>>>>>> <description>Zeppelin project</description>
>>>>>>>>> <url>http://zeppelin.incubator.apache.org/</url>
>>>>>>>>>
>>>>>>>>> And yes you're right the artifacts weren't added to the dependencies; is
>>>>>>>>> that a feature in more modern zep?
>>>>>>>>>
>>>>>>>>> On Fri, May 20, 2016 at 3:02 PM, Dmitriy Lyubimov <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> no parenthesis.
>>>>>>>>>>
>>>>>>>>>> import o.a.m.sparkbindings._
>>>>>>>>>> ....
>>>>>>>>>> myRdd = myDrm.rdd
>>>>>>>>>>
>>>>>>>>>> On Fri, May 20, 2016 at 2:57 PM, Suneel Marthi <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, May 20, 2016 at 3:18 PM, Trevor Grant <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Pat,
>>>>>>>>>>>>
>>>>>>>>>>>> If you spit out a TSV - you can import into pyspark / matplotlib from the
>>>>>>>>>>>> resource pool in essentially the same way and use that plotting library if
>>>>>>>>>>>> you prefer. In fact you could import the tsv into pandas and use all of
>>>>>>>>>>>> the pandas plotting as well (though I think it is, for the most part, also
>>>>>>>>>>>> matplotlib with some convenience functions).
>>>>>>>>>>>>
>>>>>>>>>>>> https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2ZlbGl4Y2hldW5nL3NwYXJrLW5vdGVib29rLWV4YW1wbGVzL21hc3Rlci9aZXBwZWxpbl9ub3RlYm9vay8yQU1YNUNWQ1Uvbm90ZS5qc29u
>>>>>>>>>>>>
>>>>>>>>>>>> In Zeppelin, unless you specify otherwise, pyspark, sparkr, spark-sql,
>>>>>>>>>>>> and scala-spark all share the same spark context, so you can create RDDs in
>>>>>>>>>>>> one language and access them / work on them in another (so I understand).
>>>>>>>>>>>>
>>>>>>>>>>>> So in Mahout can you "save" a matrix as a RDD? e.g.
>>>>>>>>>>>> something like
>>>>>>>>>>>>
>>>>>>>>>>>> val myRDD = myDRM.asRDD()
>>>>>>>>>>>
>>>>>>>>>>> val myRDD = myDRM.rdd()
>>>>>>>>>>>
>>>>>>>>>>>> And would 'myRDD' then exist in the spark context?
>>>>>>>>>>>
>>>>>>>>>>> yes it will be in sparkContext
>>>>>>>>>>>
>>>>>>>>>>>> Trevor Grant
>>>>>>>>>>>> Data Scientist
>>>>>>>>>>>> https://github.com/rawkintrevo
>>>>>>>>>>>> http://stackexchange.com/users/3002022/rawkintrevo
>>>>>>>>>>>> http://trevorgrant.org
>>>>>>>>>>>>
>>>>>>>>>>>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 20, 2016 at 12:21 PM, Pat Ferrel <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> BTW I don’t want to stall progress but being the most ignorant of plot
>>>>>>>>>>>>> libs, I’ll ask if we should consider python and matplotlib. In another
>>>>>>>>>>>>> project we use python because of the RDD support on Spark though the
>>>>>>>>>>>>> visualizations are extremely limited in our case. If we can pass an RDD to
>>>>>>>>>>>>> pyspark it would allow custom reductions in python before plotting, even
>>>>>>>>>>>>> though we will support many natively in Mahout. I’m guessing that this
>>>>>>>>>>>>> would cross a context boundary and require a write to disk?
>>>>>>>>>>>>>
>>>>>>>>>>>>> So 2 questions:
>>>>>>>>>>>>> 1) what does the inter-language support look like with Spark python vs
>>>>>>>>>>>>> SparkR, can we transfer RDDs?
>>>>>>>>>>>>> 2) are the plot libs significantly different?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On May 20, 2016, at 9:54 AM, Trevor Grant <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dmitriy really nailed it on the head in his reply to the post, which I'll
>>>>>>>>>>>>> rebroadcast below. In essence the whole reason you are (theoretically)
>>>>>>>>>>>>> using Mahout is that the data is too big to fit in memory. If it's too big
>>>>>>>>>>>>> to fit in memory, well then it's probably too big to plot each point (e.g.
>>>>>>>>>>>>> trillions of rows, you only have so many pixels). For the example I
>>>>>>>>>>>>> randomly sampled a matrix.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So as Dmitriy says, in Mahout we need to have functions that will
>>>>>>>>>>>>> 'preprocess' the data into something plottable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For the Zeppelin-Plotting thing, we need to have a function that will spit
>>>>>>>>>>>>> out a tsv-like string of the data we want plotted.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree an honest Mahout interpreter in Zeppelin is probably worth doing.
>>>>>>>>>>>>> There are a couple of ways to go about it.
>>>>>>>>>>>>> I opened up the discussion on
>>>>>>>>>>>>> dev@Zeppelin and didn't get any replies. I'm going to take that to mean we
>>>>>>>>>>>>> can do it in a way that makes the most sense to Mahout users...
>>>>>>>>>>>>>
>>>>>>>>>>>>> First steps are to include some methods in Mahout that will do that
>>>>>>>>>>>>> preprocessing, and one that will turn something into a tsv string.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have some general ideas on possible approaches to making an honest-mahout
>>>>>>>>>>>>> interpreter but I want to play in the code and look at the Flink-Mahout
>>>>>>>>>>>>> shell a bit before I try to organize my thoughts and present them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ...(2) not sure what is the point of supporting distributed anything. It is
>>>>>>>>>>>>> distributed presumably because it is hard to keep it in memory. Therefore,
>>>>>>>>>>>>> plotting anything distributed potentially presents 2 problems: storage
>>>>>>>>>>>>> space and overplotting due to number of points.
>>>>>>>>>>>>> The idea is that we have to
>>>>>>>>>>>>> work out algorithms that condense big data information into small plottable
>>>>>>>>>>>>> information (like density grids, for example, or histograms)....
>>>>>>>>>>>>>
>>>>>>>>>>>>> Trevor Grant
>>>>>>>>>>>>> Data Scientist
>>>>>>>>>>>>> https://github.com/rawkintrevo
>>>>>>>>>>>>> http://stackexchange.com/users/3002022/rawkintrevo
>>>>>>>>>>>>> http://trevorgrant.org
>>>>>>>>>>>>>
>>>>>>>>>>>>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 20, 2016 at 10:22 AM, Pat Ferrel <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Great job Trevor, we’ll need this detail to smooth out the sharp edges and
>>>>>>>>>>>>>> any guidance from you or the Zeppelin community will be a big help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On May 20, 2016, at 8:13 AM, Shannon Quinn <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Agreed, thoroughly enjoying the blog post.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 5/19/16 12:01 AM, Andrew Palumbo wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well done, Trevor! I've not yet had a chance to try this in zeppelin
>>>>>>>>>>>>>>> but I just read the blog which is great!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -------- Original message --------
>>>>>>>>>>>>>>> From: Trevor Grant <[email protected]>
>>>>>>>>>>>>>>> Date: 05/18/2016 2:44 PM (GMT-05:00)
>>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>>> Subject: Re: Future Mahout - Zeppelin work
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ah thank you.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fixing now.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Trevor Grant
>>>>>>>>>>>>>>> Data Scientist
>>>>>>>>>>>>>>> https://github.com/rawkintrevo
>>>>>>>>>>>>>>> http://stackexchange.com/users/3002022/rawkintrevo
>>>>>>>>>>>>>>> http://trevorgrant.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, May 18, 2016 at 1:04 PM, Andrew Palumbo <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey Trevor- Just refreshed your readme.
>>>>>>>>>>>>>>>> The jar that I mentioned is
>>>>>>>>>>>>>>>> actually:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /home/username/.m2/repository/org/apache/mahout/mahout-spark_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rather than:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /home/username/.m2/repository/org/apache/mahout/mahout-spark-shell_2.10/0.12.1-SNAPSHOT/mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (In the spark module that is)
>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>> From: Trevor Grant <[email protected]>
>>>>>>>>>>>>>>>> Sent: Wednesday, May 18, 2016 11:02:43 AM
>>>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>>>> Subject: Re: Future Mahout - Zeppelin work
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ah yes- I remember you pointing that out to me too.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I got side tracked yesterday for most of the day on an adventure in getting
>>>>>>>>>>>>>>>> Zeppelin to work right after I accidentally updated to the new snapshot (free
>>>>>>>>>>>>>>>> hint: the secret was to clear my cache *face-palm*)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm going to add that dependency to the readme.md now.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>> tg
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Trevor Grant
>>>>>>>>>>>>>>>> Data Scientist
>>>>>>>>>>>>>>>> https://github.com/rawkintrevo
>>>>>>>>>>>>>>>> http://stackexchange.com/users/3002022/rawkintrevo
>>>>>>>>>>>>>>>> http://trevorgrant.org
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, May 18, 2016 at 9:59 AM, Andrew Palumbo <[email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Trevor this is very cool- I have not been able to look at it closely yet
>>>>>>>>>>>>>>>>> but just a small point: I believe that you'll also need to add the
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> mahout-spark_2.10-0.12.1-SNAPSHOT-dependency-reduced.jar
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For things like the classification stats, confusion matrix, and t-digest.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andy
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>>> From: Trevor Grant <[email protected]>
>>>>>>>>>>>>>>>>> Sent: Wednesday, May 18, 2016 10:47:21 AM
>>>>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>>>>> Subject: Re: Future Mahout - Zeppelin work
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I still need to update my readme/env per Pat's comments below, however
>>>>>>>>>>>>>>>>> without further ado, I present two notebooks that integrate Mahout + Spark +
>>>>>>>>>>>>>>>>> Zeppelin + ggplot2
