Ok, cool. So what state is Crunch in? I take it it is in a fairly advanced state. So every API mentioned in the FlumeJava paper is working, right? Or is there something specific that is not working?

On Fri, Oct 26, 2012 at 2:31 PM, Josh Wills <[email protected]> wrote:
> Hey Dmitriy,
>
> Got a fork going and looking forward to playing with crunchR this weekend--
> thanks!
>
> J
>
> On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> Project template https://github.com/dlyubimov/crunchR
>>
>> Default profile does not compile R artifact . R profile compiles R
>> artifact. for convenience, it is enabled by supplying -DR to mvn
>> command line, e.g.
>>
>> mvn install -DR
>>
>> there's also a helper that installs the snapshot version of the
>> package in the crunchR module.
>>
>> There's RJava and JRI java dependencies which i did not find anywhere
>> in public maven repos; so it is installed into my github maven repo so
>> far. Should compile for 3rd party.
>>
>> -DR compilation requires R, RJava and optionally, RProtoBuf. R Doc
>> compilation requires roxygen2 (i think).
>>
>> For some reason RProtoBuf fails to import into another package, got a
>> weird exception when i put @import RProtoBuf into crunchR, so
>> RProtoBuf is now in "Suggests" category. Down the road that may be a
>> problem though...
>>
>> other than the template, not much else has been done so far... finding
>> hadoop libraries and adding it to the package path on initialization
>> via "hadoop classpath"... adding Crunch jars and its non-"provided"
>> transitives to the crunchR's java part...
>>
>> No legal stuff...
>>
>> No readmes... complete stealth at this point.
>>
>> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> > Ok, cool. I will try to roll project template by some time next week.
>> > we can start with prototyping and benchmarking something really
>> > simple, such as parallelDo().
>> >
>> > My interim goal is to perhaps take some more or less simple algorithm
>> > from Mahout and demonstrate it can be solved with Rcrunch (or whatever
>> > name it has to be) in a comparable time (performance) but with much
>> > fewer lines of code. (say one of factorization or clustering things)
>> >
>> >
>> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul <[email protected]> wrote:
>> >> I am not much of R user but I am interested to see how well we can integrate
>> >> the two. I would be happy to help.
>> >>
>> >> regards,
>> >> Rahul
>> >>
>> >> On 18-10-2012 04:04, Josh Wills wrote:
>> >>>
>> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Yep, ok.
>> >>>>
>> >>>> I imagine it has to be an R module so I can set up a maven project
>> >>>> with java/R code tree (I have been doing that a lot lately). Or if you
>> >>>> have a template to look at, it would be useful i guess too.
>> >>>
>> >>> No, please go right ahead.
>> >>>
>> >>>>
>> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <[email protected]> wrote:
>> >>>>>
>> >>>>> I'd like it to be separate at first, but I am happy to help. Github
>> >>>>> repo?
>> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
>> >>>>>
>> >>>>>> Ok maybe there's a benefit to try a JRI/RJava prototype on top of
>> >>>>>> Crunch for something simple. This should both save time and prove or
>> >>>>>> disprove if Crunch via RJava integration is viable.
>> >>>>>>
>> >>>>>> On my part i can try to do it within Crunch framework or we can keep
>> >>>>>> it completely separate.
>> >>>>>>
>> >>>>>> -d
>> >>>>>>
>> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <[email protected]>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
>> >>>>>>> Murray Stokely?
>> >>>>>>>
>> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> >>>>>>>>
>> >>>>>>>> Hello,
>> >>>>>>>>
>> >>>>>>>> I was pretty excited to learn of Google's experience of R mapping of
>> >>>>>>>> flume java on one of recent BARUGs. I think a lot of applications
>> >>>>>>>> similar to what we do in Mahout could be prototyped using flume R.
>> >>>>>>>>
>> >>>>>>>> I did not quite get the details of Google implementation of R
>> >>>>>>>> mapping,
>> >>>>>>>> but i am not sure if just a direct mapping from R to Crunch would be
>> >>>>>>>> sufficient (and, for most part, efficient). RJava/JRI and jni seem to
>> >>>>>>>> be a pretty terrible performer to do that directly.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> on top of it, I am thinknig if this project could have a contributed
>> >>>>>>>> adapter to Mahout's distributed matrices, that would be just a very
>> >>>>>>>> good synergy.
>> >>>>>>>>
>> >>>>>>>> Is there anyone interested in contributing/advising for open source
>> >>>>>>>> version of flume R support? Just gauging interest, Crunch list seems
>> >>>>>>>> like a natural place to poke.
>> >>>>>>>>
>> >>>>>>>> Thanks .
>> >>>>>>>>
>> >>>>>>>> -Dmitriy
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Director of Data Science
>> >>>>>>> Cloudera
>> >>>>>>> Twitter: @josh_wills
>> >>>
>> >>
>> >
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
