Hey Dmitriy,

Got a fork going and looking forward to playing with crunchR this weekend-- thanks!

J

On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Project template https://github.com/dlyubimov/crunchR
>
> Default profile does not compile R artifact. R profile compiles R
> artifact. for convenience, it is enabled by supplying -DR to mvn
> command line, e.g.
>
> mvn install -DR
>
> there's also a helper that installs the snapshot version of the
> package in the crunchR module.
>
> There's RJava and JRI java dependencies which i did not find anywhere
> in public maven repos; so it is installed into my github maven repo so
> far. Should compile for 3rd party.
>
> -DR compilation requires R, RJava and optionally, RProtoBuf. R Doc
> compilation requires roxygen2 (i think).
>
> For some reason RProtoBuf fails to import into another package, got a
> weird exception when i put @import RProtoBuf into crunchR, so
> RProtoBuf is now in "Suggests" category. Down the road that may be a
> problem though...
>
> other than the template, not much else has been done so far... finding
> hadoop libraries and adding it to the package path on initialization
> via "hadoop classpath"... adding Crunch jars and its non-"provided"
> transitives to the crunchR's java part...
>
> No legal stuff...
>
> No readmes... complete stealth at this point.
>
> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
> > Ok, cool. I will try to roll project template by some time next week.
> > we can start with prototyping and benchmarking something really
> > simple, such as parallelDo().
> >
> > My interim goal is to perhaps take some more or less simple algorithm
> > from Mahout and demonstrate it can be solved with Rcrunch (or whatever
> > name it has to be) in a comparable time (performance) but with much
> > fewer lines of code. (say one of factorization or clustering things)
> >
> >
> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul <[email protected]> wrote:
> >> I am not much of R user but I am interested to see how well we can integrate
> >> the two. I would be happy to help.
> >>
> >> regards,
> >> Rahul
> >>
> >> On 18-10-2012 04:04, Josh Wills wrote:
> >>>
> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <[email protected]>
> >>> wrote:
> >>>>
> >>>> Yep, ok.
> >>>>
> >>>> I imagine it has to be an R module so I can set up a maven project
> >>>> with java/R code tree (I have been doing that a lot lately). Or if you
> >>>> have a template to look at, it would be useful i guess too.
> >>>
> >>> No, please go right ahead.
> >>>
> >>>>
> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <[email protected]> wrote:
> >>>>>
> >>>>> I'd like it to be separate at first, but I am happy to help. Github
> >>>>> repo?
> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
> >>>>>
> >>>>>> Ok maybe there's a benefit to try a JRI/RJava prototype on top of
> >>>>>> Crunch for something simple. This should both save time and prove or
> >>>>>> disprove if Crunch via RJava integration is viable.
> >>>>>>
> >>>>>> On my part i can try to do it within Crunch framework or we can keep
> >>>>>> it completely separate.
> >>>>>>
> >>>>>> -d
> >>>>>>
> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <[email protected]>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
> >>>>>>> Murray Stokely?
> >>>>>>>
> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I was pretty excited to learn of Google's experience of R mapping of
> >>>>>>>> flume java on one of recent BARUGs. I think a lot of applications
> >>>>>>>> similar to what we do in Mahout could be prototyped using flume R.
> >>>>>>>>
> >>>>>>>> I did not quite get the details of Google implementation of R mapping,
> >>>>>>>> but i am not sure if just a direct mapping from R to Crunch would be
> >>>>>>>> sufficient (and, for most part, efficient). RJava/JRI and jni seem to
> >>>>>>>> be a pretty terrible performer to do that directly.
> >>>>>>>>
> >>>>>>>> on top of it, I am thinking if this project could have a contributed
> >>>>>>>> adapter to Mahout's distributed matrices, that would be just a very
> >>>>>>>> good synergy.
> >>>>>>>>
> >>>>>>>> Is there anyone interested in contributing/advising for open source
> >>>>>>>> version of flume R support? Just gauging interest, Crunch list seems
> >>>>>>>> like a natural place to poke.
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> -Dmitriy
> >>>>>>>
> >>>>>>> --
> >>>>>>> Director of Data Science
> >>>>>>> Cloudera
> >>>>>>> Twitter: @josh_wills

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
