On Fri, Oct 26, 2012 at 2:40 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Ok, cool.
>
> So what state is Crunch in? I take it is in a fairly advanced state.
> So every api mentioned in the FlumeJava paper is working, right? Or
> there's something that is not working specifically?

I think the only thing in the paper that we don't have in a working state is
MSCR fusion. It's mostly just a question of prioritizing it and getting the
work done.

>
> On Fri, Oct 26, 2012 at 2:31 PM, Josh Wills <[email protected]> wrote:
>> Hey Dmitriy,
>>
>> Got a fork going and looking forward to playing with crunchR this weekend--
>> thanks!
>>
>> J
>>
>> On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> Project template https://github.com/dlyubimov/crunchR
>>>
>>> Default profile does not compile R artifact. R profile compiles R
>>> artifact. for convenience, it is enabled by supplying -DR to mvn
>>> command line, e.g.
>>>
>>> mvn install -DR
>>>
>>> there's also a helper that installs the snapshot version of the
>>> package in the crunchR module.
>>>
>>> There's RJava and JRI java dependencies which i did not find anywhere
>>> in public maven repos; so it is installed into my github maven repo so
>>> far. Should compile for 3rd party.
>>>
>>> -DR compilation requires R, RJava and optionally, RProtoBuf. R Doc
>>> compilation requires roxygen2 (i think).
>>>
>>> For some reason RProtoBuf fails to import into another package, got a
>>> weird exception when i put @import RProtoBuf into crunchR, so
>>> RProtoBuf is now in "Suggests" category. Down the road that may be a
>>> problem though...
>>>
>>> other than the template, not much else has been done so far... finding
>>> hadoop libraries and adding it to the package path on initialization
>>> via "hadoop classpath"... adding Crunch jars and its non-"provided"
>>> transitives to the crunchR's java part...
>>>
>>> No legal stuff...
>>>
>>> No readmes... complete stealth at this point.
>>>
>>> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>> > Ok, cool. I will try to roll project template by some time next week.
>>> > we can start with prototyping and benchmarking something really
>>> > simple, such as parallelDo().
>>> >
>>> > My interim goal is to perhaps take some more or less simple algorithm
>>> > from Mahout and demonstrate it can be solved with Rcrunch (or whatever
>>> > name it has to be) in a comparable time (performance) but with much
>>> > fewer lines of code. (say one of factorization or clustering things)
>>> >
>>> >
>>> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul <[email protected]> wrote:
>>> >> I am not much of R user but I am interested to see how well we can
>>> >> integrate the two. I would be happy to help.
>>> >>
>>> >> regards,
>>> >> Rahul
>>> >>
>>> >> On 18-10-2012 04:04, Josh Wills wrote:
>>> >>>
>>> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>> >>>>
>>> >>>> Yep, ok.
>>> >>>>
>>> >>>> I imagine it has to be an R module so I can set up a maven project
>>> >>>> with java/R code tree (I have been doing that a lot lately). Or if you
>>> >>>> have a template to look at, it would be useful i guess too.
>>> >>>
>>> >>> No, please go right ahead.
>>> >>>
>>> >>>>
>>> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <[email protected]> wrote:
>>> >>>>>
>>> >>>>> I'd like it to be separate at first, but I am happy to help. Github
>>> >>>>> repo?
>>> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
>>> >>>>>
>>> >>>>>> Ok maybe there's a benefit to try a JRI/RJava prototype on top of
>>> >>>>>> Crunch for something simple. This should both save time and prove or
>>> >>>>>> disprove if Crunch via RJava integration is viable.
>>> >>>>>>
>>> >>>>>> On my part i can try to do it within Crunch framework or we can keep
>>> >>>>>> it completely separate.
>>> >>>>>>
>>> >>>>>> -d
>>> >>>>>>
>>> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <[email protected]> wrote:
>>> >>>>>>>
>>> >>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
>>> >>>>>>> Murray Stokely?
>>> >>>>>>>
>>> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Hello,
>>> >>>>>>>>
>>> >>>>>>>> I was pretty excited to learn of Google's experience of R mapping of
>>> >>>>>>>> flume java on one of recent BARUGs. I think a lot of applications
>>> >>>>>>>> similar to what we do in Mahout could be prototyped using flume R.
>>> >>>>>>>>
>>> >>>>>>>> I did not quite get the details of Google implementation of R
>>> >>>>>>>> mapping, but i am not sure if just a direct mapping from R to Crunch
>>> >>>>>>>> would be sufficient (and, for most part, efficient). RJava/JRI and
>>> >>>>>>>> jni seem to be a pretty terrible performer to do that directly.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> on top of it, I am thinking if this project could have a contributed
>>> >>>>>>>> adapter to Mahout's distributed matrices, that would be just a very
>>> >>>>>>>> good synergy.
>>> >>>>>>>>
>>> >>>>>>>> Is there anyone interested in contributing/advising for open source
>>> >>>>>>>> version of flume R support? Just gauging interest, Crunch list seems
>>> >>>>>>>> like a natural place to poke.
>>> >>>>>>>>
>>> >>>>>>>> Thanks.
>>> >>>>>>>>
>>> >>>>>>>> -Dmitriy
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Director of Data Science
>>> >>>>>>> Cloudera
>>> >>>>>>> Twitter: @josh_wills
>>> >>>
>>> >>>
>>> >>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
