Ok, cool.

So what state is Crunch in? I take it it's in a fairly advanced state.
So every API mentioned in the FlumeJava paper is working, right? Or is
there something specific that is not working?
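
For reference: FlumeJava's core primitive, parallelDo, is proposed further down the thread as the first thing to prototype and benchmark. It applies a DoFn to every element of a collection, and each invocation may emit zero or more outputs. The sketch below models only those semantics with plain in-memory Python lists. `parallel_do`, `Emitter` and `tokenize` are hypothetical names chosen for illustration. They are not the Crunch or crunchR API, and nothing here is distributed.

```python
from typing import Callable, Generic, Iterable, List, TypeVar

S = TypeVar("S")
T = TypeVar("T")


class Emitter(Generic[T]):
    """Collects outputs; a hypothetical stand-in loosely modeled on Crunch's Emitter."""

    def __init__(self) -> None:
        self.outputs: List[T] = []

    def emit(self, value: T) -> None:
        self.outputs.append(value)


def parallel_do(collection: Iterable[S],
                do_fn: Callable[[S, Emitter[T]], None]) -> List[T]:
    """Apply do_fn to each element in turn; each call may emit 0..n values.

    A real engine would run this per partition across many workers; this
    sequential version only illustrates the element-wise contract.
    """
    emitter: Emitter[T] = Emitter()
    for element in collection:
        do_fn(element, emitter)
    return emitter.outputs


def tokenize(line: str, emitter: Emitter[str]) -> None:
    # One input line can produce several output words, or none at all.
    for word in line.split():
        emitter.emit(word.lower())


if __name__ == "__main__":
    lines = ["Flume Java", "", "parallelDo maps and filters"]
    print(parallel_do(lines, tokenize))
    # -> ['flume', 'java', 'paralleldo', 'maps', 'and', 'filters']
```

A single function whose callback may emit any number of outputs covers both map (one output per input) and filter (zero or one output). That is why parallelDo is a natural first benchmark target for an R binding.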

On Fri, Oct 26, 2012 at 2:31 PM, Josh Wills <[email protected]> wrote:
> Hey Dmitriy,
>
> Got a fork going and looking forward to playing with crunchR this weekend--
> thanks!
>
> J
>
> On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> Project template: https://github.com/dlyubimov/crunchR
>>
>> The default profile does not compile the R artifact; the R profile does.
>> For convenience, the R profile is enabled by supplying -DR on the mvn
>> command line, e.g.
>>
>> mvn install -DR
>>
>> There's also a helper that installs the snapshot version of the
>> package in the crunchR module.
>>
>> There are RJava and JRI Java dependencies that I could not find in any
>> public Maven repo, so for now they are installed into my GitHub Maven
>> repo. It should compile for third parties.
>>
>> -DR compilation requires R, RJava and, optionally, RProtoBuf. R doc
>> compilation requires roxygen2 (I think).
>>
>> For some reason RProtoBuf fails to import into another package: I got a
>> weird exception when I put @import RProtoBuf into crunchR, so RProtoBuf
>> is now in the "Suggests" category. Down the road that may be a problem,
>> though...
>>
>> Other than the template, not much else has been done so far: finding
>> the Hadoop libraries and adding them to the package path on
>> initialization via "hadoop classpath", and adding the Crunch jars and
>> their non-"provided" transitive dependencies to crunchR's Java part.
>>
>> No legal stuff...
>>
>> No READMEs... complete stealth at this point.
>>
>> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>> > Ok, cool. I will try to put together a project template sometime next
>> > week. We can start with prototyping and benchmarking something really
>> > simple, such as parallelDo().
>> >
>> > My interim goal is to take a more or less simple algorithm from
>> > Mahout (say, one of the factorization or clustering ones) and show that
>> > it can be solved with Rcrunch (or whatever it ends up being named) in
>> > comparable time but with far fewer lines of code.
>> >
>> >
>> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul <[email protected]> wrote:
>> >> I am not much of an R user, but I am interested to see how well we can
>> >> integrate the two. I would be happy to help.
>> >>
>> >> regards,
>> >> Rahul
>> >>
>> >> On 18-10-2012 04:04, Josh Wills wrote:
>> >>>
>> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Yep, ok.
>> >>>>
>> >>>> I imagine it has to be an R module, so I can set up a Maven project
>> >>>> with a Java/R code tree (I have been doing that a lot lately). Or, if
>> >>>> you have a template to look at, that would be useful too, I guess.
>> >>>
>> >>> No, please go right ahead.
>> >>>
>> >>>>
>> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <[email protected]>
>> >>>> wrote:
>> >>>>>
>> >>>>> I'd like it to be separate at first, but I am happy to help. GitHub
>> >>>>> repo?
>> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Ok, maybe there's a benefit to trying a JRI/RJava prototype on top
>> >>>>>> of Crunch for something simple. This should both save time and prove
>> >>>>>> or disprove whether Crunch via RJava integration is viable.
>> >>>>>>
>> >>>>>> On my part, I can try to do it within the Crunch framework, or we can
>> >>>>>> keep it completely separate.
>> >>>>>>
>> >>>>>> -d
>> >>>>>>
>> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <[email protected]>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
>> >>>>>>> Murray Stokely?
>> >>>>>>>
>> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <[email protected]>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> Hello,
>> >>>>>>>>
>> >>>>>>>> I was pretty excited to learn about Google's experience mapping R
>> >>>>>>>> onto FlumeJava at one of the recent BARUG meetups. I think a lot of
>> >>>>>>>> applications similar to what we do in Mahout could be prototyped
>> >>>>>>>> using Flume R.
>> >>>>>>>>
>> >>>>>>>> I did not quite get the details of Google's implementation of the R
>> >>>>>>>> mapping, but I am not sure that a direct mapping from R to Crunch
>> >>>>>>>> would be sufficient (or, for the most part, efficient). RJava/JRI
>> >>>>>>>> and JNI seem to perform pretty terribly when used directly for that.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On top of that, I am thinking this project could have a contributed
>> >>>>>>>> adapter to Mahout's distributed matrices; that would be very good
>> >>>>>>>> synergy.
>> >>>>>>>>
>> >>>>>>>> Is anyone interested in contributing to or advising on an open
>> >>>>>>>> source version of Flume R support? Just gauging interest; the Crunch
>> >>>>>>>> list seems like a natural place to poke.
>> >>>>>>>>
>> >>>>>>>> Thanks.
>> >>>>>>>>
>> >>>>>>>> -Dmitriy
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>
>> >>>
>> >>>
>> >>
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
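
Dmitriy's Oct 24 note mentions locating the Hadoop libraries at package initialization via "hadoop classpath". That command prints a single string delimited by the platform path separator. Some entries may be Java-style wildcards (`dir/*`). The `java` launcher expands these, but a class loader handed individual paths may not (an assumption worth checking against rJava). Below is a minimal Python sketch of just the parsing step. `expand_classpath` and `hadoop_classpath` are hypothetical helpers, not part of crunchR. In the real R package the resulting list would presumably be passed to rJava (for example `.jaddClassPath`), but that is an assumption about the design rather than something the thread describes.

```python
import glob
import os
import subprocess
from typing import List


def expand_classpath(raw: str, sep: str = os.pathsep) -> List[str]:
    """Split a classpath string and expand Java-style 'dir/*' entries.

    The java launcher expands 'dir/*' to every jar in dir; a class loader
    fed individual paths may not, so it is expanded here (assumption).
    """
    entries: List[str] = []
    for entry in raw.strip().split(sep):
        if not entry:
            continue  # skip empty segments, e.g. from a trailing separator
        if os.path.basename(entry) == "*":
            # Wildcard entry: include only .jar/.JAR files, not subdirectories.
            directory = os.path.dirname(entry) or "."
            jars = set(glob.glob(os.path.join(directory, "*.jar")))
            jars |= set(glob.glob(os.path.join(directory, "*.JAR")))
            entries.extend(sorted(jars))
        else:
            entries.append(entry)
    return entries


def hadoop_classpath() -> List[str]:
    """Run `hadoop classpath` (requires the hadoop CLI on PATH)."""
    result = subprocess.run(["hadoop", "classpath"],
                            capture_output=True, text=True, check=True)
    return expand_classpath(result.stdout)
```

Keeping the parsing separate from the subprocess call lets the expansion logic be tested without a Hadoop installation. Only `hadoop_classpath` needs the CLI.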
