Re: Flume R -- any interest?

Dmitriy Lyubimov Fri, 16 Nov 2012 14:52:37 -0800

No, this is fully distributed testing.

That's OK; StatEt handles log4j logging for me, so I see the logs. I was
wondering whether any end-to-end diagnostics are already embedded in Crunch,
but reporting backend errors to the front end is notoriously hard (and
sometimes impossible) with Hadoop, so I assume it doesn't make sense to report
client-only problems through an exception while everything else still requires
checking isSucceeded().
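
A minimal sketch of what treating a failed run uniformly could look like on
the caller's side, assuming the run hands back a success flag plus whatever
error text the client managed to collect. RunResult, PipelineFailedException
and RunChecks are hypothetical names made up for illustration; none of them
is part of the Crunch API.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for whatever a pipeline run returns; NOT a Crunch class.
final class RunResult {
    final boolean succeeded;
    final List<String> errors;

    RunResult(boolean succeeded, List<String> errors) {
        this.succeeded = succeeded;
        this.errors = errors;
    }
}

// Hypothetical exception type used to surface a failed run to the caller.
final class PipelineFailedException extends RuntimeException {
    PipelineFailedException(String message) {
        super(message);
    }
}

final class RunChecks {
    // Runs the supplied job and turns a false success flag into an exception,
    // so the caller cannot silently ignore a failed run.
    static RunResult runOrThrow(Supplier<RunResult> job) {
        RunResult result = job.get();
        if (!result.succeeded) {
            throw new PipelineFailedException(
                "pipeline failed: " + String.join("; ", result.errors));
        }
        return result;
    }
}
```

This only changes how a failure is reported to the caller; whether the error
list can actually be filled with backend errors is a separate question.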




On Fri, Nov 16, 2012 at 11:07 AM, Josh Wills <[email protected]> wrote:

> Are you running this using LocalJobRunner? Does calling
> Pipeline.enableDebug() before run() help? If it doesn't, it'll help
> settle a debate I'm having w/Matthias. ;-)
>
> On Fri, Nov 16, 2012 at 10:22 AM, Dmitriy Lyubimov <[email protected]>
> wrote:
> > I see the error in the logs, but Pipeline.run() never threw anything.
> > isSucceeded() subsequently returns false. Is there any way to extract the
> > client-side problem, rather than only being able to state that the job
> > failed? Or is this OK, and the only diagnostics by design?
> >
> > ============
> > 68124 [Thread-8] INFO  org.apache.crunch.impl.mr.exec.CrunchJob  -
> > org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> > does not exist: hdfs://localhost:11010/crunchr-example/input
> > at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
> > at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
> > at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
> > at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
> > at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
> > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
> > at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:396)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
> > at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
> > at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
> > at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:331)
> > at org.apache.crunch.impl.mr.exec.CrunchJob.submit(CrunchJob.java:135)
> > at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:251)
> > at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.run(CrunchJobControl.java:279)
> > at java.lang.Thread.run(Thread.java:662)
> >
> >
> > On Mon, Nov 12, 2012 at 5:41 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
> >
> >> For Hadoop nodes, I guess yet another option is to soft-link the .so into
> >> Hadoop's native lib folder.
> >>
> >>
> >> On Mon, Nov 12, 2012 at 5:37 PM, Dmitriy Lyubimov <[email protected]
> >wrote:
> >>
> >>> I actually want to defer this to the Hadoop admins; we just need to
> >>> create a procedure for setting up nodes, ideally as simple as possible.
> >>> Something like:
> >>>
> >>> 1) set up R
> >>> 2) install.packages(c("rJava","RProtoBuf","crunchR"))
> >>> 3) R CMD javareconf
> >>> 4) add the result of R --vanilla <<< 'system.file("jri", package="rJava")'
> >>> to either the mapred command lines or LD_LIBRARY_PATH...
> >>>
> >>> But it will depend on their versions of Hadoop, JRE, etc. I hoped Crunch
> >>> might have something to hide a lot of that complexity (since it is about
> >>> hiding complexities, for the most part :) ). Besides, Hadoop has a way to
> >>> ship .so's to the backend, so if Crunch had an API to do something
> >>> similar, it is conceivable that the driver might yank and ship it too, to
> >>> hide that complexity as well. But then there's a host of issues around how
> >>> to handle potentially different rJava versions installed on different
> >>> nodes... So it increasingly looks like something we might want to defer to
> >>> sysops, with an approximate set of requirements.
> >>>
> >>>
> >>> On Mon, Nov 12, 2012 at 5:29 PM, Josh Wills <[email protected]>
> wrote:
> >>>
> >>>> On Mon, Nov 12, 2012 at 5:17 PM, Dmitriy Lyubimov <[email protected]>
> >>>> wrote:
> >>>>
> >>>> > So Java tasks need to be able to load libjri.so from
> >>>> > whatever system.file("jri", package="rJava") says.
> >>>> >
> >>>> > Traditionally, these issues were handled with -Djava.library.path.
> >>>> > Apparently there's nothing a Java task can do to make loadLibrary()
> >>>> > see the damn library once started. But -Djava.library.path requires
> >>>> > the nodes to configure the JVM command line and lock it against
> >>>> > modification by the client, which is fine.
> >>>> >
> >>>> > I also discovered that LD_LIBRARY_PATH actually works with JRE 1.6
> >>>> > (again).
> >>>> >
> >>>> > But... any other suggestions about best practice for configuring Crunch
> >>>> > to run users' .so's?
> >>>> >
> >>>>
> >>>> Not off the top of my head. I suspect that whatever you come up with
> >>>> will become the "best practice." :)
> >>>>
> >>>> >
> >>>> > thanks.
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> > On Sun, Nov 11, 2012 at 1:41 PM, Josh Wills <[email protected]>
> >>>> wrote:
> >>>> >
> >>>> > > I believe that is a safe assumption, at least right now.
> >>>> > >
> >>>> > >
> >>>> > > On Sun, Nov 11, 2012 at 1:38 PM, Dmitriy Lyubimov <
> [email protected]
> >>>> >
> >>>> > > wrote:
> >>>> > >
> >>>> > > > Question.
> >>>> > > >
> >>>> > > > So in the Crunch API, initialize() doesn't get an emitter, and
> >>>> > > > process() gets an emitter on every call.
> >>>> > > >
> >>>> > > > However, my guess is that any single reincarnation of a DoFn object
> >>>> > > > in the backend will always get the same emitter throughout its
> >>>> > > > lifecycle. Is that an admissible assumption, or is there currently
> >>>> > > > a counterexample?
> >>>> > > >
> >>>> > > > The problem is that, as I implement the two-way pipeline of input
> >>>> > > > and emitter data between R and Java, I am bulking these calls
> >>>> > > > together for performance reasons. The individual data in these
> >>>> > > > chunks will not have emitter function information attached to them
> >>>> > > > in any way. (Well, they could, but it would be a performance
> >>>> > > > killer, and I bet the emitter never changes.)
> >>>> > > >
> >>>> > > > So, thoughts? Can I assume the emitter never changes between the
> >>>> > > > first and last call to a DoFn instance?
> >>>> > > >
> >>>> > > > thanks.
> >>>> > > >
> >>>> > > >
> >>>> > > > On Mon, Oct 29, 2012 at 6:32 PM, Dmitriy Lyubimov <
> >>>> [email protected]>
> >>>> > > > wrote:
> >>>> > > >
> >>>> > > > > yes...
> >>>> > > > >
> >>>> > > > > I think it worked for me before, although just adding all jars
> >>>> > > > > from the R package distribution would be a slightly more
> >>>> > > > > appropriate approach -- but it creates a problem with jars in
> >>>> > > > > dependent R packages. I think it would be much easier to just
> >>>> > > > > compile a hadoop-job file and stick it in rather than
> >>>> > > > > cherry-picking individual jars from who knows how many locations.
> >>>> > > > >
> >>>> > > > > I think I used the hadoop-job format with the distributed cache
> >>>> > > > > before and it worked... at least with Pig "register jar"
> >>>> > > > > functionality.
> >>>> > > > >
> >>>> > > > > OK, I guess I will just try it and see if it works.
> >>>> > > > >
> >>>> > > > > On Mon, Oct 29, 2012 at 6:24 PM, Josh Wills <
> [email protected]
> >>>> >
> >>>> > > wrote:
> >>>> > > > > > On Mon, Oct 29, 2012 at 5:46 PM, Dmitriy Lyubimov <
> >>>> > [email protected]
> >>>> > > >
> >>>> > > > > wrote:
> >>>> > > > > >
> >>>> > > > > >> Great! so it is in Crunch.
> >>>> > > > > >>
> >>>> > > > > >> does it support hadoop-job jar format or only pure java jars?
> >>>> > > > > >>
> >>>> > > > > >
> >>>> > > > > > I think just pure jars-- you're referring to hadoop-job format
> >>>> > > > > > as having all the dependencies in a lib/ directory within the
> >>>> > > > > > jar?
> >>>> > > > > >
> >>>> > > > > >
> >>>> > > > > >>
> >>>> > > > > >> On Mon, Oct 29, 2012 at 5:10 PM, Josh Wills <
> >>>> [email protected]>
> >>>> > > > > wrote:
> >>>> > > > > >> > On Mon, Oct 29, 2012 at 5:04 PM, Dmitriy Lyubimov <
> >>>> > > > [email protected]>
> >>>> > > > > >> wrote:
> >>>> > > > > >> >
> >>>> > > > > >> >> I think I need functionality to add more jars (or an
> >>>> > > > > >> >> external hadoop-jar) to drive that from an R package. Just
> >>>> > > > > >> >> setting the job jar by class is not enough. I can push the
> >>>> > > > > >> >> overall job-jar as an additional jar to the R package;
> >>>> > > > > >> >> however, I cannot really run the hadoop command line on it,
> >>>> > > > > >> >> I need to set up the classpath through RJava.
> >>>> > > > > >> >>
> >>>> > > > > >> >> A traditional single Hadoop job jar is unlikely to work
> >>>> > > > > >> >> here, since we cannot hardcode pipelines in Java code but
> >>>> > > > > >> >> rather have to construct them on the fly. (Well, we could
> >>>> > > > > >> >> serialize pipeline definitions from R and then replay them
> >>>> > > > > >> >> in a driver -- but that's too cumbersome and more work than
> >>>> > > > > >> >> it has to be.) There's no reason why I shouldn't be able to
> >>>> > > > > >> >> do a Pig-like "register jar" or "setJobJar" (Mahout-like)
> >>>> > > > > >> >> when kicking off a pipeline.
> >>>> > > > > >> >>
> >>>> > > > > >> >
> >>>> > > > > >> > o.a.c.util.DistCache.addJarToDistributedCache?
> >>>> > > > > >> >
> >>>> > > > > >> >
> >>>> > > > > >> >>
> >>>> > > > > >> >>
> >>>> > > > > >> >> On Mon, Oct 29, 2012 at 10:17 AM, Dmitriy Lyubimov <
> >>>> > > > > [email protected]>
> >>>> > > > > >> >> wrote:
> >>>> > > > > >> >> > Ok, sounds very promising...
> >>>> > > > > >> >> >
> >>>> > > > > >> >> > I'll try to start digging on the driver part this week
> >>>> > > > > >> >> > then (Pipeline wrapper in R5).
> >>>> > > > > >> >> >
> >>>> > > > > >> >> > On Sun, Oct 28, 2012 at 11:56 AM, Josh Wills <
> >>>> > > > [email protected]
> >>>> > > > > >
> >>>> > > > > >> >> wrote:
> >>>> > > > > >> >> >> On Fri, Oct 26, 2012 at 2:40 PM, Dmitriy Lyubimov <
> >>>> > > > > [email protected]
> >>>> > > > > >> >
> >>>> > > > > >> >> wrote:
> >>>> > > > > >> >> >>> Ok, cool.
> >>>> > > > > >> >> >>>
> >>>> > > > > >> >> >>> So what state is Crunch in? I take it it is in a fairly
> >>>> > > > > >> >> >>> advanced state. So every API mentioned in the FlumeJava
> >>>> > > > > >> >> >>> paper is working, right? Or is there something specific
> >>>> > > > > >> >> >>> that is not working?
> >>>> > > > > >> >> >>
> >>>> > > > > >> >> >> I think the only thing in the paper that we don't have
> >>>> > > > > >> >> >> in a working state is MSCR fusion. It's mostly just a
> >>>> > > > > >> >> >> question of prioritizing it and getting the work done.
> >>>> > > > > >> >> >>
> >>>> > > > > >> >> >>>
> >>>> > > > > >> >> >>> On Fri, Oct 26, 2012 at 2:31 PM, Josh Wills <
> >>>> > > > [email protected]
> >>>> > > > > >
> >>>> > > > > >> >> wrote:
> >>>> > > > > >> >> >>>> Hey Dmitriy,
> >>>> > > > > >> >> >>>>
> >>>> > > > > >> >> >>>> Got a fork going and looking forward to playing with
> >>>> > > > > >> >> >>>> crunchR this weekend-- thanks!
> >>>> > > > > >> >> >>>>
> >>>> > > > > >> >> >>>> J
> >>>> > > > > >> >> >>>>
> >>>> > > > > >> >> >>>> On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <
> >>>> > > > > >> [email protected]>
> >>>> > > > > >> >> wrote:
> >>>> > > > > >> >> >>>>
> >>>> > > > > >> >> >>>>> Project template: https://github.com/dlyubimov/crunchR
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> The default profile does not compile the R artifact;
> >>>> > > > > >> >> >>>>> the R profile does. For convenience, it is enabled by
> >>>> > > > > >> >> >>>>> supplying -DR on the mvn command line, e.g.
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> mvn install -DR
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> There's also a helper that installs the snapshot
> >>>> > > > > >> >> >>>>> version of the package in the crunchR module.
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> There are RJava and JRI Java dependencies which I did
> >>>> > > > > >> >> >>>>> not find anywhere in public Maven repos, so they are
> >>>> > > > > >> >> >>>>> installed into my GitHub Maven repo so far. It should
> >>>> > > > > >> >> >>>>> compile for third parties.
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> -DR compilation requires R, RJava and, optionally,
> >>>> > > > > >> >> >>>>> RProtoBuf. R doc compilation requires roxygen2 (I
> >>>> > > > > >> >> >>>>> think).
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> For some reason RProtoBuf fails to import into another
> >>>> > > > > >> >> >>>>> package; I got a weird exception when I put @import
> >>>> > > > > >> >> >>>>> RProtoBuf into crunchR, so RProtoBuf is now in the
> >>>> > > > > >> >> >>>>> "Suggests" category. Down the road that may be a
> >>>> > > > > >> >> >>>>> problem though...
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> Other than the template, not much else has been done
> >>>> > > > > >> >> >>>>> so far... finding Hadoop libraries and adding them to
> >>>> > > > > >> >> >>>>> the package path on initialization via "hadoop
> >>>> > > > > >> >> >>>>> classpath"... adding Crunch jars and their
> >>>> > > > > >> >> >>>>> non-"provided" transitives to crunchR's Java part...
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> No legal stuff...
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> No readmes... complete stealth at this point.
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy
> Lyubimov <
> >>>> > > > > >> >> [email protected]>
> >>>> > > > > >> >> >>>>> wrote:
> >>>> > > > > >> >> >>>>> > OK, cool. I will try to roll a project template some
> >>>> > > > > >> >> >>>>> > time next week. We can start with prototyping and
> >>>> > > > > >> >> >>>>> > benchmarking something really simple, such as
> >>>> > > > > >> >> >>>>> > parallelDo().
> >>>> > > > > >> >> >>>>> >
> >>>> > > > > >> >> >>>>> > My interim goal is to perhaps take some more or less
> >>>> > > > > >> >> >>>>> > simple algorithm from Mahout (say, one of the
> >>>> > > > > >> >> >>>>> > factorization or clustering things) and demonstrate
> >>>> > > > > >> >> >>>>> > that it can be solved with Rcrunch (or whatever name
> >>>> > > > > >> >> >>>>> > it has to be) in comparable time (performance) but
> >>>> > > > > >> >> >>>>> > with far fewer lines of code.
> >>>> > > > > >> >> >>>>> >
> >>>> > > > > >> >> >>>>> >
> >>>> > > > > >> >> >>>>> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul <
> >>>> > > [email protected]
> >>>> > > > >
> >>>> > > > > >> wrote:
> >>>> > > > > >> >> >>>>> >> I am not much of an R user but I am interested to see
> >>>> > > > > >> >> >>>>> >> how well we can integrate the two. I would be happy
> >>>> > > > > >> >> >>>>> >> to help.
> >>>> > > > > >> >> >>>>> >>
> >>>> > > > > >> >> >>>>> >> regards,
> >>>> > > > > >> >> >>>>> >> Rahul
> >>>> > > > > >> >> >>>>> >>
> >>>> > > > > >> >> >>>>> >> On 18-10-2012 04:04, Josh Wills wrote:
> >>>> > > > > >> >> >>>>> >>>
> >>>> > > > > >> >> >>>>> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy
> >>>> Lyubimov <
> >>>> > > > > >> >> [email protected]>
> >>>> > > > > >> >> >>>>> >>> wrote:
> >>>> > > > > >> >> >>>>> >>>>
> >>>> > > > > >> >> >>>>> >>>> Yep, ok.
> >>>> > > > > >> >> >>>>> >>>>
> >>>> > > > > >> >> >>>>> >>>> I imagine it has to be an R module, so I can set up
> >>>> > > > > >> >> >>>>> >>>> a Maven project with a Java/R code tree (I have
> >>>> > > > > >> >> >>>>> >>>> been doing that a lot lately). Or if you have a
> >>>> > > > > >> >> >>>>> >>>> template to look at, that would be useful too, I
> >>>> > > > > >> >> >>>>> >>>> guess.
> >>>> > > > > >> >> >>>>> >>>
> >>>> > > > > >> >> >>>>> >>> No, please go right ahead.
> >>>> > > > > >> >> >>>>> >>>
> >>>> > > > > >> >> >>>>> >>>>
> >>>> > > > > >> >> >>>>> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <
> >>>> > > > > >> >> [email protected]>
> >>>> > > > > >> >> >>>>> wrote:
> >>>> > > > > >> >> >>>>> >>>>>
> >>>> > > > > >> >> >>>>> >>>>> I'd like it to be separate at first, but I am
> >>>> > > > > >> >> >>>>> >>>>> happy to help. Github repo?
> >>>> > > > > >> >> >>>>> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov"
> <
> >>>> > > > > >> [email protected]
> >>>> > > > > >> >> >
> >>>> > > > > >> >> >>>>> wrote:
> >>>> > > > > >> >> >>>>> >>>>>
> >>>> > > > > >> >> >>>>> >>>>>> OK, maybe there's a benefit to trying a JRI/RJava
> >>>> > > > > >> >> >>>>> >>>>>> prototype on top of Crunch for something simple.
> >>>> > > > > >> >> >>>>> >>>>>> This should both save time and prove or disprove
> >>>> > > > > >> >> >>>>> >>>>>> whether Crunch via RJava integration is viable.
> >>>> > > > > >> >> >>>>> >>>>>>
> >>>> > > > > >> >> >>>>> >>>>>> For my part, I can try to do it within the Crunch
> >>>> > > > > >> >> >>>>> >>>>>> framework, or we can keep it completely separate.
> >>>> > > > > >> >> >>>>> >>>>>>
> >>>> > > > > >> >> >>>>> >>>>>> -d
> >>>> > > > > >> >> >>>>> >>>>>>
> >>>> > > > > >> >> >>>>> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh
> Wills <
> >>>> > > > > >> >> [email protected]>
> >>>> > > > > >> >> >>>>> >>>>>> wrote:
> >>>> > > > > >> >> >>>>> >>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>> I am an avid R user and would be into it-- who
> >>>> > > > > >> >> >>>>> >>>>>>> gave the talk? Was it Murray Stokely?
> >>>> > > > > >> >> >>>>> >>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy
> >>>> > Lyubimov <
> >>>> > > > > >> >> >>>>> [email protected]>
> >>>> > > > > >> >> >>>>> >>>>>>
> >>>> > > > > >> >> >>>>> >>>>>> wrote:
> >>>> > > > > >> >> >>>>> >>>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>>> Hello,
> >>>> > > > > >> >> >>>>> >>>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>>> I was pretty excited to learn of Google's
> >>>> > > > > >> >> >>>>> >>>>>>>> experience with an R mapping of FlumeJava at
> >>>> > > > > >> >> >>>>> >>>>>>>> one of the recent BARUGs. I think a lot of
> >>>> > > > > >> >> >>>>> >>>>>>>> applications similar to what we do in Mahout
> >>>> > > > > >> >> >>>>> >>>>>>>> could be prototyped using Flume R.
> >>>> > > > > >> >> >>>>> >>>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>>> I did not quite get the details of Google's
> >>>> > > > > >> >> >>>>> >>>>>>>> implementation of the R mapping, but I am not
> >>>> > > > > >> >> >>>>> >>>>>>>> sure that just a direct mapping from R to
> >>>> > > > > >> >> >>>>> >>>>>>>> Crunch would be sufficient (and, for the most
> >>>> > > > > >> >> >>>>> >>>>>>>> part, efficient). RJava/JRI and JNI seem to be
> >>>> > > > > >> >> >>>>> >>>>>>>> pretty terrible performers for doing that
> >>>> > > > > >> >> >>>>> >>>>>>>> directly.
> >>>> > > > > >> >> >>>>> >>>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>>> On top of that, I am thinking this project
> >>>> > > > > >> >> >>>>> >>>>>>>> could have a contributed adapter to Mahout's
> >>>> > > > > >> >> >>>>> >>>>>>>> distributed matrices; that would be a very good
> >>>> > > > > >> >> >>>>> >>>>>>>> synergy.
> >>>> > > > > >> >> >>>>> >>>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>>> Is there anyone interested in contributing to
> >>>> > > > > >> >> >>>>> >>>>>>>> or advising on an open source version of Flume
> >>>> > > > > >> >> >>>>> >>>>>>>> R support? Just gauging interest; the Crunch
> >>>> > > > > >> >> >>>>> >>>>>>>> list seems like a natural place to poke.
> >>>> > > > > >> >> >>>>> >>>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>>> Thanks .
> >>>> > > > > >> >> >>>>> >>>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>>> -Dmitriy
> >>>> > > > > >> >> >>>>> >>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>>
> >>>> > > > > >> >> >>>>> >>>>>>> --
> >>>> > > > > >> >> >>>>> >>>>>>> Director of Data Science
> >>>> > > > > >> >> >>>>> >>>>>>> Cloudera
> >>>> > > > > >> >> >>>>> >>>>>>> Twitter: @josh_wills
> >>>> > > > > >> >> >>>>> >>>
> >>>> > > > > >> >> >>>>> >>>
> >>>> > > > > >> >> >>>>> >>>
> >>>> > > > > >> >> >>>>> >>
> >>>> > > > > >> >> >>>>>
> >>>> > > > > >> >> >>>>
> >>>> > > > > >> >> >>>>
> >>>> > > > > >> >> >>>>
> >>>> > > > > >> >> >>>> --
> >>>> > > > > >> >> >>>> Director of Data Science
> >>>> > > > > >> >> >>>> Cloudera <http://www.cloudera.com>
> >>>> > > > > >> >> >>>> Twitter: @josh_wills <
> http://twitter.com/josh_wills>
> >>>> > > > > >> >>
> >>>> > > > > >> >
> >>>> > > > > >> >
> >>>> > > > > >> >
> >>>> > > > > >> > --
> >>>> > > > > >> > Director of Data Science
> >>>> > > > > >> > Cloudera <http://www.cloudera.com>
> >>>> > > > > >> > Twitter: @josh_wills <http://twitter.com/josh_wills>
> >>>> > > > > >>
> >>>> > > > > >
> >>>> > > > > >
> >>>> > > > > >
> >>>> > > > > > --
> >>>> > > > > > Director of Data Science
> >>>> > > > > > Cloudera <http://www.cloudera.com>
> >>>> > > > > > Twitter: @josh_wills <http://twitter.com/josh_wills>
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Director of Data Science
> >>>> Cloudera <http://www.cloudera.com>
> >>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
> >>>>
> >>>
> >>>
> >>
>
>
>
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
>
