How do I hook into CrunchTaskContext to do task cleanup (as opposed to in a DoFn, etc.)?
On Fri, Nov 16, 2012 at 2:52 PM, Dmitriy Lyubimov <[email protected]> wrote:
> No, it is fully distributed testing.
>
> It is OK -- StatET handles log4j logging for me, so I see the logs. I was
> wondering whether any end-to-end diagnostics are already embedded in
> Crunch, but reporting backend errors to the front end is notoriously hard
> (and sometimes impossible) with Hadoop, so I assume it doesn't make sense
> to report client-only problems through exceptions while the other failures
> still require checking isSucceeded().

On Fri, Nov 16, 2012 at 11:07 AM, Josh Wills <[email protected]> wrote:
> Are you running this using LocalJobRunner? Does calling
> Pipeline.enableDebug() before run() help? If it doesn't, it'll help
> settle a debate I'm having w/ Matthias. ;-)

On Fri, Nov 16, 2012 at 10:22 AM, Dmitriy Lyubimov <[email protected]> wrote:
> I see the error in the logs, but Pipeline.run() never throws anything;
> isSucceeded() subsequently returns false. Is there any way to extract the
> client-side problem rather than just being able to state that the job
> failed? Or is that OK, and the only diagnostics by design?
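A minimal sketch of the pattern being discussed: since run() reports backend failure through a status object rather than an exception, a driver that wants throw-on-failure semantics has to wrap the check itself. `RunResult` and `PipelineRunner` below are simplified stand-ins for illustration, not the actual Crunch classes:

```java
// Stand-in for a pipeline result object that reports success via a flag
// plus whatever diagnostics the client side managed to collect.
final class RunResult {
    private final boolean succeeded;
    private final String diagnostics;

    RunResult(boolean succeeded, String diagnostics) {
        this.succeeded = succeeded;
        this.diagnostics = diagnostics;
    }

    boolean isSucceeded() { return succeeded; }

    String getDiagnostics() { return diagnostics; }
}

final class PipelineRunner {
    // Converts "check the status object" semantics into "throw on failure"
    // for drivers that prefer an exception over polling isSucceeded().
    static RunResult checkResult(RunResult result) {
        if (!result.isSucceeded()) {
            throw new IllegalStateException(
                "pipeline failed: " + result.getDiagnostics());
        }
        return result;
    }
}
```

This keeps the library's own contract (no exception from run()) intact while giving callers a single place to surface whatever client-side diagnostics are available.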
> ============
> 68124 [Thread-8] INFO org.apache.crunch.impl.mr.exec.CrunchJob -
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: hdfs://localhost:11010/crunchr-example/input
>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
>   at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
>   at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
>   at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
>   at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:331)
>   at org.apache.crunch.impl.mr.exec.CrunchJob.submit(CrunchJob.java:135)
>   at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:251)
>   at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.run(CrunchJobControl.java:279)
>   at java.lang.Thread.run(Thread.java:662)

On Mon, Nov 12, 2012 at 5:41 PM, Dmitriy Lyubimov <[email protected]> wrote:
> For hadoop nodes, I guess yet another option is to soft-link the .so into
> Hadoop's native lib folder.

On Mon, Nov 12, 2012 at 5:37 PM, Dmitriy Lyubimov <[email protected]> wrote:
> I actually want to defer this to hadoop admins; we just need to create a
> procedure for setting up nodes. Ideally as simple as possible, something
> like:
>
> 1) set up R
> 2) install.packages("rJava", "RProtoBuf", "crunchR")
> 3) R CMD javareconf
> 4) add the result of R --vanilla <<< 'system.file("jri", package="rJava")'
>    to either the mapred command lines or LD_LIBRARY_PATH...
>
> But it will depend on their versions of hadoop, jre, etc. I hoped Crunch
> might have something to hide a lot of that complexity (since it is about
> hiding complexities, for the most part :) ). Besides, Hadoop has a way to
> ship .so's to the backend, so if Crunch had an API to do something
> similar, it is conceivable that the driver might yank and ship it too, to
> hide that complexity as well. But then there's a host of issues around how
> to handle potentially different rJava versions installed on different
> nodes... So it increasingly looks like something we might want to defer to
> sysops, with an approximate set of requirements.

On Mon, Nov 12, 2012 at 5:29 PM, Josh Wills <[email protected]> wrote:
> On Mon, Nov 12, 2012 at 5:17 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> So java tasks need to be able to load libjri.so from whatever
>> system.file("jri", package="rJava") says.
>>
>> Traditionally, these issues were handled with -Djava.library.path.
>> Apparently there's nothing a java task can do to enable a loadLibrary()
>> call to see the damn library once started. But -Djava.library.path
>> requires the nodes to configure and lock the jvm command line against
>> modifications by the client, which is fine.
>>
>> I also discovered that LD_LIBRARY_PATH actually works with jre 1.6
>> (again).
>>
>> But... any other suggestions about best practices for configuring Crunch
>> to run users' .so's?
>>
>> thanks.
>
> Not off the top of my head. I suspect that whatever you come up with will
> become the "best practice." :)

On Sun, Nov 11, 2012 at 1:41 PM, Josh Wills <[email protected]> wrote:
> I believe that is a safe assumption, at least right now.

On Sun, Nov 11, 2012 at 1:38 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Question.
>
> So in the Crunch API, initialize() doesn't get an emitter, and process()
> gets an emitter every time.
>
> However, my guess is that any single reincarnation of a DoFn object in the
> backend will always be getting the same emitter through its lifecycle. Is
> that an admissible assumption, or is there currently a counterexample to
> it?
>
> The problem is that as I implement the two-way pipeline of input and
> emitter data between R and Java, I am bulking these calls together for
> performance reasons. The individual datums in these chunks of data will
> not have emitter function information attached to them in any way. (Well,
> they could, but it would be a performance killer, and I bet the emitter
> never changes.)
>
> So, thoughts? Can I assume the emitter never changes between the first and
> last call to a DoFn instance?
>
> thanks.
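Given the confirmation above that the emitter is stable for the lifetime of a DoFn instance, the bulking approach can capture the emitter once and reuse it when a buffered chunk is flushed. A minimal sketch of that pattern; `Emitter` and `BufferingFn` are hand-rolled stand-ins for the Crunch types, and the uppercase transform is just a placeholder for the real R round-trip:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Crunch's emitter interface.
interface Emitter<T> {
    void emit(T value);
}

// Buffers inputs into chunks and flushes them through the emitter captured
// on the first process() call -- relying on the assumption (confirmed in
// the thread above) that the emitter never changes between the first and
// last call to a DoFn instance.
final class BufferingFn {
    private final int chunkSize;
    private final List<String> buffer = new ArrayList<>();
    private Emitter<String> lastEmitter; // captured once, reused on flush

    BufferingFn(int chunkSize) { this.chunkSize = chunkSize; }

    void process(String input, Emitter<String> emitter) {
        lastEmitter = emitter;              // same instance on every call
        buffer.add(input.toUpperCase());    // placeholder per-datum work
        if (buffer.size() >= chunkSize) {
            flush();
        }
    }

    // Analogous to DoFn cleanup: drain whatever is still buffered.
    void cleanup() {
        flush();
    }

    private void flush() {
        for (String s : buffer) {
            lastEmitter.emit(s);
        }
        buffer.clear();
    }
}
```

The point of the sketch is that no per-datum emitter bookkeeping is needed: one reference, captured once, survives until the final flush in cleanup.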
On Mon, Oct 29, 2012 at 6:32 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Yes...
>
> I think it worked for me before, although just adding all the jars from
> the R package distribution would be a little bit more appropriate approach
> -- but that creates a problem with jars in dependent R packages. I think
> it would be much easier to just compile a hadoop-job file and stick it in,
> rather than cherry-picking individual jars from who knows how many
> locations.
>
> I think I used the hadoop-job format with the distributed cache before,
> and it worked... at least with Pig's "register jar" functionality.
>
> OK, I guess I will just try whether it works.

On Mon, Oct 29, 2012 at 6:24 PM, Josh Wills <[email protected]> wrote:
> On Mon, Oct 29, 2012 at 5:46 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> Great! So it is in Crunch.
>>
>> Does it support the hadoop-job jar format or only pure java jars?
>
> I think just pure jars -- you're referring to hadoop-job format as having
> all the dependencies in a lib/ directory within the jar?
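For reference, the "hadoop-job jar format" discussed above means a jar that carries its dependencies as lib/*.jar entries inside itself. A small illustrative helper (the class and method names are mine, not from Crunch or Hadoop) that picks those bundled jars out of a jar's entry listing:

```java
import java.util.ArrayList;
import java.util.List;

final class JobJarLayout {
    // Given the entry names of a job jar, returns the bundled dependency
    // jars, i.e. everything matching lib/*.jar. Actually unpacking them or
    // registering them with the distributed cache is out of scope here.
    static List<String> bundledJars(List<String> jarEntries) {
        List<String> deps = new ArrayList<>();
        for (String entry : jarEntries) {
            if (entry.startsWith("lib/") && entry.endsWith(".jar")) {
                deps.add(entry);
            }
        }
        return deps;
    }
}
```

A tool that only supports "pure jars" would have to expand these lib/ entries and add each one separately, which is the gap the thread is probing at.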
On Mon, Oct 29, 2012 at 5:10 PM, Josh Wills <[email protected]> wrote:
> On Mon, Oct 29, 2012 at 5:04 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> I think I need functionality to add more jars (or an external hadoop-jar)
>> to drive that from an R package. Just setting the job jar by class is not
>> enough. I can push the overall job-jar as an additional jar to the R
>> package; however, I cannot really run the hadoop command line on it -- I
>> need to set up the classpath through rJava.
>>
>> A traditional single hadoop job jar will likely not work here, since we
>> cannot hardcode pipelines in java code but rather have to construct them
>> on the fly. (Well, we could serialize pipeline definitions from R and
>> then replay them in a driver -- but that's too cumbersome and more work
>> than it has to be.) There's no reason why I shouldn't be able to do a
>> Pig-like "register jar" or a Mahout-like setJobJar when kicking off a
>> pipeline.
>
> o.a.c.util.DistCache.addJarToDistributedCache?
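If DistCache is the mechanism, a driver would presumably still need to enumerate the jars to register, e.g. everything under an R package's java/ directory. A sketch of that enumeration step only; the directory layout and the subsequent per-jar call to something like o.a.c.util.DistCache.addJarToDistributedCache are assumptions, not shown:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class JarCollector {
    // Returns the paths of all *.jar files directly under dir, sorted by
    // name so registration order is deterministic. A missing or unreadable
    // directory yields an empty list rather than an error.
    static List<String> collectJars(File dir) {
        List<String> jars = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries != null) {
            for (File f : entries) {
                if (f.isFile() && f.getName().endsWith(".jar")) {
                    jars.add(f.getPath());
                }
            }
        }
        jars.sort(String::compareTo);
        return jars;
    }
}
```

The driver would then loop over the returned paths and hand each one to the distributed-cache registration call, giving the Pig-style "register jar" behavior the thread asks for.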
On Mon, Oct 29, 2012 at 10:17 AM, Dmitriy Lyubimov <[email protected]> wrote:
> Ok, sounds very promising...
>
> I'll try to start digging on the driver part this week then (Pipeline
> wrapper in R5).

On Sun, Oct 28, 2012 at 11:56 AM, Josh Wills <[email protected]> wrote:
> On Fri, Oct 26, 2012 at 2:40 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> Ok, cool.
>>
>> So what state is Crunch in? I take it is in a fairly advanced state. So
>> every API mentioned in the FlumeJava paper is working, right? Or is there
>> something that is not working specifically?
>
> I think the only thing in the paper that we don't have in a working state
> is MSCR fusion. It's mostly just a question of prioritizing it and
> getting the work done.

On Fri, Oct 26, 2012 at 2:31 PM, Josh Wills <[email protected]> wrote:
> Hey Dmitriy,
>
> Got a fork going and looking forward to playing with crunchR this
> weekend -- thanks!
> J

On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Project template: https://github.com/dlyubimov/crunchR
>
> The default profile does not compile the R artifact; the R profile does.
> For convenience, it is enabled by supplying -DR to the mvn command line,
> e.g.
>
>     mvn install -DR
>
> There's also a helper that installs the snapshot version of the package in
> the crunchR module.
>
> There are rJava and JRI java dependencies which I did not find anywhere in
> the public maven repos, so they are installed into my github maven repo so
> far. Should compile for third parties.
>
> -DR compilation requires R, rJava and, optionally, RProtoBuf. R doc
> compilation requires roxygen2 (I think).
>
> For some reason RProtoBuf fails to import into another package -- I got a
> weird exception when I put @import RProtoBuf into crunchR -- so RProtoBuf
> is now in the "Suggests" category.
> Down the road that may be a problem, though...
>
> Other than the template, not much else has been done so far... finding the
> hadoop libraries and adding them to the package path on initialization via
> "hadoop classpath"... adding the Crunch jars and their non-"provided"
> transitives to crunchR's java part...
>
> No legal stuff...
>
> No readmes... complete stealth at this point.

On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Ok, cool. I will try to roll a project template by some time next week.
> We can start with prototyping and benchmarking something really simple,
> such as parallelDo().
>
> My interim goal is to perhaps take some more or less simple algorithm from
> Mahout and demonstrate that it can be solved with Rcrunch (or whatever
> name it has to be) in comparable time (performance) but with much fewer
> lines of code (say, one of the factorization or clustering things).

On Wed, Oct 17, 2012 at 10:24 PM, Rahul <[email protected]> wrote:
> I am not much of an R user, but I am interested to see how well we can
> integrate the two. I would be happy to help.
>
> regards,
> Rahul

On 18-10-2012 04:04, Josh Wills wrote:
> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> Yep, ok.
>>
>> I imagine it has to be an R module, so I can set up a maven project with
>> a java/R code tree (I have been doing that a lot lately). Or if you have
>> a template to look at, that would be useful too, I guess.
>
> No, please go right ahead.

On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <[email protected]> wrote:
> I'd like it to be separate at first, but I am happy to help. Github repo?

On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
> Ok, maybe there's a benefit to trying a JRI/rJava prototype on top of
> Crunch for something simple. This should both save time and prove or
> disprove whether Crunch-via-rJava integration is viable.
>
> On my part, I can try to do it within the Crunch framework, or we can keep
> it completely separate.
>
> -d

On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <[email protected]> wrote:
> I am an avid R user and would be into it -- who gave the talk? Was it
> Murray Stokely?
On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Hello,
>
> I was pretty excited to learn of Google's experience with an R mapping of
> FlumeJava at one of the recent BARUGs. I think a lot of applications
> similar to what we do in Mahout could be prototyped using a flume R.
>
> I did not quite get the details of Google's implementation of the R
> mapping, but I am not sure a direct mapping from R to Crunch would be
> sufficient (and, for the most part, efficient). rJava/JRI and jni seem to
> be a pretty terrible performer for doing that directly.
>
> On top of that, I am thinking that if this project could have a
> contributed adapter to Mahout's distributed matrices, that would be a
> very good synergy.
>
> Is there anyone interested in contributing/advising for an open source
> version of flume R support? Just gauging interest; the Crunch list seems
> like a natural place to poke.
>
> Thanks.
> -Dmitriy

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
