ah, indeed i see it there, ok i will add it to pom repos, thanks

On Nov 18, 2012 11:20 AM, "Josh Wills" <[email protected]> wrote:
> On Sun, Nov 18, 2012 at 11:08 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > Question: is the Crunch 0.4.0 release available thru a maven repository?
> > How have you installed it into your local repo?
>
> It should be-- I think Matthias published the Maven artifacts on Friday. Of
> course, I might have just had it installed locally b/c I was testing the
> release. :)
>
> > On Sun, Nov 18, 2012 at 10:30 AM, Josh Wills <[email protected]> wrote:
> >
> > > On Sun, Nov 18, 2012 at 10:13 AM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > > > Thank you, Josh. Your insights are greatly appreciated.
> > > >
> > > > RProtoBuf has a bug with <<- operator. I already contacted the authors
> > > > and they confirmed it however it is not clear when they are going to
> > > > fix it.
> > > >
> > > > (code to reproduce:
> > > >> library(RProtoBuf)
> > > >> a <<- "A"
> > > > causes an error)
> > > >
> > > > Actually RProtoBuf is not used right now. I will move it into
> > > > "recommended" realm again if it makes things easier.
> > > >
> > > > For me, the hardest part was to make jvm +hadoop to see JRI library
> > > > actually. I am still not sure about the best course of action here but
> > > > there is definitely more than one way
> > > >
> > > > Also my apologies for code styling, it is probably the ugliest code
> > > > i've ever written, but i will tidy it up once past the proof of concept
> > > > stage.
> > >
> > > No judgements, man. You should have seen the first rev of Crunch. ;-)
> > >
> > > >
> > > > -d
> > > >
> > > >
> > > > On Sun, Nov 18, 2012 at 9:37 AM, Josh Wills <[email protected]>
> > > > wrote:
> > > >
> > > >> Dmitrity,
> > > >>
> > > >> Just sent you a pull request based on playing with the code on OS X.
> > > >> It contains a README about my experience getting things working.
> > > >>
> > > >> Unfortunately, I haven't succeeded in getting crunchR loaded, I'm
> > > >> running into some issues w/RProtoBuf on OS X. I'll give it another go
> > > >> this week on my Linux machine at work.
> > > >>
> > > >> J
> > > >>
> > > >>
> > > >> On Sat, Nov 17, 2012 at 12:49 PM, Dmitriy Lyubimov <[email protected]>
> > > >> wrote:
> > > >>
> > > >> > Josh,
> > > >> >
> > > >> > ok the following commit
> > > >> >
> > > >> > ==============
> > > >> > commit 67605360838f810fa5ddf99abb3ef2962d3f05e3
> > > >> > Author: Dmitriy Lyubimov <[email protected]>
> > > >> > Date: Sat Nov 17 12:29:27 2012 -0800
> > > >> >
> > > >> > example1 succeeds
> > > >> >
> > > >> > ====================
> > > >> >
> > > >> > runs example 1 for me successfully in a fully distributed way which is
> > > >> > first step (map-only thing) for the word count.
> > > >> >
> > > >> > (I think there's a hiccup somewhere here because in the output i also
> > > >> > seem to see some empty lines, so the strsplit() part is perhaps set up
> > > >> > somewhat incorrectly here, but it's not the point right now):
> > > >> >
> > > >> > ====Example1.R===========
> > > >> >
> > > >> > library(crunchR)
> > > >> >
> > > >> > pipeline <- crunchR.MRPipeline$new("test-pipeline")
> > > >> >
> > > >> > inputPCol <- pipeline$readTextFile("/crunchr-examples/input")
> > > >> >
> > > >> > outputPCol <- inputPCol$parallelDo(
> > > >> > function(line) emit( strsplit(tolower(line),"[^[:alnum:]]")[[1]] )
> > > >> > )
> > > >> >
> > > >> > outputPCol$writeTextFile("/crunchr-examples/output")
> > > >> >
> > > >> > result <- pipeline$run()
> > > >> >
> > > >> > if ( !result$succeeded() ) stop ("pipeline failed.")
> > > >> >
> > > >> > ========================================
> > > >> >
> > > >> > I think R-java communication now should support multiple doFn ok and
> > > >> > they will be properly shut down and executed and synchronized even if they
> > > >> > emit in the cleanup phase.
> > > >> >
> > > >> > This example assumes a lot of defaults (such as RTypes which are by
> > > >> > default character vector singleton in and character vector out for a
> > > >> > DoFn). Also obviously uses text in-text out at this point only.
> > > >> >
> > > >> >
> > > >> > To run, install the package and upload the test input (test-prep.sh)
> > > >> > Assuming you have compiled the maven part, the R package snapshot could
> > > >> > be installed by running "install-snapshot-rpkg.sh".
> > > >> >
> > > >> > You also need to make sure your backend tasks see JRI library. there are
> > > >> > multiple ways to do it i guess but for the purposes of testing the
> > > >> > following just works for me in my mapred-site:
> > > >> >
> > > >> > <property>
> > > >> > <name>mapred.child.java.opts</name>
> > > >> > <value>-Djava.library.path=/home/dmitriy/R/x86_64-pc-linux-gnu-library/2/rJava/jri</value>
> > > >> > <final>false</final>
> > > >> > </property>
> > > >> >
> > > >> >
> > > >> > I think at this point you guys might help me by doing review of that
> > > >> > stuff, asking questions and making suggestions how to go by incorporating
> > > >> > other types of doFn and perhaps a way to complete the word count example,
> > > >> > perhaps running comparative benchmarks with a java-only word count, how
> > > >> > much overhead we seem to be suffering here.
> > > >> >
> > > >> > I use StatEt in eclipse. Although it is a huge way forward, the process
> > > >> > is still extremely tedious since I don't know unit testing framework in R
> > > >> > well (so i just scribble some stuff on the side to unit-test this and
> > > >> > that) and the integration test running cycle is significant enough.
> > > >> >
> > > >> > Which is why any help and suggestions are very welcome!
> > > >> >
> > > >> > I will definitely add support for reading/writing sequence files and
> > > >> > Protobufs, as well as Mahout DRM's.
> > > >> >
> > > >> >
> > > >> > Thanks.
> > > >> > -Dmitrity
> > > >> >
> > > >>
> > >
> > >
> > > --
> > > Director of Data Science
> > > Cloudera
> > > Twitter: @josh_wills
> > >
> >
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
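
On the empty lines reported in the Example1.R output quoted above: one plausible cause, assuming emit() writes one output record per element of the vector it is given (an assumption; the thread does not confirm it), is that strsplit() with the single-character class "[^[:alnum:]]" returns an empty string between adjacent separators such as ", ". The plain-R sketch below reproduces that outside crunchR and shows one way to avoid it. `tokenize` is a hypothetical helper name, not part of crunchR.

```r
# Plain-R sketch; no crunchR or Hadoop needed.
# `tokenize` is a hypothetical helper name used only for illustration.
tokenize <- function(line) {
  # "+" collapses a run of separators (e.g. ", ") into one split point.
  words <- strsplit(tolower(line), "[^[:alnum:]]+")[[1]]
  # A separator at the start of the line still yields one "" token,
  # so drop any empty strings that remain.
  words[nzchar(words)]
}

# The split pattern from Example1.R, applied to a line with ", " in it.
original <- strsplit(tolower("Hello, world!"), "[^[:alnum:]]")[[1]]
print(original)                   # contains an empty string between "," and " "
print(tokenize("Hello, world!"))  # only the non-empty tokens
```

Inside the pipeline, the DoFn body would then presumably become emit(tokenize(line)), though that has not been run against crunchR here.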
