I/O is on the DFS. In the case of HBase, this is not taken into account. Instead, we just start a scanner for each map, one per region.
J-D

On Fri, Sep 19, 2008 at 1:22 PM, Ding, Hui <[EMAIL PROTECTED]> wrote:
> Thanks for this suggestion on the shell, I will take a look into that.
> But I still don't understand why streaming won't work very well; it is
> able to do m/r jobs using the supplied exec, right? So do all the
> map/reduce programs take input/output from their own local filesystem
> or from HDFS?
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Jean-Daniel Cryans
> Sent: Thursday, September 18, 2008 6:30 PM
> To: [email protected]
> Subject: Re: Running map/reduce written in Ruby on Hbase
>
> Hui Ding,
>
> This wouldn't work very well. Streaming is defined so that you pass
> programs (any kind) that take an input and an output in the filesystem,
> not HBase tables. You should instead try to use JRuby like we do for
> the shell. It requires some more setup, but since it all runs inside
> the JVM, it eventually works.
>
> I see that more and more users are interested in using JRuby/Jython
> for MR jobs, and I know that some companies already use a wrapper for
> that ("Happy", anyone?). I'm sure many would be interested in seeing
> this kind of work.
>
> J-D
>
> On Thu, Sep 18, 2008 at 7:57 PM, Ding, Hui <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > I wanted to run some map/reduce jobs, but I'd like to do that in
> > Ruby; is this possible with Hadoop Streaming?
> > My understanding is that I will provide a mapper/reducer in Ruby and
> > supply that to Hadoop Streaming, and since HBase can be a source/sink
> > of map/reduce, I should be able to access the tables, right?
> >
> > And as far as setup is concerned, I just need to have a Ruby
> > interpreter set up on each of the machines in the cluster?
> >
> > Thanks a lot!
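For context on what a Streaming mapper looks like, here is a minimal sketch of the kind of Ruby script you would pass to Hadoop Streaming with `-mapper`. It reads plain text from stdin and writes tab-separated key/value pairs to stdout; this is a hypothetical word-count example, not HBase-aware, which is exactly why J-D says Streaming alone doesn't reach HBase tables. The `map_line` helper name is made up for illustration.

```ruby
#!/usr/bin/env ruby
# Hypothetical Hadoop Streaming mapper in Ruby (word count).
# Streaming contracts: input lines arrive on STDIN, and each
# emitted "key\tvalue" line on STDOUT becomes one map output pair.

# Turn one input line into a list of "word\t1" output records.
def map_line(line)
  line.split.map { |word| "#{word}\t1" }
end

if __FILE__ == $0
  STDIN.each_line do |line|
    map_line(line).each { |pair| puts pair }
  end
end
```

Note that nothing here touches HBase: the script only sees whatever bytes the Streaming job feeds it from the filesystem, which is the limitation the thread is discussing.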
