John,

Have you looked at Cascading? 

   http://www.cascading.org/ 

It sounds like you could use two HBase table-backed inputs, apply the
filter and join functions that Cascading provides, and then use an
HBase table-backed output to collect the result -- in a way that is
natural for that framework.
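
To make the shape of that concrete, here is a rough sketch in plain Java. It
does not use the Cascading or HBase APIs at all: the two in-memory maps are
only stand-ins for tables A and B, and the class and method names
(TwoTableJoinSketch, groupByKey, innerJoinKeys) are made up for illustration.
It only shows the idea of tagging rows from both inputs, grouping them by row
key, and joining on the keys present in both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoTableJoinSketch {

    // "Map" side: tag each row with the table it came from and group by row key.
    // tableA and tableB are hypothetical in-memory stand-ins for the two HBase tables.
    static Map<String, List<String>> groupByKey(Map<String, String> tableA,
                                                Map<String, String> tableB) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : tableA.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add("A:" + e.getValue());
        }
        for (Map.Entry<String, String> e : tableB.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add("B:" + e.getValue());
        }
        return grouped;
    }

    // "Reduce" side: keep only the row keys that were seen in both tables.
    static List<String> innerJoinKeys(Map<String, List<String>> grouped) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            boolean fromA = false;
            boolean fromB = false;
            for (String tagged : e.getValue()) {
                if (tagged.startsWith("A:")) fromA = true;
                if (tagged.startsWith("B:")) fromB = true;
            }
            if (fromA && fromB) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, String> tableA = new TreeMap<>();
        tableA.put("row1", "x");
        tableA.put("row2", "y");

        Map<String, String> tableB = new TreeMap<>();
        tableB.put("row2", "z");
        tableB.put("row3", "w");

        Map<String, List<String>> grouped = groupByKey(tableA, tableB);
        System.out.println("grouped: " + grouped);
        System.out.println("keys in both tables: " + innerJoinKeys(grouped));
    }
}
```

In a real job the grouping would be done by the framework's shuffle rather
than by a single in-memory map, so treat this purely as an outline of the
data flow.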

Best regards,

   - Andy




----- Original Message ----
> From: john smith <[email protected]>
> To: [email protected]
> Sent: Fri, January 8, 2010 2:00:09 AM
> Subject: Re: MR in HBase
> 
> Stack,
> 
> The requirement is that I need to scan two tables, A and B, in one MR job.
> Order is not important. That is, the reduce phase receives keys from both
> A and B.
> 
> Presently I am using TableMap for "A", and in one of the mappers I am
> reading the entire B table using a scanner. But this is a big overhead,
> right? Non-local B data will be transferred (over the network) to the
> machine executing that map phase. Instead, I was wondering whether there is
> some variant of TableMap that scans both A and B and emits the
> corresponding keys. Order is not at all important, and there are no random
> lookups. I need all of the B table's keys one way or another, with the
> least overhead!
> 
> There is also one more solution I was thinking of. Suppose I am scanning
> some particular region using TableMap. I can get that region's name
> using some function in the API, then build a scanner on B over that
> particular region and emit all the keys from B. This doesn't require any
> network transfer of data. Is this solution feasible? If yes, any hints on
> which classes to use from the API?
> 
> Thanks,
> J-S
> 
> On Fri, Jan 8, 2010 at 10:46 AM, stack wrote:
> 
> > This is a little tough.  Do both tables have the same number of regions?
> > Are you walking through the two tables serially in your mapreduce, or do
> > you want to do random lookups into the second table dependent on the row
> > you are currently processing in table one?
> >
> > St.Ack
> >
> >
> > On Thu, Jan 7, 2010 at 7:51 PM, john smith wrote:
> >
> > > Hi all,
> > >
> > > My requirement is that I must read two tables (belonging to the same
> > > region server) in the same map.
> > >
> > > Normally TableMap supports only one table at a time, and right now I am
> > > reading the entire second table in one of the maps. This is a big
> > > overhead. Can anyone suggest a modification of TableMap, or a different
> > > approach, that can read two tables at the same time? This would be very
> > > useful to us!
> > >
> > > Thanks
> > > J-S
> > >
> >



      
