John,

Have you looked at Cascading?

http://www.cascading.org/

It sounds like you could use two HBase table backed inputs, then make use of the filter and join type functions that Cascading provides, and then use an HBase table backed output to collect the result -- in a way that is natural for that framework.

Best regards,

   - Andy

----- Original Message ----
> From: john smith <[email protected]>
> To: [email protected]
> Sent: Fri, January 8, 2010 2:00:09 AM
> Subject: Re: MR in HBase
>
> Stack,
>
> The requirement is that I need to scan two tables, A and B, for an MR
> job. Order is not important. That is, the reduce phase contains keys
> from both A and B.
>
> Presently I am using TableMap for "A", and in one
> of the mappers I am reading the entire B using a scanner. But this is a
> big overhead, right? Non-local B data will be transferred (over the
> network) to the machine executing that map phase. Instead, what
> I was thinking of is some kind of variant of TableMap which
> scans both A and B and emits the corresponding keys. Order is not at all
> important, and there are no random lookups. I need the entire B table's keys in
> some way or the other, with the least overhead!
>
> There is also one more solution I was thinking of. Suppose I am scanning some
> particular region using TableMap. I can get that particular region's name
> using some function in the API, then I can build a scanner on B over that
> particular region and emit all the keys from B. This doesn't require any
> network transfer of data. Is this solution feasible? If yes, any hints on
> which classes to use from the API?
>
> Thanks,
> J-S
>
> On Fri, Jan 8, 2010 at 10:46 AM, stack wrote:
>
> > This is a little tough. Do both tables have the same number of regions? Are
> > you walking through the two tables serially in your mapreduce, or do you want
> > to do random lookups into the second table dependent on the row you are
> > currently processing in table one?
> >
> > St.Ack
> >
> >
> > On Thu, Jan 7, 2010 at 7:51 PM, john smith wrote:
> >
> > > Hi all,
> > >
> > > My requirement is that I must read two tables (belonging to the same
> > > region server) in the same map.
> > >
> > > Normally TableMap supports only one table at a time, and right now I am
> > > reading the entire second table in one
> > > of the maps. This is a big overhead. So can anyone suggest some
> > > modification of TableMap or a different
> > > approach which can read two tables at the same time? This
> > > can be very useful to us!
> > >
> > > Thanks
> > > J-S
> > >
> >
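
For reference, the pattern discussed in this thread (each table scanned as its own input, with the reduce phase seeing keys from both A and B) is essentially a reduce-side join. Below is a minimal sketch of that idea in plain Python. It is not HBase or Cascading code: the function names, the "A"/"B" tags, and the sample rows are all hypothetical stand-ins, and the in-memory shuffle only imitates what the MapReduce framework would do between map and reduce.

```python
from collections import defaultdict


def tagged_map(table_name, rows):
    """Map step: emit (row_key, (table_name, value)) for every row of one table.

    Tagging each value with its source table lets the reducer tell A rows
    from B rows after the shuffle mixes them together.
    """
    for key, value in rows:
        yield key, (table_name, value)


def shuffle(pairs):
    """Group mapped pairs by key (stand-in for the framework's shuffle/sort)."""
    groups = defaultdict(list)
    for key, tagged_value in pairs:
        groups[key].append(tagged_value)
    return groups


def reduce_join(groups):
    """Reduce step: for each key, separate the values that came from A and B."""
    joined = {}
    for key, tagged_values in groups.items():
        from_a = [value for tag, value in tagged_values if tag == "A"]
        from_b = [value for tag, value in tagged_values if tag == "B"]
        joined[key] = (from_a, from_b)
    return joined


def run(table_a, table_b):
    """Feed both tables through the same map/shuffle/reduce pipeline."""
    pairs = list(tagged_map("A", table_a)) + list(tagged_map("B", table_b))
    return reduce_join(shuffle(pairs))


# Hypothetical sample data standing in for two table scans.
table_a = [("row1", "a1"), ("row2", "a2")]
table_b = [("row2", "b2"), ("row3", "b3")]
result = run(table_a, table_b)
```

In this sketch each table is read exactly once by its own map calls, so no single mapper has to pull the whole of B over the network; the cost moves to the shuffle instead. Whether that trade-off is acceptable depends on the table sizes.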
