Mridul, can you be more clear? I didn't get you!
On Fri, Jan 8, 2010 at 6:13 PM, Mridul Muralidharan <[email protected]> wrote:

>
> If you just want to scan both tables for your mapper, assuming there is no
> easier way to do it, can't you write a composite input format which
> delegates to both tables' input formats?
>
> Regards,
> Mridul
>
> john smith wrote:
>
>> Stack,
>>
>> The requirement is that I need to scan two tables, A and B, in one MR job.
>> Order is not important; that is, the reduce phase receives keys from both
>> A and B.
>>
>> Presently I am using TableMap for "A", and in one of the mappers I read
>> the entire table B using a scanner. But this is a big overhead, because
>> non-local B data will be transferred over the network to the machine
>> executing that map. Instead, I was thinking of some variant of TableMap
>> which scans both A and B and emits the corresponding keys. Order is not at
>> all important, and there are no random lookups. I need all of table B's
>> keys in some way or other, with the least overhead!
>>
>> There is one more solution I was thinking of. Suppose I am scanning a
>> particular region using TableMap. I can get that region's name using some
>> function in the API, then build a scanner on B over that same region and
>> emit all the keys from B. This doesn't require any network transfer of
>> data. Is this solution feasible? If yes, any hints on which classes to use
>> from the API?
>>
>> Thanks,
>> J-S
>>
>> On Fri, Jan 8, 2010 at 10:46 AM, stack <[email protected]> wrote:
>>
>>> This is a little tough. Do both tables have the same number of regions?
>>> Are you walking through the two tables serially in your mapreduce, or do
>>> you want to do random lookups into the second table dependent on the row
>>> you are currently processing in table one?
>>>
>>> St.Ack
>>>
>>> On Thu, Jan 7, 2010 at 7:51 PM, john smith <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> My requirement is that I must read two tables (belonging to the same
>>>> region server) in the same map.
>>>>
>>>> Normally TableMap supports only one table at a time, and right now I am
>>>> reading the entire second table in any one of the maps. This is a big
>>>> overhead. Can anyone suggest a modification of TableMap, or a different
>>>> approach, which can read two tables simultaneously? This would be very
>>>> useful to us!
>>>>
>>>> Thanks,
>>>> J-S
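Mridul's suggestion can be read like this: an input format whose split list is the union of the splits produced by each table's own input format, and whose record reader hands each split back to the delegate that produced it. Every map task then reads one region of either A or B, so table B is no longer pulled through a single mapper. The sketch below is a minimal Python illustration under stated assumptions: `InMemoryTableInputFormat`, `TaggedSplit` and `CompositeInputFormat` are hypothetical names, and sorted in-memory dicts stand in for HBase tables. A real version would be a Java class extending Hadoop's `InputFormat` and wrapping two `TableInputFormat` instances.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

Split = Tuple[str, str]  # (start_row, end_row); "" means unbounded


class InMemoryTableInputFormat:
    """Hypothetical stand-in for one table's input format (not the HBase API).

    Splits an in-memory table into key ranges, roughly one per region,
    and reads the rows that fall into a given range.
    """

    def __init__(self, name: str, rows: Dict[str, str], split_points: List[str]):
        self.name = name
        self.rows = rows
        self.split_points = sorted(split_points)

    def get_splits(self) -> List[Split]:
        bounds = [""] + self.split_points + [""]
        return list(zip(bounds[:-1], bounds[1:]))

    def read(self, split: Split) -> Iterator[Tuple[str, str]]:
        start, end = split
        for key in sorted(self.rows):
            if key >= start and (end == "" or key < end):
                yield key, self.rows[key]


@dataclass(frozen=True)
class TaggedSplit:
    source: int  # index of the delegate input format that produced the split
    split: Split


class CompositeInputFormat:
    """Delegates to several input formats; each delegate split becomes a task."""

    def __init__(self, delegates: List[InMemoryTableInputFormat]):
        self.delegates = delegates

    def get_splits(self) -> List[TaggedSplit]:
        # Concatenate the splits of every delegate, remembering their origin.
        return [TaggedSplit(i, s)
                for i, d in enumerate(self.delegates)
                for s in d.get_splits()]

    def read(self, tagged: TaggedSplit) -> Iterator[Tuple[str, str, str]]:
        # Route the split back to the delegate that created it and tag each
        # record with its table name so the mapper can tell A rows from B rows.
        d = self.delegates[tagged.source]
        for key, value in d.read(tagged.split):
            yield d.name, key, value


# Usage: in a real job each split would go to its own map task.
a = InMemoryTableInputFormat("A", {"a1": "x", "m5": "y", "z9": "z"}, ["m"])
b = InMemoryTableInputFormat("B", {"b2": "p", "q7": "q"}, ["k", "t"])
fmt = CompositeInputFormat([a, b])
splits = fmt.get_splits()
records = [r for s in splits for r in fmt.read(s)]
```

Tagging each record with its table name lets one mapper class handle rows from both tables. In a real Hadoop job the wrapper split would also have to be serializable so it can be shipped to the task (HBase's own `TableSplit` implements `Writable` for this reason), and each delegate would need to be pointed at its own table, which may take some care if both read the table name from the same job configuration.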
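J-S's second idea, scanning B over the same key range as the A region being mapped, can be sketched as well. The useful property is that if the per-mapper ranges come from A's splits and together cover the whole key space (unbounded first start row and last end row), every B key is emitted by exactly one mapper. Whether those B rows are actually local is a separate matter: A's and B's region boundaries and hosting servers need not line up, which seems to be what stack's question about region counts is getting at. The names `splits_from_boundaries`, `scan` and `map_task` are hypothetical, and the tables are sorted in-memory key lists. In HBase terms the per-split step would presumably mean reading the start and end rows of the mapper's `TableSplit` and opening a `Scan` on B with them as start and stop rows; check this against the API of the HBase version in use.

```python
import bisect
from typing import List, Tuple

Split = Tuple[str, str]  # (start_row, end_row); "" means unbounded


def splits_from_boundaries(split_points: List[str]) -> List[Split]:
    # Hypothetical: A's region boundaries turned into contiguous key ranges
    # that together cover the whole key space.
    bounds = [""] + sorted(split_points) + [""]
    return list(zip(bounds[:-1], bounds[1:]))


def scan(sorted_keys: List[str], start: str, end: str) -> List[str]:
    # Range scan over a sorted key list: start inclusive, end exclusive,
    # mimicking a scanner opened with start/stop rows.
    lo = bisect.bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if end == "" else bisect.bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]


def map_task(split: Split, keys_a: List[str], keys_b: List[str]) -> List[Tuple[str, str]]:
    start, end = split
    out = [("A", k) for k in scan(keys_a, start, end)]
    # The extra step: scan B over exactly the same key range as this A split.
    out += [("B", k) for k in scan(keys_b, start, end)]
    return out


keys_a = ["a1", "m5", "z9"]        # table A, sorted by row key
keys_b = ["b2", "k0", "q7", "t3"]  # table B, sorted by row key
a_splits = splits_from_boundaries(["m"])
emitted = [kv for s in a_splits for kv in map_task(s, keys_a, keys_b)]
b_seen = [k for table, k in emitted if table == "B"]
```

Unlike the composite input format, this keeps the number of map tasks equal to A's region count, but B's rows are only as local as B's region layout allows.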
