If you just want to scan both tables in your mapper, and assuming there is no easier way to do it, can't you write a composite input format that delegates to both tables' input formats?
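
To make the idea concrete, here is a rough sketch of the delegation pattern only. It deliberately does not use the real Hadoop/HBase classes: Split, TableSource, InMemoryTableSource and CompositeTableSource are all hypothetical stand-ins I made up for illustration (roughly playing the roles of an input split, a per-table input format, and the composite). The point is just that the composite returns the union of the delegates' splits and routes each split back to the delegate that produced it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an input split: the row keys of one region of one table. */
final class Split {
    final String table;
    final List<String> rowKeys;

    Split(String table, List<String> rowKeys) {
        this.table = table;
        this.rowKeys = rowKeys;
    }
}

/** Hypothetical stand-in for a single-table input format. */
interface TableSource {
    List<Split> getSplits();
    Iterator<String> createReader(Split split);
}

/** Toy single-table source backed by in-memory "regions" (assumption: one split per region). */
final class InMemoryTableSource implements TableSource {
    private final String table;
    private final List<List<String>> regions;

    InMemoryTableSource(String table, List<List<String>> regions) {
        this.table = table;
        this.regions = regions;
    }

    public List<Split> getSplits() {
        List<Split> splits = new ArrayList<>();
        for (List<String> region : regions) {
            splits.add(new Split(table, region));
        }
        return splits;
    }

    public Iterator<String> createReader(Split split) {
        return split.rowKeys.iterator();
    }
}

/** Composite source: unions the delegates' splits and routes each split to its owner. */
final class CompositeTableSource implements TableSource {
    private final Map<String, TableSource> delegates = new LinkedHashMap<>();

    void addTable(String table, TableSource source) {
        delegates.put(table, source);
    }

    public List<Split> getSplits() {
        List<Split> all = new ArrayList<>();
        for (TableSource source : delegates.values()) {
            all.addAll(source.getSplits());
        }
        return all;
    }

    public Iterator<String> createReader(Split split) {
        // The split remembers which table it came from, so dispatch on that.
        TableSource owner = delegates.get(split.table);
        if (owner == null) {
            throw new IllegalArgumentException("Unknown table: " + split.table);
        }
        return owner.createReader(split);
    }
}
```

In a real job each split would be handed to a separate map task, so every mapper would see rows from exactly one table, and the splits of B could be scheduled near their own regions instead of being pulled into a single mapper.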


Regards,
Mridul

john smith wrote:
Stack,

The requirement is that I need to scan two tables, A and B, for an MR
job. Order is not important. That is, the reduce phase receives keys
from both A and B.

Presently I am using TableMap for "A", and in one of the mappers I am
reading the entire B using a scanner. But this is a big overhead, right?
Non-local B data will be transferred (over the network) to the machine
executing that map task. Instead, I was wondering whether there is some
variant of TableMap that scans both A and B and emits the corresponding
keys. Order is not at all important, and there are no random lookups. I
need the entire set of B table keys one way or another, with the least
overhead!

There is also one more solution I was thinking of. Suppose I am scanning
a particular region using TableMap. I can get that region's name using
some function in the API, then build a scanner on B over that particular
region and emit all the keys from B. This doesn't require any network
transfer of data. Is this solution feasible? If yes, any hints on which
classes to use from the API?

Thanks ,
J-S

On Fri, Jan 8, 2010 at 10:46 AM, stack <[email protected]> wrote:

This is a little tough.  Do both tables have the same number of regions?
Are you walking through the two tables serially in your mapreduce, or do
you want to do random lookups into the second table dependent on the row
you are currently processing in table one?

St.Ack


On Thu, Jan 7, 2010 at 7:51 PM, john smith <[email protected]> wrote:

Hi all,

My requirement is that I must read two tables (belonging to the same
region server) in the same map.

Normally TableMap supports only one table at a time, and right now I am
reading the entire second table in one of the maps. This is a big
overhead. Can anyone suggest a modification of TableMap, or a different
approach, that can read two tables at the same time? This would be very
useful to us!

Thanks
J-S

