Re: MR in HBase

Mridul Muralidharan Fri, 08 Jan 2010 10:57:12 -0800

Hi,

This is assuming there is no easier way to do it (someone from the hbase team can comment better!).

But the usual way to handle this for mapreduce is to create a composite input format, which delegates to the underlying formats to generate the splits and to create the corresponding record readers based on the split.

I have not done this for hbase though, but looking at TableInputFormatBase, it looks possible to implement ...


Specifically for hbase, something along the lines of :

--- start dirty pseudo code ---

CustomTableInputFormat extends TableInputFormatBase and implements setConf() to configure the table(s) required.

public class CompositeTableInputFormat extends InputFormat<ImmutableBytesWritable, Result> {

  // One delegate per table; CustomTableInputFormat is the per-table format described above.
  private CustomTableInputFormat delegate1;
  private CustomTableInputFormat delegate2;

  public void setConf() {
    delegate1 = createTable1InputFormat();
    delegate2 = createTable2InputFormat();
  }

  // The splits of the composite are simply the splits of both tables.
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> retval = new LinkedList<InputSplit>();
    retval.addAll(delegate1.getSplits(context));
    retval.addAll(delegate2.getSplits(context));
    return retval;
  }

  // Hand each split back to the delegate that produced it.
  public RecordReader<ImmutableBytesWritable, Result> createRecordReader(
      InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    if (split for table1) return delegate1.createRecordReader(split, context);
    else if (split for table2) return delegate2.createRecordReader(split, context);
    else throw exception
  }

}

--- end pseudo code ---
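
The same delegation idea can be tried out without any Hadoop or HBase dependency. The following is only a minimal sketch using plain JDK classes: DemoSplit, SimpleInputFormat, SingleTableFormat and CompositeFormat are hypothetical stand-ins invented for illustration, not the real InputSplit / InputFormat API, and the "tables" are just in-memory lists of row keys. It shows the two pieces that matter: concatenating the splits of every delegate, and routing each split back to the delegate that owns it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an input split: one chunk of one table.
final class DemoSplit {
    final String table;
    final List<String> rows;

    DemoSplit(String table, List<String> rows) {
        this.table = table;
        this.rows = rows;
    }
}

// Hypothetical stand-in for an input format (not the Hadoop interface).
interface SimpleInputFormat {
    List<DemoSplit> getSplits();
    Iterator<String> createRecordReader(DemoSplit split);
}

// Serves pre-chunked in-memory rows; a real format would scan table regions.
final class SingleTableFormat implements SimpleInputFormat {
    private final String table;
    private final List<List<String>> regions;

    SingleTableFormat(String table, List<List<String>> regions) {
        this.table = table;
        this.regions = regions;
    }

    public List<DemoSplit> getSplits() {
        List<DemoSplit> splits = new ArrayList<DemoSplit>();
        for (List<String> region : regions) {
            splits.add(new DemoSplit(table, region));
        }
        return splits;
    }

    public Iterator<String> createRecordReader(DemoSplit split) {
        return split.rows.iterator();
    }
}

// The composite: concatenates delegate splits and routes each split to its owner.
final class CompositeFormat implements SimpleInputFormat {
    // Insertion-ordered so splits come out table by table.
    private final Map<String, SimpleInputFormat> delegates =
            new LinkedHashMap<String, SimpleInputFormat>();

    void addTable(String table, SimpleInputFormat format) {
        delegates.put(table, format);
    }

    public List<DemoSplit> getSplits() {
        List<DemoSplit> all = new ArrayList<DemoSplit>();
        for (SimpleInputFormat format : delegates.values()) {
            all.addAll(format.getSplits());
        }
        return all;
    }

    public Iterator<String> createRecordReader(DemoSplit split) {
        SimpleInputFormat owner = delegates.get(split.table);
        if (owner == null) {
            throw new IllegalArgumentException("Unknown table: " + split.table);
        }
        return owner.createRecordReader(split);
    }

    // Builds a two-table example: A has two chunks, B has one.
    static CompositeFormat example() {
        CompositeFormat composite = new CompositeFormat();
        composite.addTable("A", new SingleTableFormat("A",
                Arrays.asList(Arrays.asList("a1", "a2"), Arrays.asList("a3"))));
        composite.addTable("B", new SingleTableFormat("B",
                Arrays.asList(Arrays.asList("b1", "b2"))));
        return composite;
    }
}
```

Keying the delegates by table name is one possible way to answer the "split for table1" question in the pseudo code above; it assumes each split can report which table it came from.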

Regards,
Mridul

john smith wrote:

Mridul

Can you be more clear? I didn't get you!

On Fri, Jan 8, 2010 at 6:13 PM, Mridul Muralidharan
<[email protected]> wrote:


If you just want to scan both tables for your mapper, assuming there is no
easier way to do it, can't you write a composite input format which
delegates to both tables' input formats?


Regards,
Mridul


john smith wrote:

Stack,

The requirement is that I need to scan two tables, A and B, for an MR
job. Order is not important. That is, the reduce phase receives keys
from both A and B.

Presently I am using TableMap for "A", and in one of the mappers I am
reading the entire B using a scanner. But this is a big overhead, right?
Non-local B data will be transferred (over the network) to the machine
executing that map phase. Instead, I was thinking there might be some
kind of variant of TableMap which scans both A and B and emits the
corresponding keys. Order is not at all important, and there are no
random lookups. I need the entire B table's keys in some way or other
with the least overhead!

There is also one more solution I was thinking of. Suppose I am scanning
some particular region using TableMap. I can get that particular region's
name using some function in the API, then build a scanner on B over that
particular region and emit all the keys from B. This doesn't require any
network transfer of data. Is this solution feasible? If yes, any hints on
what classes to use from the API?

Thanks ,
J-S

On Fri, Jan 8, 2010 at 10:46 AM, stack <[email protected]> wrote:

This is a little tough.  Do both tables have the same number of regions?
Are you walking through the two tables serially in your mapreduce, or do
you want to do random lookups into the second table dependent on the row
you are currently processing in table one?

St.Ack


On Thu, Jan 7, 2010 at 7:51 PM, john smith <[email protected]>
wrote:

Hi all,

My requirement is that I must read two tables (belonging to the same
region server) in the same Map.

Normally TableMap supports only one table at a time, and right now I am
reading the entire second table in one of the maps. This is a big
overhead. So can anyone suggest some modification of TableMap, or a
different approach, which can read two tables at the same time? This
can be very useful to us!

Thanks
J-S
