TableInputFormat schedules one mapper for every table region, not one per 
region server. 

In addition to what others have said already: if your reducer does little 
more than store data back into HBase (via TableOutputFormat), consider 
writing results to HBase directly from the mapper. That avoids the 
sort/shuffle/merge overhead the Hadoop job framework incurs when it feeds 
map outputs into reducers. For that type of use case, where the Hadoop 
MapReduce subsystem is essentially a grid scheduler, something like 
job.setNumReduceTasks(0) will do the trick. 
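
A rough sketch of such a map-only job follows, in case it helps. It assumes 
the newer HBase client API (Put.addColumn, Job.getInstance) with the 
hbase-mapreduce classes on the classpath; the table names "source_table" and 
"target_table", the column family "cf", the qualifier "cells", and the class 
names are placeholders I made up, so adjust them to your schema and version. 

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical example class: reads one table and writes derived values to
// another table straight from the mapper, with no reduce phase.
public class MapOnlyHBaseWrite {

  // One instance of this mapper runs per region of the source table.
  public static class WriteBackMapper
      extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // Placeholder logic: store the number of cells seen in this row.
      Put put = new Put(row.get());
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("cells"),
          Bytes.toBytes(value.size()));
      // With zero reducers this goes directly to TableOutputFormat.
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "map-only-hbase-write");
    job.setJarByClass(MapOnlyHBaseWrite.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch rows in larger batches per RPC
    scan.setCacheBlocks(false);  // full scans should not churn the block cache

    // Input side: TableInputFormat over the (assumed) source table.
    TableMapReduceUtil.initTableMapperJob("source_table", scan,
        WriteBackMapper.class, ImmutableBytesWritable.class, Put.class, job);

    // Output side: TableOutputFormat on the (assumed) target table, no reducer class.
    TableMapReduceUtil.initTableReducerJob("target_table", null, job);

    // Skip sort/shuffle/merge entirely.
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This only makes sense when each output row can be computed from a single 
input row; if you need to combine values across rows with the same key, you 
still need the reduce phase. 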

Best regards,

   - Andy




________________________________
From: john smith <js1987.sm...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Friday, August 21, 2009 12:42:36 AM
Subject: Doubt in HBase

Hi all,

I have one small doubt. Kindly answer it even if it sounds silly.

I am using MapReduce with HBase in distributed mode. I have a table which
spans 5 region servers. I am using TableInputFormat to read the data
from the table in the map phase. When I run the program, how many map tasks
are created by default? Is it one per region server, or more?

Also, after the map tasks are over, the reduce task takes a bit more time. Is
it due to moving the map output across the region servers, i.e., moving the
values of the same key to a particular reducer before it can start? Is
there any way I can optimize the code (e.g. by storing data for the same
reducer nearby)?

Thanks :)



      
