From the web interface...
number of regions = 5
number of tables = 3

Thanks

On Sun, Apr 11, 2010 at 2:23 PM, Amandeep Khurana <ama...@gmail.com> wrote:

> How many regions do you have?
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
> On Sun, Apr 11, 2010 at 1:39 AM, john smith <js1987.sm...@gmail.com> wrote:
>
> > Amandeep,
> >
> > Thanks for the explanation. What is the default value for the number of
> > maps? Is it not equal to the number of regions?
> >
> > Right now I am running HBase in pseudo-distributed mode. If I set the
> > number of map tasks to 100000 (some big number)..
> >
> > I get numSplits=1
> >
> > If I don't set anything .. numSplits=2;
> >
> > Can you explain this?
> >
> > Thanks
> > j.S
> >
> > On Sun, Apr 11, 2010 at 1:50 PM, Amandeep Khurana <ama...@gmail.com> wrote:
> >
> > > If you set the number of map tasks to a higher number than the number
> > > of regions (I generally set it to 100000 or something like that), the
> > > number of splits = number of regions. If you keep it lower, then it
> > > combines regions in a single split.
> > >
> > > Amandeep Khurana
> > > Computer Science Graduate Student
> > > University of California, Santa Cruz
> > >
> > > On Sun, Apr 11, 2010 at 1:15 AM, john smith <js1987.sm...@gmail.com> wrote:
> > >
> > > > Amandeep,
> > > >
> > > > I guess that is not true .. See the explanation in the docs ..
> > > >
> > > > "Splits are created in number equal to the smallest between numSplits
> > > > and the number of HRegions
> > > > <http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/regionserver/HRegion.html>
> > > > in the table.
> > > > If the number of splits is smaller than the number of HRegions
> > > > <http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/regionserver/HRegion.html>
> > > > then splits are spanned across multiple HRegions and are grouped the
> > > > most evenly possible. In the case splits are uneven the bigger splits
> > > > are placed first in the InputSplit array."
> > > >
> > > > Depending on whether numSplits < (or >) the number of regions, it
> > > > chooses the real number of splits, and the same is done in the code:
> > > >
> > > > // Code
> > > > int realNumSplits = numSplits > startKeys.length ? startKeys.length : numSplits;
> > > >
> > > > Here startKeys.length is the number of regions...
> > > >
> > > > Am I right?
> > > >
> > > > Thanks
> > > > j.S
> > > >
> > > > On Sun, Apr 11, 2010 at 1:33 PM, Amandeep Khurana <ama...@gmail.com> wrote:
> > > >
> > > > > The number of splits is equal to the number of regions...
> > > > >
> > > > > On Sun, Apr 11, 2010 at 12:54 AM, john smith <js1987.sm...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > In the method
> > > > > >
> > > > > > "public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)"
> > > > > >
> > > > > > how is "numSplits" decided? I've seen different values of
> > > > > > numSplits for different MR jobs. Any reason for this?
> > > > > >
> > > > > > Also, what if I ignore numSplits and always split at region
> > > > > > boundaries? I guess that splitting at region boundaries makes
> > > > > > more sense and somewhat improves data locality.
> > > > > >
> > > > > > Any comments on the above statement?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > j.S
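
For anyone reading the thread later, the rule quoted from the docs above can be sketched as a small standalone Java class. This is only an illustration under stated assumptions: `SplitSketch` and `regionsPerSplit` are hypothetical names and not part of the HBase API, the method only counts regions per split rather than building real InputSplit objects, and the region count of 5 is just an example value. It takes the smaller of numSplits and the region count, spreads the regions as evenly as it can, and puts the bigger groups first.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not HBase source code.
class SplitSketch {

    // Returns how many regions each split would cover, assuming the rule
    // quoted from the docs: the split count is min(numSplits, numRegions),
    // regions are grouped as evenly as possible, bigger groups come first.
    static int[] regionsPerSplit(int numSplits, int numRegions) {
        int realNumSplits = Math.min(numSplits, numRegions);
        if (realNumSplits <= 0) {
            return new int[0]; // nothing to split
        }
        int base = numRegions / realNumSplits;      // regions every split gets
        int remainder = numRegions % realNumSplits; // leftover regions
        int[] counts = new int[realNumSplits];
        for (int i = 0; i < realNumSplits; i++) {
            // The first `remainder` splits take one extra region each.
            counts[i] = base + (i < remainder ? 1 : 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(regionsPerSplit(2, 5)));      // [3, 2]
        System.out.println(Arrays.toString(regionsPerSplit(100000, 5))); // [1, 1, 1, 1, 1]
        System.out.println(Arrays.toString(regionsPerSplit(1, 5)));      // [5]
    }
}
```

Under these assumptions, a large requested value such as 100000 is capped at one split per region, while a requested value below the region count groups several regions into one split.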