Thanks for the correction. Then I have some follow-up questions. :) After restarting HBase, will all regions be served by their local RS? Will all RSs share the regions fairly (in terms of load)?
Thanks,
Victor

On Fri, Feb 19, 2010 at 12:44 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Victor,
>
> HDFS writes by default on local datanode first so it only requires a
> few compactions for a region to be completely hosted on the same
> machine as the region server serving it. Worst case is 24 hours, the
> time for a major compaction to happen.
>
> J-D
>
> On Thu, Feb 18, 2010 at 7:36 PM, Victor Hsieh <victorhs...@gmail.com> wrote:
>> One tricky thing is that if the region size is larger (default max
>> size is 256MB) than the HDFS block size (default 64MB), it's still
>> necessary to go through the network.
>>
>> Victor
>>
>> On Thu, Feb 18, 2010 at 12:22 AM, Jean-Daniel Cryans
>> <jdcry...@apache.org> wrote:
>>> Bryan,
>>>
>>> What you are describing is already implemented and from my experience >90%
>>> of my tasks are usually run on the region server that has the mapped region.
>>>
>>> See o.a.h.h.mapreduce.TableSplit.getLocations()
>>>
>>> J-D
>>>
>>> On Wed, Feb 17, 2010 at 12:10 AM, Bryan McCormick <br...@readpath.com> wrote:
>>>
>>>> Quick question about data local vs rack local tasks when running map reduce
>>>> jobs against hbase. I've just run a job against a table that was split into
>>>> 1,645 tasks. Looking at the job page it's reporting that 1,445 of those jobs
>>>> were rack local compared to 200 that were data local. I'm taking these
>>>> counters to mean that most of the jobs were running on a server that wasn't
>>>> the same as the relevant region server. Is it possible, or are there plans,
>>>> to add some logic into the scheduler to prefer jobs to run on the same
>>>> server as the regionserver if possible?
>>>>
>>>> With HBase is there a similar way to tell if a region on a regionserver has
>>>> a copy of the files that it needs to serve the region on a local datanode
>>>> instead of having to cross the network to get it?
>>>>
>>>> I know that when you're writing new data into a table and it splits, the
>>>> default is to have the first datanode copy be local. But after a fairly
>>>> large table has been brought up and down several times with all of the
>>>> regions being reassigned, is there logic when assigning regions to put them
>>>> on a data local server?
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>
>>
>
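J-D's answer to Victor (the first replica of every block goes to the writer's local datanode) can be illustrated with a toy model. This is a hypothetical sketch, not HDFS or HBase code: `writeFile` and `locality` are made-up helpers, the host names are invented, and the placement of the second and third replicas is simplified to round-robin (real HDFS placement is rack-aware). It shows that a 256MB store file split into four 64MB blocks is fully local to the server that wrote it, so the region/block size ratio alone does not force network reads. What matters is which server wrote the file, which is why a region that moved is only partly local until a compaction rewrites its files.

```java
import java.util.ArrayList;
import java.util.List;

class LocalitySketch {
    // Hypothetical, simplified model of HDFS default block placement:
    // replica 1 goes on the writer's own datanode; the remaining replicas
    // are picked round-robin from the other nodes (real HDFS is rack-aware).
    static List<List<String>> writeFile(long fileSize, long blockSize, String writer,
                                        List<String> cluster, int replication) {
        if (replication > cluster.size()) {
            throw new IllegalArgumentException("replication exceeds cluster size");
        }
        List<List<String>> blocks = new ArrayList<>();
        long numBlocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
        int next = 0;
        for (long b = 0; b < numBlocks; b++) {
            List<String> replicas = new ArrayList<>();
            replicas.add(writer); // first copy is always local to the writer
            while (replicas.size() < replication) {
                String candidate = cluster.get(next++ % cluster.size());
                if (!replicas.contains(candidate)) {
                    replicas.add(candidate);
                }
            }
            blocks.add(replicas);
        }
        return blocks;
    }

    // Fraction of a file's blocks that have a replica on the given host.
    static double locality(List<List<String>> blocks, String host) {
        long local = blocks.stream().filter(r -> r.contains(host)).count();
        return blocks.isEmpty() ? 0.0 : (double) local / blocks.size();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<String> cluster = List.of("rs1", "rs2", "rs3", "rs4", "rs5");

        // A 256MB store file written by rs1 with 64MB blocks.
        List<List<String>> file = writeFile(256 * mb, 64 * mb, "rs1", cluster, 3);
        System.out.println("blocks: " + file.size());                          // prints "blocks: 4"
        System.out.println("locality on rs1: " + locality(file, "rs1"));       // prints "locality on rs1: 1.0"

        // The region is reassigned to rs4: the old file is only partly local there.
        System.out.println("locality on rs4: " + locality(file, "rs4"));       // prints "locality on rs4: 0.5"

        // A compaction run by rs4 rewrites the data with rs4 as the writer.
        List<List<String>> compacted = writeFile(256 * mb, 64 * mb, "rs4", cluster, 3);
        System.out.println("after compaction: " + locality(compacted, "rs4")); // prints "after compaction: 1.0"
    }
}
```

The 0.5 figure is an artifact of the round-robin rule in this toy model; on a real cluster, a reassigned region's pre-compaction locality depends on where HDFS happened to put the non-local replicas.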