Bryan,

What you are describing is already implemented, and in my experience >90% of my tasks usually run on the region server that hosts the mapped region.
See o.a.h.h.mapreduce.TableSplit.getLocations()

J-D

On Wed, Feb 17, 2010 at 12:10 AM, Bryan McCormick <br...@readpath.com> wrote:
> Quick question about data-local vs. rack-local tasks when running MapReduce
> jobs against HBase. I've just run a job against a table that was split into
> 1,645 tasks. Looking at the job page, it reports that 1,445 of those tasks
> were rack local compared to 200 that were data local. I'm taking these
> counters to mean that most of the tasks ran on a server other than the
> relevant region server. Is it possible, or are there plans, to add logic
> to the scheduler so that tasks prefer to run on the same server as the
> region server when possible?
>
> With HBase, is there a similar way to tell whether a region server has
> a copy of the files it needs to serve a region on its local datanode,
> instead of having to cross the network to get them?
>
> I know that when you're writing new data into a table and it splits, the
> default is to have the first datanode copy be local. But after a fairly
> large table has been brought up and down several times, with all of the
> regions being reassigned, is there logic when assigning regions to put them
> on a data-local server?
>
> Thanks,
> Bryan
>
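The mechanism J-D points to is the split-location hint: for an HBase table split, getLocations() reports the host of the region server serving that region, and the MapReduce scheduler uses such hints to prefer a data-local task, then a rack-local one, before falling back to any task. The Java sketch below illustrates only that preference order, under simplifying assumptions. Split, Locality, classify and pickSplit are hypothetical names, not Hadoop or HBase APIs, and the real JobTracker logic is considerably more involved.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of locality-aware map-task assignment.
 * None of these names are Hadoop or HBase APIs.
 */
public class LocalitySketch {

    /** Preference order: lower ordinal is better. */
    enum Locality { DATA_LOCAL, RACK_LOCAL, OFF_RACK }

    /** Stand-in for an input split; for an HBase table split the single
     *  location would be the host of the region server serving the region. */
    static final class Split {
        final String name;
        final String[] locations;

        Split(String name, String... locations) {
            this.name = name;
            this.locations = locations;
        }
    }

    /** Classify how close a split's preferred hosts are to the requesting host. */
    static Locality classify(Split split, String trackerHost, Map<String, String> hostToRack) {
        String trackerRack = hostToRack.get(trackerHost);
        Locality best = Locality.OFF_RACK;
        for (String host : split.locations) {
            if (host.equals(trackerHost)) {
                return Locality.DATA_LOCAL; // same host: best possible
            }
            String rack = hostToRack.get(host);
            if (rack != null && rack.equals(trackerRack)) {
                best = Locality.RACK_LOCAL; // same rack: keep looking for same host
            }
        }
        return best;
    }

    /** Choose the pending split with the best locality for a host asking for work. */
    static Split pickSplit(List<Split> pending, String trackerHost, Map<String, String> hostToRack) {
        Split chosen = null;
        Locality chosenLocality = null;
        for (Split s : pending) {
            Locality l = classify(s, trackerHost, hostToRack);
            if (chosen == null || l.ordinal() < chosenLocality.ordinal()) {
                chosen = s;
                chosenLocality = l;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical topology: rs1 and rs2 share a rack, rs3 is on another.
        Map<String, String> racks = Map.of("rs1", "/rack1", "rs2", "/rack1", "rs3", "/rack2");
        List<Split> pending = List.of(
                new Split("region-a", "rs3"),
                new Split("region-b", "rs2"),
                new Split("region-c", "rs1"));
        Split s = pickSplit(pending, "rs1", racks);
        System.out.println(s.name + " " + classify(s, "rs1", racks)); // prints "region-c DATA_LOCAL"
    }
}
```

In this model a task counts as rack local whenever the host asking for work has no pending split whose region lives on it, which is one way a job can end up with many more rack-local than data-local tasks.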