Re: Data Local Questions

Jean-Daniel Cryans Wed, 17 Feb 2010 08:23:15 -0800

Bryan,

What you are describing is already implemented, and in my experience more
than 90% of my tasks run on the region server that hosts the mapped region.


See org.apache.hadoop.hbase.mapreduce.TableSplit.getLocations().
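
To illustrate the idea, here is a minimal sketch, not the actual HBase or
Hadoop code. It assumes each split reports the host of the region server
serving its region, and that a scheduler compares that hint against the host
where the task runs. The names LocalitySketch, Split, Locality, classify, and
the host and rack strings are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalitySketch {

    /** Hypothetical stand-in for an input split; only the location hint matters here. */
    static final class Split {
        private final String regionLocation;

        Split(String regionLocation) {
            this.regionLocation = regionLocation;
        }

        /** Reports the host(s) where this split's data is served. */
        String[] getLocations() {
            return new String[] { regionLocation };
        }
    }

    enum Locality { DATA_LOCAL, RACK_LOCAL, OFF_RACK }

    /** Classifies a task by comparing the split's location hint with the task's host. */
    static Locality classify(Split split, String taskHost, Map<String, String> rackOf) {
        // Data local: the task runs on a host named in the split's locations.
        for (String host : split.getLocations()) {
            if (host.equals(taskHost)) {
                return Locality.DATA_LOCAL;
            }
        }
        // Rack local: a different host, but on the same rack as one of the locations.
        String taskRack = rackOf.get(taskHost);
        for (String host : split.getLocations()) {
            String rack = rackOf.get(host);
            if (rack != null && rack.equals(taskRack)) {
                return Locality.RACK_LOCAL;
            }
        }
        return Locality.OFF_RACK;
    }

    public static void main(String[] args) {
        // Assumed topology: rs1 and rs2 share a rack, rs3 sits on another one.
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("rs1", "/rack1");
        rackOf.put("rs2", "/rack1");
        rackOf.put("rs3", "/rack2");

        Split split = new Split("rs1");
        for (String taskHost : new String[] { "rs1", "rs2", "rs3" }) {
            System.out.println(taskHost + " -> " + classify(split, taskHost, rackOf));
        }
    }
}
```

In this sketch a task counts as data local only when its host string matches
the location hint exactly, so the outcome depends on both sides naming a host
the same way.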

J-D

On Wed, Feb 17, 2010 at 12:10 AM, Bryan McCormick <br...@readpath.com> wrote:

> Quick question about data local vs rack local tasks when running map reduce
> jobs against hbase. I've just run a job against a table that was split into
> 1,645 tasks. Looking at the job page it's reporting that 1,445 of those tasks
> were rack local compared to 200 that were data local. I'm taking these
> counters to mean that most of the tasks were running on a server that wasn't
> the same as the relevant region server. Is it possible or are there plans
> to add some logic into the scheduler to prefer running tasks on the same
> server as the regionserver if possible?
>
> With HBase is there a similar way to tell if a region on a regionserver has
> a copy of the files that it needs to serve the region on a local datanode
> instead of having to cross the network to get it?
>
> I know that when you're writing new data into a table and it splits, the
> default is to have the first datanode copy be local. But after a fairly
> large table has been brought up and down several times with all of the
> regions being reassigned, is there logic when assigning regions to put them
> on a data local server?
>
> Thanks,
> Bryan
>
