Victor,

By default, HDFS writes the first replica of each block to the local
datanode, so it only takes a few compactions for a region to be
completely hosted on the same machine as the region server serving it.
The worst case is 24 hours, the interval between major compactions.
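
To make the locality argument concrete, here is a minimal Python sketch. It is not HBase or HDFS code: the host names and both helper functions are made up for illustration. It assumes the 64 MB block size and 256 MB region size mentioned in this thread, and that a compaction rewrites the region's file from the region server's machine, so the first replica of every new block lands on the local datanode.

```python
import math

MB = 1024 * 1024


def blocks_spanned(file_size, block_size=64 * MB):
    """Number of HDFS blocks needed to hold a file of file_size bytes."""
    return math.ceil(file_size / block_size)


def local_fraction(block_replica_hosts, region_server_host):
    """Fraction of blocks that have a replica on the region server's host."""
    if not block_replica_hosts:
        return 0.0
    local = sum(1 for hosts in block_replica_hosts if region_server_host in hosts)
    return local / len(block_replica_hosts)


# A 256 MB file at a 64 MB block size spans several blocks.
n_blocks = blocks_spanned(256 * MB)

# Hypothetical layout right after the region was reassigned to "rs1":
# none of the blocks has a replica on that machine.
before = [{"dn2", "dn3", "dn4"}] * n_blocks

# Hypothetical layout after a compaction rewrote the file from "rs1":
# the first replica of every block is on the local datanode.
after = [{"rs1", "dn3", "dn4"}] * n_blocks

print(n_blocks, local_fraction(before, "rs1"), local_fraction(after, "rs1"))
```

In this model, every block of the rewritten file has a local replica regardless of how many blocks the file spans, because each block is written from the same machine.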

J-D

On Thu, Feb 18, 2010 at 7:36 PM, Victor Hsieh <victorhs...@gmail.com> wrote:
> One tricky thing is that if the region size (default max size is
> 256MB) is larger than the HDFS block size (default 64MB), it's still
> necessary to go through the network.
>
> Victor
>
> On Thu, Feb 18, 2010 at 12:22 AM, Jean-Daniel Cryans
> <jdcry...@apache.org> wrote:
>> Bryan,
>>
>> What you are describing is already implemented, and in my experience more
>> than 90% of my tasks usually run on the region server that hosts the mapped
>> region.
>>
>> See o.a.h.h.mapreduce.TableSplit.getLocations()
>>
>> J-D
>>
>> On Wed, Feb 17, 2010 at 12:10 AM, Bryan McCormick <br...@readpath.com> wrote:
>>
>>> Quick question about data local vs rack local tasks when running map reduce
>>> jobs against hbase. I've just run a job against a table that was split into
>>> 1,645 tasks. Looking at the job page it's reporting that 1,445 of those tasks
>>> were rack local compared to 200 that were data local.  I'm taking these
>>> counters to mean that most of the tasks were running on a server that wasn't
>>> the same as the relevant region server.  Is it possible, or are there plans,
>>> to add some logic into the scheduler to prefer running tasks on the same
>>> server as the regionserver if possible?
>>>
>>> With HBase is there a similar way to tell if a region on a regionserver has
>>> a copy of the files that it needs to serve the region on a local datanode
>>> instead of having to cross the network to get it?
>>>
>>> I know that when you're writing new data into a table and it splits, the
>>> default is to have the first datanode copy be local. But after a fairly
>>> large table has been brought up and down several times with all of the
>>> regions being reassigned, is there logic when assigning regions to put them
>>> on a data local server?
>>>
>>> Thanks,
>>> Bryan
>>>
>>
>
