Re: Data Local Questions

Victor Hsieh Thu, 18 Feb 2010 21:02:59 -0800

Thanks for the correction. I have a couple of follow-up questions, then. :)

After restarting HBase, will all regions be served by their local RS?
Will all RSs share regions fairly (in terms of load)?


Thanks,
Victor

On Fri, Feb 19, 2010 at 12:44 PM, Jean-Daniel Cryans
<jdcry...@apache.org> wrote:
> Victor,
>
> By default HDFS writes the first replica to the local datanode, so it
> only takes a few compactions for a region to be completely hosted on
> the same machine as the region server serving it. The worst case is 24
> hours, the time for a major compaction to happen.
>
> J-D
>
> On Thu, Feb 18, 2010 at 7:36 PM, Victor Hsieh <victorhs...@gmail.com> wrote:
>> One tricky thing is that if the region size is larger (default max
>> size is 256MB) than the HDFS block size (default 64MB), it's still
>> necessary to go through the network.
>>
>> Victor
>>
>> On Thu, Feb 18, 2010 at 12:22 AM, Jean-Daniel Cryans
>> <jdcry...@apache.org> wrote:
>>> Bryan,
>>>
>>> What you are describing is already implemented and from my experience >90%
>>> of my tasks are usually run on the region server that has the mapped region.
>>>
>>> See o.a.h.h.mapreduce.TableSplit.getLocations()
>>>
>>> J-D
>>>
>>> On Wed, Feb 17, 2010 at 12:10 AM, Bryan McCormick <br...@readpath.com> wrote:
>>>
>>>> Quick question about data local vs rack local tasks when running map reduce
>>>> jobs against hbase. I've just run a job against a table that was split into
>>>> 1,645 tasks. Looking at the job page it's reporting that 1,445 of those jobs
>>>> were rack local compared to 200 that were data local.  I'm taking these
>>>> counters to mean that most of the jobs were running on a server that wasn't
>>>> the same as the relevant region server.  Is it possible or are there plans
>>>> to add some logic into the scheduler to prefer jobs to run on the same
>>>> server as the regionserver if possible?
>>>>
>>>> With HBase is there a similar way to tell if a region on a regionserver has
>>>> a copy of the files that it needs to serve the region on a local datanode
>>>> instead of having to cross the network to get it?
>>>>
>>>> I know that when you're writing new data into a table and it splits, the
>>>> default is to have the first datanode copy be local. But after a fairly
>>>> large table has been brought up and down several times with all of the
>>>> regions being reassigned, is there logic when assigning regions to put them
>>>> on a data local server?
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>
>>
>
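
For reference, the figures quoted in this thread can be worked through with a
short sketch. It is only arithmetic on the numbers above (256MB default max
region size, 64MB default HDFS block size, 200 data-local tasks out of 1,645).
The function names are made up for illustration, and treating a full region as
a single 256MB file is a simplifying assumption, not how HBase necessarily
lays out its store files.

```python
import math

MB = 1024 * 1024


def blocks_per_file(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks needed to hold a file of the given size."""
    return math.ceil(file_size_bytes / block_size_bytes)


def data_local_fraction(data_local_tasks, total_tasks):
    """Share of map tasks that ran on a node holding their input data."""
    return data_local_tasks / total_tasks


# Assumption: one region at its default max size, stored as a single file.
n_blocks = blocks_per_file(256 * MB, 64 * MB)

# Counters reported in the original question: 200 data local, 1,445 rack local.
frac = data_local_fraction(200, 200 + 1445)

print(f"blocks per full region: {n_blocks}")
print(f"data-local tasks: {frac:.1%}")
```

So a full region spans several blocks, but as J-D notes, each block written by
the region server gets its first replica on the local datanode, so spanning
blocks does not by itself force reads over the network.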
