Quick question about data-local vs. rack-local tasks when running MapReduce jobs against HBase. I've just run a job against a table that was split into 1,645 tasks. The job page reports that 1,445 of those tasks were rack local, compared to 200 that were data local. I'm taking these counters to mean that most of the tasks ran on a server other than the one hosting the relevant region. Is it possible, or are there plans, to add logic to the scheduler so that it prefers to run each task on the same server as the regionserver serving its region?
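
To put the counters above in proportion, here is a quick back-of-the-envelope calculation. The variable names are just illustrative; the numbers are the ones from the job page.

```python
# Counters as reported on the job page for this run.
total_tasks = 1645
data_local = 200
rack_local = 1445

# Every task is accounted for by one of the two counters.
assert data_local + rack_local == total_tasks

data_local_pct = 100.0 * data_local / total_tasks
rack_local_pct = 100.0 * rack_local / total_tasks

print(f"data local: {data_local_pct:.1f}%")  # data local: 12.2%
print(f"rack local: {rack_local_pct:.1f}%")  # rack local: 87.8%
```

So only about one task in eight read its input from the node it was running on.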

With HBase, is there a similar way to tell whether a regionserver has a copy of the files it needs to serve a region on its local datanode, rather than having to cross the network to get them?
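
To make the question concrete, the kind of number I have in mind could be sketched roughly as below. This is only an illustration under assumptions: `locality_fraction` is a hypothetical helper, not an HBase or HDFS API, and the block-to-host lists are made up. In practice the replica hosts for each block would presumably have to come from HDFS itself (for example via `FileSystem.getFileBlockLocations` on the Java side).

```python
def locality_fraction(block_hosts, regionserver_host):
    """Fraction of blocks with at least one replica on regionserver_host.

    block_hosts: one list of replica hostnames per HDFS block of the
    region's files (hypothetical input, not read from a real cluster).
    """
    if not block_hosts:
        return 0.0
    local = sum(1 for hosts in block_hosts if regionserver_host in hosts)
    return local / len(block_hosts)


# Made-up example: four blocks, three replicas each.
blocks = [
    ["node1", "node4", "node7"],
    ["node2", "node3", "node9"],
    ["node1", "node5", "node8"],
    ["node6", "node2", "node4"],
]

print(locality_fraction(blocks, "node1"))  # 0.5
```

A value of 1.0 would mean the regionserver can serve the whole region from its local datanode; anything lower means some reads have to cross the network.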

I know that when you're writing new data into a table and it splits, the default is to have the first datanode copy be local. But after a fairly large table has been brought up and down several times with all of the regions being reassigned, is there logic when assigning regions to put them on a data local server?

Thanks,
Bryan
