Data Local Questions

Bryan McCormick Wed, 17 Feb 2010 00:11:27 -0800

Quick question about data-local vs. rack-local tasks when running MapReduce jobs against HBase. I've just run a job against a table that was split into 1,645 tasks. The job page reports that 1,445 of those tasks were rack-local, compared to 200 that were data-local. I'm taking these counters to mean that most of the tasks ran on a server other than the one hosting the relevant regionserver. Is it possible, or are there plans, to add some logic to the scheduler so that it prefers to run tasks on the same server as the regionserver when possible?

With HBase, is there a similar way to tell whether a regionserver has a copy of the files it needs to serve a region on its local datanode, instead of having to cross the network to get them?

I know that when you're writing new data into a table and it splits, the default is for the first datanode copy to be local. But after a fairly large table has been brought up and down several times, with all of its regions being reassigned, is there any logic when assigning regions to put them on a data-local server?


Thanks,
Bryan
