Hi all,

According to the map task scheduling rules, the scheduler prefers to assign
a map task to a node that holds its input data locally. The Data-local map
counter in the job report does show very high locality across the map tasks.
However, the DataNode metrics (read_from_local and read_from_remote) show a
higher remote read rate than the job report suggests. It seems the Data-local
map counter is not as accurate as we expected. When, or why, would HDFS
perform a remote read for a map task that has already been assigned as
data-local?
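
To make the comparison concrete, here is a minimal sketch of the two ratios
being compared. The function name and all numbers are hypothetical
placeholders, not values from a real job or cluster. Note that the two ratios
count different things: the job counter counts tasks, while the DataNode
metrics count read operations, so they need not match exactly.

```python
def locality_ratio(local: int, remote: int) -> float:
    """Return the fraction of events that were local (0.0 if there were none)."""
    total = local + remote
    return local / total if total else 0.0


# Placeholder values for illustration only (assumptions, not measurements).
data_local_maps = 95   # map tasks the job report counts as data-local
other_maps = 5         # remaining map tasks (rack-local or off-rack)
reads_local = 700      # DataNode reads served to a client on the same node
reads_remote = 300     # DataNode reads served to a client on another node

# Locality as seen by the scheduler (per task).
task_locality = locality_ratio(data_local_maps, other_maps)
# Locality as seen by the DataNodes (per read operation).
read_locality = locality_ratio(reads_local, reads_remote)

# A positive gap means the DataNodes saw more remote reads than the
# task-level counter alone would suggest.
gap = task_locality - read_locality
print(f"task locality: {task_locality:.2f}")
print(f"read locality: {read_locality:.2f}")
print(f"gap:           {gap:.2f}")
```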

Thanks for your time.

Best Regards,
Grace
