Hi all,

According to the map task scheduling rules, the scheduler prefers to assign a map task to a node that holds its input data locally. The Data-local map counter in the job report does show very high locality across all the map tasks. However, the DataNode metrics (read_from_local and read_from_remote) show a higher remote read rate than the job report suggests. It seems the Data-local map counter is not as accurate as we expected. When, or why, would HDFS perform a remote read even though the map task has already been assigned as data-local?
Thanks for your time.

Best Regards,
Grace
