Counters for data-local and rack-local tasks should be replaced by
bytes-read-local and bytes-read-rack
-------------------------------------------------------------------------------------------------------
Key: MAPREDUCE-1922
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1922
Project: Hadoop Map/Reduce
Issue Type: Improvement
Environment: All
Reporter: Milind Bhandarkar
Assignee: Arun C Murthy
As more and more applications use combine file input format (to reduce the number
of mappers), formats with column groups implemented as separate HDFS files
(Zebra, HBase), and composite input formats (map-side joins), the data-local and
rack-local task counters lose their meaning. (A map task that reads only one column
group, say 20% of its input, locally and the other 80% remotely still gets flagged
as a data-local map.)
So, my suggestion is to drop these counters and replace them with
HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, and HDFS_TOTAL_BYTES_READ. These
counters will make it easier to reason about read performance for maps.
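
To make the proposal concrete, here is a minimal sketch of how a task might tally
bytes by locality. It is not Hadoop code: the class LocalityByteCounters, the
Locality enum, and the recordRead and localFraction methods are hypothetical names
invented for illustration. Only the three counter names come from this issue. The
sketch also assumes that HDFS_RACK_BYTES_READ counts bytes read from another node
on the same rack and excludes node-local bytes; the issue does not specify this.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch: per-task byte counters keyed by read locality. */
class LocalityByteCounters {

    /** Where the bytes were served from, relative to the task's node. */
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    /** Counter names proposed in the issue. */
    enum Counter { HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, HDFS_TOTAL_BYTES_READ }

    private final Map<Counter, Long> values = new EnumMap<>(Counter.class);

    LocalityByteCounters() {
        // Start every counter at zero so get() never returns null.
        for (Counter c : Counter.values()) {
            values.put(c, 0L);
        }
    }

    /** Records one read of the given size from a source with the given locality. */
    void recordRead(Locality locality, long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        // Every read counts toward the total, regardless of locality.
        values.merge(Counter.HDFS_TOTAL_BYTES_READ, bytes, Long::sum);
        if (locality == Locality.NODE_LOCAL) {
            values.merge(Counter.HDFS_LOCAL_BYTES_READ, bytes, Long::sum);
        } else if (locality == Locality.RACK_LOCAL) {
            // Assumption: rack bytes exclude node-local bytes.
            values.merge(Counter.HDFS_RACK_BYTES_READ, bytes, Long::sum);
        }
        // Off-rack reads are visible only as total minus (local + rack).
    }

    long get(Counter counter) {
        return values.get(counter);
    }

    /** Fraction of all bytes that were read node-locally; 0.0 if nothing was read. */
    double localFraction() {
        long total = get(Counter.HDFS_TOTAL_BYTES_READ);
        return total == 0 ? 0.0 : (double) get(Counter.HDFS_LOCAL_BYTES_READ) / total;
    }
}
```

For the map task in the example above, recording 20 bytes as NODE_LOCAL and 80
bytes as OFF_RACK gives a local fraction of 0.2, whereas the existing task-level
counter would simply report the task as data-local.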