I am analyzing some HDFS counters, and I have these questions: 1 - Does "HDFS: Number of bytes read" increase as the map tasks read data from HDFS, or is it a sum pre-calculated before the mappers start reading?
2 - According to these metrics, some data was written to HDFS before the map tasks finished. Does anyone know whether it is possible for the map tasks to write their intermediate output to HDFS? Could this happen because the user-defined job forces it (I don't know what this job does)?
<mapcompletion>map() completion: 0.9946828</mapcompletion>
<redcompletion>reduce() completion: 0.0</redcompletion>
<hdfs>HDFS: Number of bytes read=314470180</hdfs>
<hdfs>HDFS: Number of bytes written=313912087</hdfs>
--
Best regards,