We see this only on Maps, and only in incrementBytesRead (not in incrementBytesWritten). The FileSystem where we see the time being spent is HDFS. This seems to be because incrementBytesRead is called every time a record is read, while incrementBytesWritten is only called when a buffer is spilled. We would benefit a lot from being able to turn this off.
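To make the difference in call frequency concrete, here is a minimal sketch. It assumes the counters are shared AtomicLong values, so every increment is an atomic read-modify-write. CountingStats, readRecords and writeRecords are hypothetical names for illustration only; this is not Hadoop's actual FileSystem$Statistics code or its record-reading/spill path.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a per-FileSystem statistics object (NOT Hadoop's class).
// Assumption: byte counts live in shared AtomicLongs, so each increment is an atomic update.
class CountingStats {
    final AtomicLong bytesRead = new AtomicLong();
    final AtomicLong bytesWritten = new AtomicLong();
    // Plain call counters, only used to show how often each method is hit (single-threaded demo).
    long readCalls = 0;
    long writeCalls = 0;

    void incrementBytesRead(long n) { bytesRead.addAndGet(n); readCalls++; }
    void incrementBytesWritten(long n) { bytesWritten.addAndGet(n); writeCalls++; }
}

public class StatsCallPattern {
    // Read side as described in the thread: one counter update per record read.
    static void readRecords(CountingStats stats, int records, int recordSize) {
        for (int i = 0; i < records; i++) {
            stats.incrementBytesRead(recordSize);
        }
    }

    // Write side: records accumulate in a buffer and the counter is only updated
    // when the buffer spills, plus once more for any final partial buffer.
    static void writeRecords(CountingStats stats, int records, int recordSize, int bufferSize) {
        long buffered = 0;
        for (int i = 0; i < records; i++) {
            buffered += recordSize;
            if (buffered >= bufferSize) {
                stats.incrementBytesWritten(buffered);
                buffered = 0;
            }
        }
        if (buffered > 0) {
            stats.incrementBytesWritten(buffered);
        }
    }

    public static void main(String[] args) {
        CountingStats stats = new CountingStats();
        readRecords(stats, 1_000_000, 100);            // 1M records of 100 bytes
        writeRecords(stats, 1_000_000, 100, 1 << 20);  // same data, 1 MiB spill buffer
        System.out.println("incrementBytesRead calls:    " + stats.readCalls);
        System.out.println("incrementBytesWritten calls: " + stats.writeCalls);
    }
}
```

Both sides account for the same number of bytes, but in this sketch the read side performs one atomic update per record (1,000,000 calls) while the write side performs one per spill (96 calls), which is consistent with only incrementBytesRead showing up in the profile.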

On Oct 3, 2008, at 6:19 PM, Arun C Murthy wrote:


On Oct 3, 2008, at 5:18 PM, Nathan Marz wrote:


We have been doing some profiling of our MapReduce jobs, and we are seeing about 20% of the time of our jobs is spent calling "FileSystem$Statistics.incrementBytesRead" when we interact with the FileSystem. Is there a way to turn this stats-collection off?

This is interesting... could you provide more details? Are you seeing this on Maps or Reduces? Which FileSystem exhibited this, i.e. HDFS or LocalFS? Any details about your application?

To answer your original question: no, there isn't a way to disable this. However, if this turns out to be a systemic problem, we should definitely consider adding an option that lets users switch it off.

So any information you can provide helps - thanks!


Nathan Marz
