We see this on Maps, and only in incrementBytesRead (not in
incrementBytesWritten). The time is being spent on HDFS. This seems
to be because incrementBytesRead is called every time a record is
read, while incrementBytesWritten is only called when a buffer is
spilled. We would benefit a lot from being able to turn this off.
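To illustrate the per-record versus per-spill difference, here is a rough sketch of batching the updates so the shared counter is touched once per threshold rather than once per record. This is not Hadoop's code: the class name BatchedBytesCounter, the flush threshold, and the use of an AtomicLong as the shared counter are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; this is NOT FileSystem.Statistics.
// Assumption: the shared statistics counter behaves like an AtomicLong.
class BatchedBytesCounter {
    private final AtomicLong shared;    // stands in for the shared bytes-read counter
    private final long flushThreshold;  // bytes to accumulate before updating 'shared'
    private long pending;               // local tally, meant to be used by one thread only

    BatchedBytesCounter(AtomicLong shared, long flushThreshold) {
        this.shared = shared;
        this.flushThreshold = flushThreshold;
    }

    // Called once per record: normally just a plain field update.
    void increment(long bytes) {
        pending += bytes;
        if (pending >= flushThreshold) {
            flush();
        }
    }

    // Pushes the local tally to the shared counter (like a spill).
    void flush() {
        if (pending > 0) {
            shared.addAndGet(pending);
            pending = 0;
        }
    }

    public static void main(String[] args) {
        AtomicLong perRecord = new AtomicLong();
        AtomicLong batched = new AtomicLong();
        BatchedBytesCounter counter = new BatchedBytesCounter(batched, 64 * 1024);

        for (int i = 0; i < 1_000_000; i++) {
            perRecord.addAndGet(100);  // one atomic update per record
            counter.increment(100);    // atomic update only when the threshold is reached
        }
        counter.flush();               // do not lose the tail of the last batch

        System.out.println("per-record total: " + perRecord.get());
        System.out.println("batched total:    " + batched.get());
    }
}
```

Both totals end up the same; the batched version just reaches the shared counter far less often. The cost is that a reader of the shared counter sees a slightly stale value between flushes.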
On Oct 3, 2008, at 6:19 PM, Arun C Murthy wrote:
Nathan,
On Oct 3, 2008, at 5:18 PM, Nathan Marz wrote:
Hello,
We have been doing some profiling of our MapReduce jobs, and we are
seeing that about 20% of our job time is spent calling
"FileSystem$Statistics.incrementBytesRead" when we interact with
the FileSystem. Is there a way to turn this stats collection off?
This is interesting... could you provide more details? Are you
seeing this on Maps or Reduces? Which FileSystem exhibited this, i.e.
HDFS or LocalFS? Any details about your application?
To answer your original question: no, there isn't a way to disable
this. However, if this turns out to be a systemic problem, we
should definitely consider adding an option that lets users switch
it off.
So any information you can provide helps - thanks!
Arun
Thanks,
Nathan Marz
Rapleaf