Re: Turning off FileSystem statistics during MapReduce

2008-10-06 Thread Nathan Marz
We see this in map tasks, and only for incrementBytesRead (not for
incrementBytesWritten). The time is being spent on HDFS. This appears
to be because incrementBytesRead is called every time a record is
read, while incrementBytesWritten is called only when a buffer is
spilled. We would benefit a lot from being able to turn this off.
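
To illustrate the difference between per-record and per-spill accounting described above, here is a minimal sketch of a counter that batches its updates. It assumes the cost comes from touching a shared counter on every call, which this thread does not confirm. BatchedByteCounter, flushThreshold and pending are hypothetical names for illustration; this is not Hadoop's FileSystem.Statistics implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical illustration only; not part of Hadoop.
 * Accumulates byte counts in a plain field and publishes them to a
 * shared counter only after a threshold is reached, so a per-record
 * call usually costs one addition and one comparison.
 * Intended for use by a single reader thread.
 */
final class BatchedByteCounter {
    private final AtomicLong shared;    // counter visible to other threads
    private final long flushThreshold;  // bytes to accumulate before publishing
    private long pending;               // unpublished bytes, single-threaded

    BatchedByteCounter(AtomicLong shared, long flushThreshold) {
        if (flushThreshold <= 0) {
            throw new IllegalArgumentException("flushThreshold must be positive");
        }
        this.shared = shared;
        this.flushThreshold = flushThreshold;
    }

    /** Called once per record; touches the shared counter only occasionally. */
    void add(long bytes) {
        pending += bytes;
        if (pending >= flushThreshold) {
            flush();
        }
    }

    /** Publishes any pending bytes; call this when the reader is closed. */
    void flush() {
        if (pending != 0) {
            shared.addAndGet(pending);
            pending = 0;
        }
    }
}
```

The trade-off in this sketch is that the shared total lags behind by up to flushThreshold bytes until flush() is called.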




On Oct 3, 2008, at 6:19 PM, Arun C Murthy wrote:


Nathan,

On Oct 3, 2008, at 5:18 PM, Nathan Marz wrote:


Hello,

We have been profiling our MapReduce jobs, and we are seeing that
about 20% of our job time is spent calling
FileSystem$Statistics.incrementBytesRead when we interact with the
FileSystem. Is there a way to turn this stats collection off?




This is interesting... could you provide more details? Are you
seeing this on Maps or Reduces? Which FileSystem exhibited this, i.e.
HDFS or LocalFS? Any details about your application?


To answer your original question - no, there isn't a way to disable  
this. However, if this turns out to be a systemic problem we  
definitely should consider having an option to allow users to switch  
it off.


So any information you can provide helps - thanks!

Arun



Thanks,
Nathan Marz
Rapleaf







Turning off FileSystem statistics during MapReduce

2008-10-03 Thread Nathan Marz

Hello,

We have been profiling our MapReduce jobs, and we are seeing that
about 20% of our job time is spent calling
FileSystem$Statistics.incrementBytesRead when we interact with the
FileSystem. Is there a way to turn this stats collection off?


Thanks,
Nathan Marz
Rapleaf



Re: Turning off FileSystem statistics during MapReduce

2008-10-03 Thread Arun C Murthy

Nathan,

On Oct 3, 2008, at 5:18 PM, Nathan Marz wrote:


Hello,

We have been profiling our MapReduce jobs, and we are seeing that
about 20% of our job time is spent calling
FileSystem$Statistics.incrementBytesRead when we interact with the
FileSystem. Is there a way to turn this stats collection off?




This is interesting... could you provide more details? Are you seeing
this on Maps or Reduces? Which FileSystem exhibited this, i.e. HDFS or
LocalFS? Any details about your application?


To answer your original question - no, there isn't a way to disable  
this. However, if this turns out to be a systemic problem we  
definitely should consider having an option to allow users to switch  
it off.


So any information you can provide helps - thanks!

Arun



Thanks,
Nathan Marz
Rapleaf