We do track the intermediate output a job uses, and if a job uses too much and 
can't be scheduled anywhere on the cluster, the CapacityScheduler/JobTracker 
(CS/JT) will fail it. You'll need hadoop-0.20.204 for this, though.
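
In case it's helpful, here's a minimal sketch of the related TaskTracker-side 
free-space guards in the 0.20 line, set in mapred-site.xml on each node. The 
property names are real; the byte values are illustrative assumptions, so tune 
them for your disks:

<!-- mapred-site.xml on each TaskTracker -->
<configuration>
  <!-- Stop accepting new tasks when free space under mapred.local.dir
       drops below this many bytes (~10 GB here). -->
  <property>
    <name>mapred.local.dir.minspacestart</name>
    <value>10737418240</value>
  </property>
  <!-- Kill running tasks when free space drops below this many bytes
       (~5 GB here), so a runaway job can't fill the disk completely. -->
  <property>
    <name>mapred.local.dir.minspacekill</name>
    <value>5368709120</value>
  </property>
</configuration>

This won't cap a single job's shuffle output the way the CS/JT tracking does, 
but it does keep one job from taking a node's local disks all the way down.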

Also, with MRv2 we are in the process of adding limits on disk usage for 
intermediate outputs, logs, etc.
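
The kind of knob this adds looks roughly like the following yarn-site.xml 
sketch; take the property names and values as assumptions for illustration 
rather than a finished interface:

<!-- yarn-site.xml: NodeManager disk-health checks -->
<configuration>
  <!-- Mark a local/log dir bad, and stop placing containers on it,
       once the disk is more than 90% full. -->
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>90.0</value>
  </property>
  <!-- Declare the whole NodeManager unhealthy when fewer than 25% of
       its local dirs are still good. -->
  <property>
    <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
    <value>0.25</value>
  </property>
</configuration>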

hth,
Arun

On Sep 21, 2011, at 3:45 PM, Matt Steele wrote:

> Hi All,
> 
> Is it possible to enforce a maximum on the disk space consumed by a 
> map/reduce job's intermediate output?  It looks like you can impose limits on 
> HDFS consumption, or, via the CapacityScheduler, limits on the RAM that a 
> map/reduce slot uses, or on the number of slots used.
> 
> But if I'm worried that a job might exhaust the cluster's disk capacity 
> during the shuffle, my sense is that I'd have to quarantine the job on a 
> separate cluster.  Am I wrong?  Do you have any suggestions for me?
> 
> Thanks,
> Matt
