We do track the intermediate output a job uses, and if a job uses too much and can't be scheduled anywhere on the cluster, the CapacityScheduler/JobTracker will fail it. You'll need hadoop-0.20.204 for this, though.
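In the meantime, if you mainly want each TaskTracker to protect its own local disk, the per-node free-space thresholds in mapred-site.xml can help. A minimal sketch, assuming the stock TaskTracker settings (the byte values here are illustrative, not recommendations; check mapred-default.xml for your release):

    <!-- mapred-site.xml: per-TaskTracker local-disk thresholds (example values) -->
    <property>
      <name>mapred.local.dir.minspacestart</name>
      <!-- Don't accept new tasks when free space in mapred.local.dir
           drops below ~10 GB -->
      <value>10737418240</value>
    </property>
    <property>
      <name>mapred.local.dir.minspacekill</name>
      <!-- Below ~5 GB, stop accepting tasks and kill a running task
           to reclaim space -->
      <value>5368709120</value>
    </property>

Note this throttles per-node disk pressure rather than enforcing a per-job cap, so it complements, rather than replaces, the job-level check above.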
Also, with MRv2 we are in the process of adding limits on disk usage for intermediate outputs, logs, etc.

hth,
Arun

On Sep 21, 2011, at 3:45 PM, Matt Steele wrote:

> Hi All,
>
> Is it possible to enforce a maximum on the disk space consumed by a
> map/reduce job's intermediate output? It looks like you can impose limits on
> HDFS consumption, or, via the capacity scheduler, limits on the RAM that a
> map/reduce slot uses, or the number of slots used.
>
> But if I'm worried that a job might exhaust the cluster's disk capacity
> during the shuffle, my sense is that I'd have to quarantine the job on a
> separate cluster. Am I wrong? Do you have any suggestions for me?
>
> Thanks,
> Matt