On Mar 6, 2012, at 10:22 AM, Chris Curtin wrote:

> Hi,
>
> We had a fun morning trying to figure out why our cluster was failing jobs,
> removing nodes from the cluster etc. The majority of the errors were
> something like:
> [snip]
> We are running CDH3u3.

You'll need to check with the CDH lists.

However, hadoop-1.0 and earlier releases, starting with hadoop-0.20.203, have mechanisms to clean up userlogs automatically; otherwise, as you've found out, operating large clusters (4k nodes) with millions of jobs per month is too painful.

Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/