I'm running several Hadoop jobs sequentially on one cluster. Later jobs are dying with "too many open files" errors, and earlier runs tend to cause later runs to die - in other words, file descriptors aren't being released somewhere.

By running a job over and over again, I can cause all subsequent jobs to die, even jobs that had successfully run earlier.

I'm using streaming on a hadoop-ec2 cluster (Hadoop 18.0). All of my inputs and outputs live in HDFS and are handled by streaming through stdin and stdout; the jobs never read or write files as a side effect. Each job uses the HDFS output of a previous job as its input, but the jobs are all separate Hadoop processes, and only one runs at a time.
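In case it helps, the driver is basically just a loop like the sketch below. The script names, HDFS paths, and jar location are placeholders, not my actual job:

    import os
    import subprocess

    # Hypothetical driver: each streaming step runs as its own Hadoop process,
    # and the HDFS output of one step becomes the input of the next.
    streaming_jar = os.path.join(
        os.environ["HADOOP_HOME"], "contrib", "streaming",
        "hadoop-0.18.0-streaming.jar")  # adjust to wherever your streaming jar lives

    steps = [
        ("step1_mapper.py", "step1_reducer.py", "/data/raw",   "/data/step1"),
        ("step2_mapper.py", "step2_reducer.py", "/data/step1", "/data/step2"),
    ]

    for mapper, reducer, hdfs_in, hdfs_out in steps:
        cmd = [
            "hadoop", "jar", streaming_jar,
            "-input", hdfs_in,
            "-output", hdfs_out,
            "-mapper", mapper,
            "-reducer", reducer,
            "-file", mapper,
            "-file", reducer,
        ]
        # Only one job at a time: wait for each to finish before starting the next.
        subprocess.check_call(cmd)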

I have increased the open file limit for root to 65536 in limits.conf on my EC2 image, but that hasn't helped.
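One thing I realize I should double-check: limits.conf is applied through PAM at login, so daemons started at boot might never inherit the raised limit. A rough sketch of how I'd verify what a process actually gets (run it from the same environment the Hadoop daemons start in):

    import resource

    # Print the file-descriptor limits this process actually inherited.
    # If the daemons are started outside a PAM login session, they may
    # still be running with the old default (often 1024).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft nofile limit:", soft)
    print("hard nofile limit:", hard)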

Is there any solution other than firing up a new cluster for each job?

I could file a bug, but I'm not sure what's actually consuming the descriptors. On a random task box, counting the entries under /proc/<pid>/fd shows only 359 open fds for the entire box, and the most held by any single process is 174.
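For what it's worth, those numbers came from a quick script along these lines, which just counts the entries under each /proc/<pid>/fd (run as root so every process's fd directory is readable):

    import os

    # Count open file descriptors per process by listing /proc/<pid>/fd.
    counts = {}
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            counts[pid] = len(os.listdir(os.path.join("/proc", pid, "fd")))
        except OSError:
            # The process exited, or its fd directory isn't readable; skip it.
            continue

    print("total open fds on the box:", sum(counts.values()))
    if counts:
        busiest = max(counts, key=counts.get)
        print("busiest pid:", busiest, "holding", counts[busiest], "fds")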
