Having a similar problem.
After upgrading from Hadoop 0.16.4 to 0.17.2.1 we're seeing
"java.io.IOException: java.io.IOException: Too many open files" after
a few jobs. For example:
Error message from task (reduce) tip_200810020918_0014_r_000031
Error initializing task_200810020918_0014_r_000031_1:
java.io.IOException: java.io.IOException: Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:646)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1271)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:912)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1307)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2266)
Once a job has failed because of this exception, all subsequent jobs
fail for the same reason.
After a cluster restart it works fine for a few jobs again....
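As a quick check before restarting, a Linux-only sketch like the
following can report how many descriptors a process is holding (the
FdCount name is made up; it just lists /proc/<pid>/fd, so point it at
the TaskTracker pid):

import java.io.File;

// Counts open file descriptors by listing /proc/<pid>/fd (Linux only).
// With no argument it inspects its own JVM via /proc/self/fd.
public class FdCount {
    public static void main(String[] args) {
        String pid = args.length > 0 ? args[0] : "self";
        File fdDir = new File("/proc/" + pid + "/fd");
        String[] fds = fdDir.list();
        if (fds == null) {
            System.err.println("cannot read " + fdDir
                + " (not Linux, or not your process)");
            return;
        }
        System.out.println(fds.length + " open descriptors in " + fdDir);
    }
}

If that count creeps toward the ulimit between jobs, it suggests the
TaskTracker itself is leaking descriptors rather than any one task.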
Johannes
On Sep 27, 2008, at 1:59 AM, Karl Anderson wrote:
On 26-Sep-08, at 3:09 PM, Eric Zhang wrote:
Hi,
I encountered the following FileNotFoundException, resulting from a
"too many open files" error, when I tried to run a job. The job had
been run several times before without problems. The exception
confuses me because my code closes all of its files, and even if it
didn't, the job has only 10-20 small input/output files. The open-file
limit on my box is 1024. Besides, the error seemed to happen even
before the task was executed. I am using version 0.17. I'd appreciate
it if somebody could shed some light on this issue. BTW, the job ran
OK after I restarted Hadoop. Yes, the hadoop-site.xml did exist in
that directory.
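For what it's worth, the usual safeguard against leaking descriptors
in task code is to close every stream in a finally block. A minimal
sketch against the standard FileSystem API (the SafeWrite class and
its arguments are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Ensures the output stream is closed even when the write throws,
// so retried tasks cannot pile up leaked descriptors.
public class SafeWrite {
    public static void write(Configuration conf, Path out, byte[] data)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream stream = fs.create(out);
        try {
            stream.write(data);
        } finally {
            stream.close();  // runs on success and failure alike
        }
    }
}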
I had the same errors, including the bash one. Running one particular
job would cause all subsequent jobs of any kind to fail, even after
all running jobs had completed or failed out. This was confusing
because the failing jobs themselves often had no relationship to the
cause; they were just running in a bad environment.
If you can't successfully run a dummy job (with the identity mapper
and reducer, or a streaming job with cat) once you start getting
failures, then you are probably in the same situation.
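A minimal version of such a dummy job against the classic mapred API
(exact method names shifted a bit between releases, and the DummyJob
name and the paths are placeholders):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Canary job: identity mapper and reducer, no real work. If this
// fails on an otherwise quiet cluster, the environment is broken.
public class DummyJob {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(DummyJob.class);
        conf.setJobName("canary");
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        // TextInputFormat (the default) produces these key/value types.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);  // blocks until the job completes or fails
    }
}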
I believe that the problem was caused by increasing the timeout, but
I never pinned it down enough to submit a Jira issue. It might have
been the XML reader or something else. I was using streaming,
hadoop-ec2, and either 0.17.0 or 0.18.0. It would happen just as
rapidly after I made an ec2 image with a higher open file limit.
Eventually I figured it out by running each job in my pipeline 5 or
so times before trying the next one, which let me see which job was
causing the problem (because it would eventually fail itself, rather
than hosing a later job).
Karl Anderson
[EMAIL PROTECTED]
http://monkey.org/~kra
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec GmbH
Halle (Saale), Saxony-Anhalt, Germany
http://www.101tec.com