Hi,

I'm running the current snapshot (-r709609), doing a simple word count using Python
over streaming.  I have a relatively modest setup of 17 nodes.

I'm getting this exception:

java.io.FileNotFoundException: /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:137)
        at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:62)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:98)
        at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:168)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
        at org.apache.hadoop.mapred.IndexRecord.readIndexFile(IndexRecord.java:47)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.getIndexInformation(MapTask.java:1339)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1237)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

I see this even AFTER I've reconfigured the maximum allowable open files to 4096!
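
One thing I wanted to rule out is the new limit simply not reaching the task JVMs: as
far as I understand, changes to /etc/security/limits.conf only apply to sessions started
afterwards, so a TaskTracker started before the change could still hand its children the
old limit.  Something like this little Python sketch (just an illustration, run under the
same account/session that launches the TaskTracker) confirms what limit a process
actually sees:

# Quick sketch: print the file-descriptor limit this process actually sees.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("RLIMIT_NOFILE: soft=%d, hard=%d" % (soft, hard))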

When I monitor the number of open files on a box running Hadoop, I see the number
fluctuating around 900 during the map phase.  Then I see it going through the roof
during the sort/shuffle phase.  I see a lot of open files named like
"/users/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_1/output/spill2188.out"

What is a poor user to do about this?  Reconfigure Hadoop to allow 32K open files, as
somebody suggested on an HBase forum I found via Google?  Or some other ridiculous
number?  If so, what should it be?
Or is this a problem with my configuration, and is there a way to control it?
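
My (possibly wrong) mental model is that each map task buffers output in memory
(io.sort.mb megabytes, 100 by default as far as I know), writes a spill file every time
that buffer fills, and then merges all the spills, together with their .index files, at
the end, which is where the stack trace above dies (readIndexFile inside mergeParts).
If that's right, spill4055.out means one map spilled over four thousand times.  A toy
calculation, with all the concrete numbers hypothetical:

# Toy arithmetic relating spill-file count to the sort buffer size.
# io.sort.mb is the real Hadoop knob; 100 MB is what I believe its
# default to be, and the rest of the numbers are hypothetical.
io_sort_mb = 100          # in-memory sort buffer per map task (MB)
spill_count = 4055        # from spill4055.out.index in the trace above

# If a spill happens roughly once per full buffer, the implied map output is:
approx_output_mb = spill_count * io_sort_mb
print("implied map output: ~%d MB (~%.0f GB) per map task"
      % (approx_output_mb, approx_output_mb / 1024.0))
# If the real per-map output is nowhere near that, the buffer must be
# filling (and spilling) much earlier than io.sort.mb alone would suggest.

If that reasoning holds, is tuning the sort buffer (so each map spills far fewer times)
the right lever here, rather than raising the descriptor limit ever higher?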

Do I need to file a JIRA about this, or is it a problem people are already aware of?
Because right now it looks to me like Hadoop scalability is broken.  There is no way
4K descriptors should be insufficient.

Any feedback will be appreciated.

Thanks,

  -Yuri

P.S.  BTW, someone on this list suggested earlier that a similar-sounding problem goes
away for a while after restarting Hadoop.  That did not work for me.
