Hi, I'm running the current snapshot (-r709609), doing a simple word count in Python over streaming. I have a relatively modest setup of 17 nodes.
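
The job itself is nothing fancy; the mapper and reducer are essentially the canonical streaming word count, roughly like the following (a simplified sketch, not my exact scripts):

#!/usr/bin/env python
# mapper.py -- emit "<word> TAB 1" for every whitespace-separated token on stdin
import sys

for line in sys.stdin:
    for word in line.split():
        sys.stdout.write("%s\t1\n" % word)

#!/usr/bin/env python
# reducer.py -- sum the counts per word; streaming hands the reducer keys already sorted
import sys

current_word = None
current_count = 0
for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    word, count = line.split("\t", 1)
    if word != current_word:
        if current_word is not None:
            sys.stdout.write("%s\t%d\n" % (current_word, current_count))
        current_word = word
        current_count = 0
    current_count += int(count)
if current_word is not None:
    sys.stdout.write("%s\t%d\n" % (current_word, current_count))
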
I'm getting this exception:

java.io.FileNotFoundException: /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:137)
        at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:62)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:98)
        at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:168)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
        at org.apache.hadoop.mapred.IndexRecord.readIndexFile(IndexRecord.java:47)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.getIndexInformation(MapTask.java:1339)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1237)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

I see that AFTER I've reconfigured the maximum allowed open files to 4096! When I monitor the number of open files on a box running Hadoop, I see it fluctuating around 900 during the map phase, then going through the roof during the sort/shuffle phase. I see a lot of open files named like
/users/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_1/output/spill2188.out

What is a poor user to do about this? Reconfigure Hadoop to allow 32K open files, as somebody suggested on an HBase forum I googled up? Or some other ridiculous number? If so, what should it be? Or is it a problem with my configuration, and there is a way to control this? Do I need to file a JIRA about this, or is it a problem people are already aware of? Because right now it looks to me like Hadoop scalability is broken: there is no way 4K descriptors should be insufficient.

Any feedback will be appreciated.

Thanks,
-Yuri

P.S. BTW, someone on this list suggested before that a similar-sounding problem goes away for a while after restarting Hadoop. That did not work for me.
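
P.P.S. In case the methodology matters: the ~900 figure above comes from counting descriptors roughly like this (a quick sketch; it assumes a Linux /proc layout and that the TaskTracker and its children run as user "hadoop", and it has to be run as root or as that same user so /proc/<pid>/fd is readable):

#!/usr/bin/env python
# fdcount.py -- rough count of file descriptors held by one user's processes
# (assumptions: Linux /proc layout; "hadoop" is the daemon user -- adjust as needed)
import os
import pwd
import sys

user = sys.argv[1] if len(sys.argv) > 1 else "hadoop"
uid = pwd.getpwnam(user).pw_uid

total = 0
for pid in os.listdir("/proc"):
    if not pid.isdigit():
        continue
    try:
        if os.stat("/proc/" + pid).st_uid != uid:
            continue
        # each entry in /proc/<pid>/fd is one open descriptor
        total += len(os.listdir("/proc/" + pid + "/fd"))
    except OSError:
        continue  # process exited, or we lack permission to look

print("%s processes are holding %d open file descriptors" % (user, total))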