[
https://issues.apache.org/jira/browse/HADOOP-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuri Pradkin updated HADOOP-4614:
---------------------------------
Attachment: openfds.txt
I'm posting the results of lsof while running Abdul's code.
After I upped the max number of fds to 16K, the job ran to completion.
I was monitoring the number of open files/processes every 15s (by simply
running ps and lsof | wc -l) and saw this:
#processes open_files
...
13 646
13 648
12 2535
13 4860
12 4346
12 3842
12 3324
12 2823
12 2316
12 1852
12 1387
12 936
12 643
12 643
12 643
12 643
12 643
12 643
13 642
12 642
12 4775
12 2738
12 917
12 643
12 642
12 4992
12 4453
12 3943
12 3299
12 2855
12 2437
...
It looks like something (garbage collection?) cleans up fds periodically; the
max I saw was 5007 (though the count may well have gone higher between the 15s
samples).
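For what it's worth, that sawtooth is consistent with file descriptors being
released only when the garbage collector finalizes streams that were opened
but never closed. A minimal sketch of the effect (hypothetical demo code, not
anything from Hadoop; counting entries under /proc/self/fd is Linux-specific):
{code}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical demo: streams that are opened but never close()d keep their
// file descriptors until the GC runs FileInputStream's finalizer.
public class FdFinalizerDemo {
    private static int openFds() {
        String[] entries = new File("/proc/self/fd").list(); // Linux-specific
        return entries == null ? -1 : entries.length;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        System.out.println("fds at start      : " + openFds());

        // Open many streams on some readable file (pass its path as args[0])
        // and "forget" to close them, the way a leaked reader would.
        for (int i = 0; i < 1000; i++) {
            new FileInputStream(args[0]); // never closed, immediately garbage
        }
        System.out.println("fds after opening : " + openFds());

        // A GC cycle runs the finalizers, which close the native descriptors,
        // so the count drops back down -- much like the samples above.
        System.gc();
        System.runFinalization();
        Thread.sleep(1000);
        System.out.println("fds after GC      : " + openFds());
    }
}
{code}
If that is what is happening in the map task, the job only survives as long as
a GC cycle happens to run before the per-process fd limit is hit.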
> "Too many open files" error while processing a large gzip file
> --------------------------------------------------------------
>
> Key: HADOOP-4614
> URL: https://issues.apache.org/jira/browse/HADOOP-4614
> Project: Hadoop Core
> Issue Type: Bug
> Components: io
> Affects Versions: 0.18.2
> Reporter: Abdul Qadeer
> Fix For: 0.18.3
>
> Attachments: openfds.txt
>
>
> I am running a simple word count program on 4 GB of gzip-compressed data
> (about 7 GB uncompressed) on my 17-node Hadoop cluster. After some time, I
> get the following exception:
> java.io.FileNotFoundException: /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index (Too many open files)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:137)
>     at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:62)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:98)
>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:168)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
>     at org.apache.hadoop.mapred.IndexRecord.readIndexFile(IndexRecord.java:47)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.getIndexInformation(MapTask.java:1339)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1237)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>     at org.apache.hadoop.mapred.Child.main(Child.java:155)
> From a user's perspective, I know that Hadoop will use only one mapper for a
> gzipped file. The above exception suggests that Hadoop writes the
> intermediate data into many files. But the question is: exactly how many open
> files are required in the worst case, for any data size and cluster size?
> Currently it looks as if Hadoop needs more open files as the input size or
> the cluster size (in terms of nodes, mappers, reducers) grows, which does not
> scale. A user has to put some number into /etc/security/limits.conf to say
> how many open files a Hadoop node is allowed, but what should that "magical
> number" be?
> So probably the best solution is to change Hadoop so that it works within
> some moderate limit of open files (e.g. 4K), or else to document an upper
> bound, so that a user can be sure that for any data size and cluster size
> Hadoop will not run into this "too many open files" issue.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.