[ https://issues.apache.org/jira/browse/SPARK-21971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro closed SPARK-21971.
------------------------------------
    Resolution: Not A Problem

> Too many open files in Spark due to concurrent files being opened
> -----------------------------------------------------------------
>
>                 Key: SPARK-21971
>                 URL: https://issues.apache.org/jira/browse/SPARK-21971
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> When running Q67 of TPC-DS on a 1 TB dataset on a multi-node cluster, it consistently fails with a "too many open files" exception.
> {noformat}
> O scheduler.TaskSetManager: Finished task 25.0 in stage 844.0 (TID 243786) in 394 ms on machine111.xyz (executor 2) (189/200)
> 17/08/20 10:33:45 INFO scheduler.TaskSetManager: Finished task 172.0 in stage 844.0 (TID 243932) in 11996 ms on cn116-10.l42scl.hortonworks.com (executor 6) (190/200)
> 17/08/20 10:37:40 WARN scheduler.TaskSetManager: Lost task 144.0 in stage 844.0 (TID 243904, machine1.xyz, executor 1): java.nio.file.FileSystemException: /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_7207/blockmgr-5180e3f0-f7ed-44bb-affc-8f99f09ba7bc/28/temp_local_690afbf7-172d-4fdb-8492-3e2ebd8d5183: Too many open files
>         at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>         at java.nio.channels.FileChannel.open(FileChannel.java:287)
>         at java.nio.channels.FileChannel.open(FileChannel.java:335)
>         at org.apache.spark.io.NioBufferedFileInputStream.<init>(NioBufferedFileInputStream.java:43)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.<init>(UnsafeSorterSpillReader.java:75)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.getReader(UnsafeSorterSpillWriter.java:150)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.getIterator(UnsafeExternalSorter.java:607)
>         at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.generateIterator(ExternalAppendOnlyUnsafeRowArray.scala:169)
>         at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.generateIterator(ExternalAppendOnlyUnsafeRowArray.scala:173)
> {noformat}
> The cluster was configured with multiple cores per executor.
> The window function uses "spark.sql.windowExec.buffer.spill.threshold=4096", which causes a large number of spills on larger datasets. With multiple cores per executor, this reproduces easily.
> {{UnsafeExternalSorter::getIterator()}} invokes {{spillWriter.getReader}} for all the available spill writers. {{UnsafeSorterSpillReader}} opens the file in its constructor and closes it later as part of its close() call. This causes the "too many open files" issue.
> Note that this is not a file leak; rather, it is a matter of how many files are open concurrently at any given time, which depends on the dataset being processed.
> One option could be to increase "spark.sql.windowExec.buffer.spill.threshold" so that fewer spill files are generated, but it is hard to determine the sweet spot for all workloads. Another option is to set the ulimit for open files to "unlimited", but that would not be a good production setting. It would be good to consider reducing the number of files opened concurrently by {{UnsafeExternalSorter::getIterator()}}.
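For illustration, below is a minimal, self-contained Scala sketch of the access pattern described above. The names ({{SpillFileReader}}, {{SpillAccessPattern}}, {{openAllEagerly}}) are hypothetical simplifications, not Spark's actual internals; the sketch only shows why holding one reader per spill file, each of which acquires its file descriptor in its constructor, makes the number of concurrently open descriptors grow with the number of spill files and concurrent tasks.

{noformat}
import java.io.{File, FileInputStream, InputStream}

// Hypothetical stand-in for a spill reader: the file descriptor is acquired
// in the constructor and held until close() is called.
class SpillFileReader(file: File) extends AutoCloseable {
  private val in: InputStream = new FileInputStream(file) // descriptor opened here
  def read(): Int = in.read()
  override def close(): Unit = in.close()
}

object SpillAccessPattern {
  // Pattern described in the report: one reader is created up front for every
  // spill file, so all of them are open at the same time. With S spill files
  // per task and T concurrent tasks per executor, the executor holds roughly
  // S * T descriptors just for spill reads.
  def openAllEagerly(spillFiles: Seq[File]): Seq[SpillFileReader] =
    spillFiles.map(f => new SpillFileReader(f)) // all descriptors live at once

  def main(args: Array[String]): Unit = {
    // Tiny demonstration using temporary files as stand-ins for spill files.
    val spills = (1 to 8).map { i =>
      val f = File.createTempFile(s"spill-$i-", ".tmp")
      f.deleteOnExit()
      f
    }
    val readers = openAllEagerly(spills)
    println(s"${readers.size} spill files are open concurrently")
    readers.foreach(_.close())
  }
}
{noformat}

On the configuration side, the first workaround mentioned in the report amounts to raising that threshold for the job, e.g. {{--conf spark.sql.windowExec.buffer.spill.threshold=<larger value>}} on spark-submit, at the cost of buffering more rows in memory per window operator; as the report notes, the right value is workload-dependent.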