[ https://issues.apache.org/jira/browse/SPARK-21971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro closed SPARK-21971.
------------------------------------
    Resolution: Not A Problem

> Too many open files in Spark due to concurrent files being opened
> -----------------------------------------------------------------
>
>                 Key: SPARK-21971
>                 URL: https://issues.apache.org/jira/browse/SPARK-21971
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> When running query 67 of TPC-DS on a 1 TB dataset on a multi-node cluster, it 
> consistently fails with a "too many open files" exception.
> {noformat}
> O scheduler.TaskSetManager: Finished task 25.0 in stage 844.0 (TID 243786) in 
> 394 ms on machine111.xyz (executor 2) (189/200)
> 17/08/20 10:33:45 INFO scheduler.TaskSetManager: Finished task 172.0 in stage 
> 844.0 (TID 243932) in 11996 ms on cn116-10.l42scl.hortonworks.com (executor 
> 6) (190/200)
> 17/08/20 10:37:40 WARN scheduler.TaskSetManager: Lost task 144.0 in stage 
> 844.0 (TID 243904, machine1.xyz, executor 1): 
> java.nio.file.FileSystemException: 
> /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_7207/blockmgr-5180e3f0-f7ed-44bb-affc-8f99f09ba7bc/28/temp_local_690afbf7-172d-4fdb-8492-3e2ebd8d5183:
>  Too many open files
>         at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>         at java.nio.channels.FileChannel.open(FileChannel.java:287)
>         at java.nio.channels.FileChannel.open(FileChannel.java:335)
>         at 
> org.apache.spark.io.NioBufferedFileInputStream.<init>(NioBufferedFileInputStream.java:43)
>         at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.<init>(UnsafeSorterSpillReader.java:75)
>         at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.getReader(UnsafeSorterSpillWriter.java:150)
>         at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.getIterator(UnsafeExternalSorter.java:607)
>         at 
> org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.generateIterator(ExternalAppendOnlyUnsafeRowArray.scala:169)
>         at 
> org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.generateIterator(ExternalAppendOnlyUnsafeRowArray.scala:173)
> {noformat}
> The cluster was configured with multiple cores per executor. The window 
> function uses "spark.sql.windowExec.buffer.spill.threshold=4096", which 
> causes a large number of spills on larger datasets. With multiple cores per 
> executor, this reproduces easily. 
> {{UnsafeExternalSorter::getIterator()}} invokes {{spillWriter.getReader}} for 
> all the available spill writers. {{UnsafeSorterSpillReader}} opens its file 
> in the constructor and closes it only as part of its close() call. This 
> causes the "too many open files" issue.
> Note that this is not a file leak; rather, it is a matter of how many files 
> are open concurrently at any given time, which depends on the dataset being 
> processed.
> One option is to increase "spark.sql.windowExec.buffer.spill.threshold" so 
> that fewer spill files are generated, but it is hard to determine a sweet 
> spot for all workloads. Another option is to raise the ulimit on open files 
> to "unlimited", but that would not be a good production setting. It would be 
> better to reduce the number of spill files opened concurrently by 
> {{UnsafeExternalSorter::getIterator()}}.
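The eager-open pattern described above can be sketched outside Spark as a minimal, self-contained illustration. This is not the actual UnsafeExternalSorter code; the class and method names below are hypothetical, and plain FileInputStream stands in for the spill reader that opens its file in the constructor:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each "spill reader" opens its file in the constructor
// and holds the descriptor until close(), so descriptors in use scale with
// spill files per task times concurrent tasks per executor.
public class EagerSpillOpen {
    // Creates `numSpills` empty temp files, opens a stream for each (the
    // eager pattern), and returns how many descriptors were held at once.
    static int openAllSpills(int numSpills) throws IOException {
        List<FileInputStream> readers = new ArrayList<>();
        try {
            for (int i = 0; i < numSpills; i++) {
                File f = File.createTempFile("temp_local_", ".spill");
                f.deleteOnExit();
                readers.add(new FileInputStream(f)); // fd held until close()
            }
            return readers.size();
        } finally {
            for (FileInputStream in : readers) {
                in.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("descriptors held at once: " + openAllSpills(8));
    }
}
```

With a spill threshold of 4096 rows, a large partition can produce hundreds of spill files, and every concurrent task on the executor multiplies the count, which is how the per-process file-descriptor limit is exceeded.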
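The two workarounds above could look as follows. This is only a sketch: the threshold value is illustrative (the right setting is workload-dependent), and "app.jar" is a placeholder for the actual application:

```shell
# Option 1: generate fewer spill files by raising the window spill threshold
# (1048576 is an illustrative value, not a recommendation).
spark-submit --conf spark.sql.windowExec.buffer.spill.threshold=1048576 app.jar

# Option 2: inspect the per-process open-file limit on the executor hosts;
# raising it works around the error but is not a good production setting.
ulimit -n
```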



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
