[ https://issues.apache.org/jira/browse/SPARK-21475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323617#comment-16323617 ]
Shixiong Zhu commented on SPARK-21475:
--------------------------------------

Fixed

> Change to use NIO's Files API for external shuffle service
> ----------------------------------------------------------
>
>                 Key: SPARK-21475
>                 URL: https://issues.apache.org/jira/browse/SPARK-21475
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> Java's {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}},
> so even when such a stream is closed correctly and promptly, it still
> leaves a memory footprint that is only cleaned up during a Full GC. This
> has two side effects:
> 1. Many Finalizer-related objects are kept in memory, which increases the
> memory overhead. In our use case of the external shuffle service, a busy
> shuffle service will hold many of these objects, which can potentially
> lead to OOM.
> 2. The Finalizer is only called during a Full GC, which increases the
> overhead of Full GC and leads to long GC pauses.
> To fix this potential issue, I propose using NIO's
> Files#newInputStream/Files#newOutputStream instead in some critical paths
> such as shuffle.
> https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful
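
For illustration only, the substitution proposed in the issue can be sketched as follows. This is not the actual Spark patch: the class name `NioStreamsSketch` and its `write`/`read` helpers are hypothetical, and the sketch only assumes the standard `java.nio.file.Files#newInputStream` and `Files#newOutputStream` calls named in the description, used where `new FileInputStream(file)` or `new FileOutputStream(file)` would otherwise appear.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, not part of Spark.
final class NioStreamsSketch {

    // Instead of `new FileOutputStream(path.toFile())`.
    // With no options given, the file is created or truncated, then written.
    static void write(Path path, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(path)) {
            out.write(data);
        }
    }

    // Instead of `new FileInputStream(path.toFile())`.
    // Reads the whole file in chunks (kept Java 8 compatible).
    static byte[] read(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio-sketch", ".bin");
        try {
            byte[] payload = {1, 2, 3, 4};
            write(tmp, payload);
            // Round-trip check: the bytes read back match the bytes written.
            System.out.println(Arrays.equals(payload, read(tmp)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The try-with-resources blocks still close each stream promptly; the only change is which factory creates the stream, which is the point of the proposal.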