[jira] [Updated] (SPARK-21475) Change to use NIO's Files API for external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-21475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-21475:
---------------------------------
    Fix Version/s:     (was: 3.0.0)

> Change to use NIO's Files API for external shuffle service
> ----------------------------------------------------------
>
>                 Key: SPARK-21475
>                 URL: https://issues.apache.org/jira/browse/SPARK-21475
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> Java's {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}}.
> Even when such a stream is closed correctly and promptly, it still leaves a
> memory footprint that is only cleaned up during a Full GC. This has two
> side effects:
> 1. Many Finalizer-related objects are kept in memory, which increases the
> memory overhead. In our use case, a busy external shuffle service will hold
> a large number of these objects, which can potentially lead to OOM.
> 2. The Finalizer is only called during a Full GC, which increases the
> overhead of Full GC and leads to long GC pauses.
> To fix this potential issue, we propose using NIO's
> Files#newInputStream / Files#newOutputStream instead in some critical paths
> such as shuffle.
> https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
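The replacement proposed in the issue description could look roughly like the sketch below. It is only an illustration, not the actual Spark patch: the class name NioStreamsSketch and the helpers writeAll and readAll are hypothetical. It assumes a JDK on which {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}}, and relies only on the standard {{java.nio.file.Files}} methods newInputStream and newOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for illustration only; not part of Spark.
class NioStreamsSketch {

    // Instead of: new FileOutputStream(file)
    // Files.newOutputStream creates or truncates the file by default.
    static void writeAll(Path path, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(path)) {
            out.write(data);
        }
    }

    // Instead of: new FileInputStream(file)
    static byte[] readAll(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // Copy until end of stream.
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Round trip through a temporary file, then clean up.
        Path tmp = Files.createTempFile("shuffle-sketch", ".data");
        try {
            writeAll(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(readAll(tmp), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Call sites that hold a java.io.File can pass file.toPath() to these methods. The streams are still closed with try-with-resources; the change only affects which stream implementation is created.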
[jira] [Updated] (SPARK-21475) Change to use NIO's Files API for external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-21475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-21475:
---------------------------------
    Summary: Change to use NIO's Files API for external shuffle service  (was: Change the usage of FileInputStream/OutputStream to Files.newInput/OutputStream in the critical path)

> Change to use NIO's Files API for external shuffle service
> ----------------------------------------------------------
>
>                 Key: SPARK-21475
>                 URL: https://issues.apache.org/jira/browse/SPARK-21475
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>            Priority: Minor
>             Fix For: 2.3.0, 3.0.0