[ https://issues.apache.org/jira/browse/SPARK-25035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755322#comment-16755322 ]
Attila Zsolt Piros commented on SPARK-25035:
--------------------------------------------

I am working on this.

> Replicating disk-stored blocks should avoid memory mapping
> ----------------------------------------------------------
>
>                 Key: SPARK-25035
>                 URL: https://issues.apache.org/jira/browse/SPARK-25035
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Imran Rashid
>            Priority: Major
>              Labels: memory-analysis
>
> This is a follow-up to SPARK-24296.
> When replicating a disk-cached block, even if we fetch-to-disk, we still
> memory-map the file, just to copy it to another location.
> Ideally we'd just move the tmp file to the right location. But even without
> that, we could read the file as an input stream, instead of memory-mapping
> the whole thing. Memory-mapping is particularly a problem when running under
> yarn, as the OS may believe there is plenty of memory available, meanwhile
> yarn decides to kill the process for exceeding memory limits.
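The streaming alternative the ticket describes can be sketched as below. This is a minimal illustration, not Spark's actual replication code; the class and method names are hypothetical. The point is that copying through a fixed-size buffer bounds the extra memory at the buffer size, whereas memory-mapping charges the whole file against the process's address space, which a YARN memory monitor may count against the container limit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the streaming copy suggested in SPARK-25035.
public class StreamingBlockCopy {
    // Copies src to dst through a fixed-size heap buffer. Peak extra memory
    // is bounded by the buffer size regardless of block size, unlike
    // FileChannel.map, which maps the entire file into the address space.
    static void copyBlock(Path src, Path dst) throws IOException {
        byte[] buf = new byte[64 * 1024]; // 64 KiB transfer buffer
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("block", ".tmp");
        Files.write(src, "replicated block payload".getBytes());
        Path dst = Files.createTempFile("replica", ".tmp");
        copyBlock(src, dst);
        System.out.println(Files.size(dst)); // prints 24
    }
}
```

The "ideally" option in the description, moving the fetched tmp file into place, would avoid the copy entirely (e.g. `Files.move(src, dst)`); the streaming copy above is the fallback when a move is not possible.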