[ https://issues.apache.org/jira/browse/SPARK-19244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mridul Muralidharan resolved SPARK-19244.
-----------------------------------------
    Resolution: Fixed
    Fix Version/s: 2.2.0

Issue resolved by pull request 16603
[https://github.com/apache/spark/pull/16603]

> Sort MemoryConsumers according to their memory usage when spilling
> ------------------------------------------------------------------
>
>             Key: SPARK-19244
>             URL: https://issues.apache.org/jira/browse/SPARK-19244
>         Project: Spark
>      Issue Type: Improvement
>      Components: Spark Core
>        Reporter: Liang-Chi Hsieh
>         Fix For: 2.2.0
>
>
> In `TaskMemoryManager`, when we call `acquireExecutionMemory` and cannot
> acquire the required memory, we try to spill other memory consumers.
> Currently, we simply iterate over the memory consumers in a hash set, so
> the consumers are normally visited in the same order every time.
> The first issue is that we might spill more consumers than necessary. For
> example, suppose consumer 1 uses 10MB and consumer 2 uses 50MB, and then
> consumer 3 requests 100MB but can only get 60MB, so spilling is needed. We
> might spill both consumer 1 and consumer 2, even though spilling consumer 2
> alone is enough to obtain the required 100MB.
> The second issue arises after we spill consumer 1 in the first round of
> spilling. After a while, consumer 1 uses 5MB again. Then consumer 4 may
> request some memory, and spilling is needed again. Because we iterate over
> the memory consumers in the same order, we spill consumer 1 again. So for
> consumer 1, we produce many small spill files.
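
The two orderings described in the issue can be compared with a small sketch. This is an illustration only, not the code from pull request 16603: Spark's `TaskMemoryManager` is Java, and the function names `spill_in_fixed_order` and `pick_spill_order` are hypothetical. It also assumes that spilling a consumer frees all of the memory it holds, which a real spill need not do.

```python
from bisect import bisect_left

MB = 1024 * 1024


def spill_in_fixed_order(usage, needed):
    """Baseline: visit consumers in a fixed iteration order until enough is freed.

    usage maps a consumer name to the bytes it currently holds.
    """
    victims = []
    for name, used in usage.items():
        if needed <= 0:
            break
        if used > 0:
            victims.append(name)
            needed -= used  # assumption: a spill frees everything the consumer holds
    return victims


def pick_spill_order(usage, needed):
    """Usage-sorted choice: prefer the smallest consumer that alone covers the need."""
    # Sort by bytes held, ascending; consumers holding nothing are skipped.
    remaining = sorted((used, name) for name, used in usage.items() if used > 0)
    victims = []
    while needed > 0 and remaining:
        # Index of the first consumer holding at least `needed` bytes.
        idx = bisect_left(remaining, (needed, ""))
        if idx == len(remaining):
            # No single consumer is large enough, so take the largest one.
            idx = len(remaining) - 1
        used, name = remaining.pop(idx)
        victims.append(name)
        needed -= used
    return victims


# Example from the issue: 100MB requested, 60MB granted, so 40MB is still needed.
usage = {"consumer1": 10 * MB, "consumer2": 50 * MB}
print(spill_in_fixed_order(usage, 40 * MB))
print(pick_spill_order(usage, 40 * MB))
```

With the numbers from the issue, the fixed-order baseline spills consumer 1 and then consumer 2, while the usage-sorted choice spills only consumer 2. Because small consumers are skipped whenever a larger one can satisfy the request, the sorted choice also avoids repeatedly spilling a consumer that holds only a few megabytes.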