[ https://issues.apache.org/jira/browse/SPARK-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434479#comment-16434479 ]
Thomas Graves commented on SPARK-23964:
---------------------------------------

I'm not sure. I'm trying to figure out whether there are performance implications here. Perhaps there are, but any gain comes at the cost of not being accurate about memory usage. In deployments with fixed-size containers this is very important. If you wait for 32 elements, it may cause you to acquire a bigger chunk of memory at once instead of getting smaller (and thus more) allocations.

I would think the only check you need is: currentMemory >= myMemoryThreshold. The initial threshold is 5MB right now, but all it's doing is asking for more memory; only when it can't get memory does it spill. And the initial threshold is configurable, so you can always make it bigger.

I'm going to try to do some performance tests to see what happens, but I would like to know if anyone has other background.

> why does Spillable wait for 32 elements?
> ----------------------------------------
>
>                 Key: SPARK-23964
>                 URL: https://issues.apache.org/jira/browse/SPARK-23964
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Thomas Graves
>            Priority: Major
>
> The spillable class has a check in maybeSpill as to when it tries to acquire
> more memory and determine if it should spill:
> if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) {
> Before it looks to see if it should spill.
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83]
> I'm wondering why it has the elementsRead %32 in it? If I have a small
> number of elements that are huge this can easily cause OOM before we actually
> spill.
> I saw a few conversations on this and one Jira related:
> https://issues.apache.org/jira/browse/SPARK-4456 . but I've never seen an
> answer to this.
> anyone have history on this?
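
To make the concern about a few huge elements concrete, here is a minimal sketch of the check in isolation. It is not Spark code: the object SpillCheckSketch, the helper bytesAtFirstCheck, and the workload of 40 records of 100 MB each are all hypothetical, and there is no memory manager involved. It only models the condition quoted above and reports how many bytes have accumulated when that condition first becomes true, with the check gated to every 32nd element versus evaluated on every element.

```scala
// Simplified, hypothetical model of the condition in Spillable.maybeSpill.
// Not Spark code: record sizes are supplied directly and nothing is allocated.
object SpillCheckSketch {

  /** Bytes held in memory when the check first fires, or None if it never does. */
  def bytesAtFirstCheck(
      recordSizes: Seq[Long],
      threshold: Long,
      checkEvery: Int): Option[Long] = {
    recordSizes.iterator
      .scanLeft(0L)(_ + _) // running total: entry n is the memory held after n records
      .zipWithIndex        // pair each running total with elementsRead
      .drop(1)             // skip the initial state (0 bytes, 0 records)
      .collectFirst {
        case (currentMemory, elementsRead)
            if elementsRead % checkEvery == 0 && currentMemory >= threshold =>
          currentMemory
      }
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    val threshold = 5 * mb               // initial threshold mentioned in the comment
    val records = Seq.fill(40)(100 * mb) // assumed workload: 40 records of 100 MB each

    val gated = bytesAtFirstCheck(records, threshold, checkEvery = 32)
    val ungated = bytesAtFirstCheck(records, threshold, checkEvery = 1)

    println(s"check every 32 elements: ${gated.map(_ / mb)} MB held at first check")
    println(s"check every element:     ${ungated.map(_ / mb)} MB held at first check")
  }
}
```

Under these assumed sizes, the gated check first fires only after 32 records (3200 MB) have accumulated, while the ungated check fires after the first record (100 MB). The sketch says nothing about the per-element cost of running the check more often, which is the open performance question.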