Andrew Ash created SPARK-3211:
---------------------------------

             Summary: .take() is OOM-prone when there are empty partitions
                 Key: SPARK-3211
                 URL: https://issues.apache.org/jira/browse/SPARK-3211
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.2
            Reporter: Andrew Ash
Filed on dev@ on 22 August by [~pnepywoda]:

{quote}
On line 777 https://github.com/apache/spark/commit/42571d30d0d518e69eecf468075e4c5a823a2ae8#diff-1d55e54678eff2076263f2fe36150c17R771 the logic for take() reads ALL partitions if the first one (or first k) are empty. This has actually led to OOMs when we had many partitions (thousands) and unfortunately the first one was empty.

Wouldn't a better implementation strategy be

numPartsToTry = partsScanned * 2

instead of

numPartsToTry = totalParts - 1

(This doubling is similar to most memory allocation growth strategies.)

Thanks!
- Paul
{quote}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
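To make the proposed fix concrete, here is a minimal sketch of the doubling scan strategy. It is written in Python rather than Spark's Scala, models partitions as plain lists, and the `take` function here is a hypothetical stand-in for the driver-side loop in RDD.take(), not the actual Spark code:

```python
def take(partitions, num):
    """Collect up to `num` elements, scanning partitions in growing batches.

    `partitions` is a list of lists standing in for RDD partitions.
    """
    taken = []
    total_parts = len(partitions)
    parts_scanned = 0
    num_parts_to_try = 1
    while len(taken) < num and parts_scanned < total_parts:
        batch = partitions[parts_scanned:parts_scanned + num_parts_to_try]
        for part in batch:
            for item in part:
                if len(taken) == num:
                    break
                taken.append(item)
        parts_scanned += num_parts_to_try
        # Growing the batch geometrically (as Paul suggests) keeps the number
        # of scheduling rounds small without ever launching a job over all
        # remaining partitions at once.
        num_parts_to_try = parts_scanned * 2
    return taken

# With thousands of partitions and an empty first partition, the old
# numPartsToTry = totalParts - 1 would scan everything in round two;
# the sketch above scans 1, then 2, then 6, ... partitions instead.
parts = [[]] + [[i] for i in range(10)]
print(take(parts, 3))  # → [0, 1, 2]
```

Under this strategy a take(n) that hits k leading empty partitions finishes in O(log k) rounds, while still collecting results incrementally instead of buffering every partition's output on the driver.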