Andrew Ash created SPARK-3211:
---------------------------------

             Summary: .take() is OOM-prone when there are empty partitions
                 Key: SPARK-3211
                 URL: https://issues.apache.org/jira/browse/SPARK-3211
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.2
            Reporter: Andrew Ash


Filed on dev@ on 22 August by [~pnepywoda]:

{quote}
On line 777
https://github.com/apache/spark/commit/42571d30d0d518e69eecf468075e4c5a823a2ae8#diff-1d55e54678eff2076263f2fe36150c17R771
the logic for take() reads ALL remaining partitions if the first one (or
first k) is empty. This has actually led to OOMs when we had many
partitions (thousands) and unfortunately the first one was empty.

Wouldn't a better implementation strategy be

numPartsToTry = partsScanned * 2

instead of

numPartsToTry = totalParts - 1

(this doubling is similar to most memory allocation strategies)

Thanks!
- Paul
{quote}
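
A minimal sketch of the doubling strategy suggested in the quote, written against a plain {{IndexedSeq}} of partitions rather than an actual {{RDD}} (the {{take}} helper and {{partitions}} value below are hypothetical stand-ins, not Spark's real {{RDD.take}} implementation):

{code:scala}
import scala.collection.mutable.ArrayBuffer

object TakeDoublingSketch {
  // `partitions` stands in for an RDD's partitions; each inner Seq is the
  // data held by one partition.
  def take(partitions: IndexedSeq[Seq[Int]], num: Int): Seq[Int] = {
    val totalParts = partitions.length
    val buf = ArrayBuffer.empty[Int]
    var partsScanned = 0
    var numPartsToTry = 1
    while (buf.size < num && partsScanned < totalParts) {
      // Grow the scan window geometrically (1, 2, 4, 8, ...) instead of
      // jumping straight to all remaining partitions when the partitions
      // scanned so far turned out to be empty.
      if (partsScanned > 0) numPartsToTry = partsScanned * 2
      val upTo = math.min(partsScanned + numPartsToTry, totalParts)
      for (p <- partsScanned until upTo if buf.size < num) {
        buf ++= partitions(p).take(num - buf.size)
      }
      partsScanned = upTo
    }
    buf.toSeq
  }

  def main(args: Array[String]): Unit = {
    // Thousands of partitions with the first one empty: the doubling
    // strategy scans 1, then 2, then 4 partitions rather than all of them.
    val parts = IndexedSeq(Seq.empty[Int]) ++ (1 to 5000).map(i => Seq(i))
    println(take(parts, 3)) // first three elements: 1, 2, 3
  }
}
{code}

With many empty leading partitions this grows the scan window only logarithmically in the number of partitions, and each round pulls back results from at most the partitions in the current window rather than from every partition at once.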


