Idan Zalzberg created SPARK-5319:
------------------------------------

             Summary: Choosing partition size instead of count
                 Key: SPARK-5319
                 URL: https://issues.apache.org/jira/browse/SPARK-5319
             Project: Spark
          Issue Type: Brainstorming
            Reporter: Idan Zalzberg


With the current API, there are multiple places where you can set the 
partition count when reading from sources.

However, in my experience it is sometimes useful to set the partition size 
(in MB) instead, and infer the count from that.
In my experience, Spark is sensitive to the partition size: if partitions are 
too big, the amount of memory needed per core goes up, and if they are too 
small, stage times increase significantly. So I'd like to stay in the "sweet 
spot" of partition size without repeatedly adjusting the partition count 
until I find it.
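
As a rough sketch of what this could look like from user code today: a helper 
that inspects the total input size on the filesystem and divides by a target 
partition size to get a partition count, which is then passed where the API 
currently expects a count. The helper name, the input path, and the 128 MB 
target below are purely illustrative assumptions, not an existing Spark API.

{code}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object PartitionBySize {
  // Hypothetical helper: derive a partition count from a target partition
  // size (in MB) using the total byte size of the input path.
  def partitionsForSize(sc: SparkContext, path: String, targetSizeMB: Int): Int = {
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val totalBytes = fs.getContentSummary(new Path(path)).getLength
    val targetBytes = targetSizeMB.toLong * 1024 * 1024
    math.max(1, math.ceil(totalBytes.toDouble / targetBytes).toInt)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-by-size"))
    val input = "hdfs:///data/events"  // hypothetical input path
    // Infer the count from the desired per-partition size (assumed 128 MB here).
    val numPartitions = partitionsForSize(sc, input, targetSizeMB = 128)
    val rdd = sc.textFile(input, minPartitions = numPartitions)
    println(s"Using $numPartitions partitions")
    sc.stop()
  }
}
{code}

Having something like this built into the read APIs would avoid every user 
re-implementing the size estimation themselves.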


