[ https://issues.apache.org/jira/browse/SPARK-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-5319:
-----------------------------
    Component/s: Spark Core

> Choosing partition size instead of count
> ----------------------------------------
>
>                 Key: SPARK-5319
>                 URL: https://issues.apache.org/jira/browse/SPARK-5319
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: Spark Core
>            Reporter: Idan Zalzberg
>
> With the current API, there are multiple places where you can set the
> partition count when reading from sources.
> However, in my experience it is sometimes useful to set the partition size
> (in MB) instead, and infer the count from that.
> In my experience, Spark is sensitive to partition size: if partitions are
> too big, the memory needed per core goes up, and if they are too small,
> stage times increase significantly. So I'd like to stay in the "sweet spot"
> of partition size, without tweaking the partition count around until I
> find it.
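
A minimal sketch of the workaround this issue would make unnecessary,
assuming an HDFS input at a hypothetical path and a hand-picked target
size: sum the input's size via the Hadoop FileSystem API, derive a
partition count from it, and pass that to textFile's minPartitions
parameter.

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionBySize {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("partition-by-size"))

        val inputPath = "hdfs:///data/events"   // hypothetical input path
        val targetMB  = 128L                    // assumed "sweet spot" partition size

        // Total input size in bytes, via the Hadoop FileSystem API.
        val fs = FileSystem.get(sc.hadoopConfiguration)
        val totalBytes = fs.getContentSummary(new Path(inputPath)).getLength

        // Infer the partition count from the desired partition size.
        val numPartitions =
          math.max(1, math.ceil(totalBytes.toDouble / (targetMB * 1024 * 1024)).toInt)

        val rdd = sc.textFile(inputPath, minPartitions = numPartitions)
        println(s"$totalBytes bytes -> $numPartitions partitions")
      }
    }

Note that minPartitions is only a lower bound on the number of splits for
splittable inputs; the proposal here is essentially to fold this
size-to-count arithmetic into the read APIs themselves.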