Idan Zalzberg created SPARK-5319:
------------------------------------

             Summary: Choosing partition size instead of count
                 Key: SPARK-5319
                 URL: https://issues.apache.org/jira/browse/SPARK-5319
             Project: Spark
          Issue Type: Brainstorming
            Reporter: Idan Zalzberg
With the current API, there are multiple places where you can set the partition count when reading from sources. However, in my experience it is sometimes more useful to set the partition size (in MB) and have the count inferred from that.

In my experience, Spark is sensitive to partition size: if partitions are too big, the amount of memory needed per core goes up, and if they are too small, stage times increase significantly. I'd like to stay in the "sweet spot" of partition size without tuning the partition count by trial and error until I find it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
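The inference described above can be sketched as a small helper: given the input size and a desired per-partition size, derive the partition count to pass to an API that only accepts a count. This is a minimal illustration, not an existing Spark API; the function name and the 128 MB default are assumptions chosen for the example.

```python
import math

def partitions_for_size(input_size_bytes, target_partition_mb=128):
    # Hypothetical helper: infer a partition count from a desired
    # partition size. 128 MB is only an illustrative "sweet spot"
    # default, not a Spark setting.
    target_bytes = target_partition_mb * 1024 * 1024
    # At least one partition, even for tiny or empty inputs.
    return max(1, math.ceil(input_size_bytes / target_bytes))

# A 10 GB input with a 128 MB target yields 80 partitions,
# which could then be passed as the partition count to a read call.
print(partitions_for_size(10 * 1024 ** 3, 128))
```

The point of the proposal is that users would specify only `target_partition_mb` and Spark would do this arithmetic internally, instead of the user recomputing the count whenever the input size changes.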