How are you specifying it, as an option to spark-submit?
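For reference, here is roughly what I would expect a small local test
setup to look like (just a sketch, assuming Spark 2.x; the value of 4
partitions is an arbitrary illustration):

    import org.apache.spark.sql.SparkSession

    // Local session for unit tests, with a small shuffle fan-out
    // instead of the default of 200 partitions.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("unit-tests")
      .config("spark.sql.shuffle.partitions", "4")
      .getOrCreate()

    // It can also be changed on a live session; the new value takes
    // effect for subsequent shuffles.
    spark.conf.set("spark.sql.shuffle.partitions", "4")

or passed on the command line with
spark-submit --conf spark.sql.shuffle.partitions=4.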
On Sat, Sep 16, 2017 at 12:26 PM, Akhil Das <ak...@hacked.work> wrote:

> spark.sql.shuffle.partitions is still used, I believe. I can see it in
> the code
> <https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L191>
> and in the documentation page
> <https://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options>.
>
> On Wed, Sep 13, 2017 at 4:46 AM, peay <p...@protonmail.com> wrote:
>
>> Hello,
>>
>> I am running unit tests with Spark DataFrames, and I am looking for
>> configuration tweaks that would make tests faster. Usually, I use a
>> local[2] or local[4] master.
>>
>> Something that has been bothering me is that most of my stages end up
>> using 200 partitions, regardless of whether I repartition the input.
>> This seems like overkill for small unit tests that barely have 200
>> rows per DataFrame.
>>
>> spark.sql.shuffle.partitions used to control this, I believe, but it
>> seems to be gone, and I could not find any information on what
>> mechanism/setting replaces it, or the corresponding JIRA.
>>
>> Does anyone have experience to share on how best to tune Spark for
>> very small local runs like that?
>>
>> Thanks!
>
> --
> Cheers!

--
http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre
minds." - Albert Einstein.