Hello,

I am running unit tests with Spark DataFrames and am looking for
configuration tweaks that would make the tests run faster. I usually use a
local[2] or local[4] master.
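
For context, my test sessions look roughly like this (a sketch; the app
name "df-unit-tests" is just illustrative):

    import org.apache.spark.sql.SparkSession

    // In-process session with two worker threads; no cluster needed.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("df-unit-tests")
      .getOrCreate()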

Something that has been bothering me is that most of my stages end up using
200 partitions, regardless of whether I repartition the input. That seems
like overkill for small unit tests that barely have 200 rows per DataFrame.

I believe spark.sql.shuffle.partitions used to control this, but it seems
to be gone, and I could not find any information on which mechanism or
setting replaces it, or the corresponding JIRA.
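
For reference, this is roughly how I used to pin it down on older versions
(a sketch; the value 4 is an arbitrary small number for tiny test data):

    // Set once when the test session is created ...
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("df-unit-tests")
      .config("spark.sql.shuffle.partitions", "4")
      .getOrCreate()

    // ... or adjusted at runtime, since it is a SQL conf:
    spark.conf.set("spark.sql.shuffle.partitions", "4")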

Does anyone have experience to share on how best to tune Spark for very
small local runs like this?

Thanks!
