How are you specifying it, as an option to spark-submit?
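For reference, here is roughly what I would expect a small local test
setup to look like (just a sketch, assuming Spark 2.x; the value of 4
partitions is an arbitrary illustration):

    import org.apache.spark.sql.SparkSession

    // Local session for unit tests, with a small shuffle fan-out
    // instead of the default of 200 partitions.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("unit-tests")
      .config("spark.sql.shuffle.partitions", "4")
      .getOrCreate()

    // It can also be changed on a live session; the new value takes
    // effect for subsequent shuffles.
    spark.conf.set("spark.sql.shuffle.partitions", "4")

or passed on the command line with
spark-submit --conf spark.sql.shuffle.partitions=4.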
On Sat, Sep 16, 2017 at 12:26 PM, Akhil Das <ak...@hacked.work> wrote:

> spark.sql.shuffle.partitions is still used, I believe. I can see it in
> the code
> <https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L191>
> and in the documentation page
> <https://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options>.
>
> On Wed, Sep 13, 2017 at 4:46 AM, peay <p...@protonmail.com> wrote:
>
>> Hello,
>>
>> I am running unit tests with Spark DataFrames, and I am looking for
>> configuration tweaks that would make tests faster. Usually, I use a
>> local[2] or local[4] master.
>>
>> Something that has been bothering me is that most of my stages end up
>> using 200 partitions, regardless of whether I repartition the input.
>> This seems like overkill for small unit tests that barely have 200
>> rows per DataFrame.
>>
>> spark.sql.shuffle.partitions used to control this, I believe, but it
>> seems to be gone, and I could not find any information on what
>> mechanism/setting replaces it, or the corresponding JIRA.
>>
>> Does anyone have experience to share on how best to tune Spark for
>> very small local runs like that?
>>
>> Thanks!
>
> --
> Cheers!

--
http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre
minds." - Albert Einstein.