From: Davies Liu <dav...@databricks.com>
Sent: Tuesday, September 6, 2016 2:27 PM
To: Сергей Романов
Cc: Gavin Yue; Mich Talebzadeh; user
Subject: Re: Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.
I think the slowness is caused by the generated aggregate method having more
than 8K of bytecode, so it is not JIT compiled and becomes much slower.
Could you try disabling DontCompileHugeMethods by passing:
-XX:-DontCompileHugeMethods
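For reference, a minimal sketch of how that JVM flag could be passed to both the driver and executor JVMs via Spark's standard `extraJavaOptions` settings (the application name and master URL here are placeholders, not from the thread):

```shell
# Pass the HotSpot flag to both driver and executor JVMs.
# -XX:-DontCompileHugeMethods tells the JIT to compile methods
# even when they exceed the huge-method bytecode limit.
spark-submit \
  --master local[*] \
  --conf "spark.driver.extraJavaOptions=-XX:-DontCompileHugeMethods" \
  --conf "spark.executor.extraJavaOptions=-XX:-DontCompileHugeMethods" \
  your_app.py
```

The same keys can also be set in `spark-defaults.conf` or on the `SparkSession` builder before the session is created.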
On Mon, Sep 5, 2016 at 4:21 AM, Сергей Романов wrote:
Hi, Gavin,
Shuffling is exactly the same in both requests and is minimal. Both requests
produce one shuffle task. Running time is the only difference I can see in
the metrics:
timeit.timeit(spark.read.csv('file:///data/dump/test_csv',
schema=schema).groupBy().sum(*(['dd_convs'] * 57) ).collect,