Re: Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-06 Thread Yong Zhang
Yong From: Davies Liu <dav...@databricks.com> Sent: Tuesday, September 6, 2016 2:27 PM To: Сергей Романов Cc: Gavin Yue; Mich Talebzadeh; user Subject: Re: Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation. I think the sl

Re: Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-06 Thread Davies Liu
I think the slowness is caused by generated aggregate method has more than 8K bytecodes, than it's not JIT compiled, became much slower. Could you try to disable the DontCompileHugeMethods by: -XX:-DontCompileHugeMethods On Mon, Sep 5, 2016 at 4:21 AM, Сергей Романов

Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-05 Thread Сергей Романов
Hi, Gavin, Shuffling is exactly the same in both requests and is minimal. Both requests produces one shuffle task. Running time is the only difference I can see in metrics: timeit.timeit(spark.read.csv('file:///data/dump/test_csv', schema=schema).groupBy().sum(*(['dd_convs'] * 57) ).collect,