Re[10]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-07 Thread Сергей Романов
006 >64 0.691393136978 >65 0.690823078156 >66 0.70525097847 >67 0.724694013596 >68 0.737638950348 >69 0.749594926834 > > >Yong > >---------- >From: Davies Liu < dav...@databricks.com > >Sent: Tuesday, September 6, 2016 2:27 PM >To: Сергей Рома

Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-05 Thread Сергей Романов
    "id" : 492,     "name" : "internal.metrics.shuffle.write.writeTime",     "value" : "371883"   }, { Full metrics in attachment. >Saturday, 3 September 2016, 19:53 +03:00, from Gavin Yue <yue.yuany...@gmail.com>: > >Any shuffling?  >

Re[7]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-03 Thread Сергей Романов
>Saturday, 3 September 2016, 15:50 +03:00, from Сергей Романов ><romano...@inbox.ru.INVALID>: > >Same problem happens with CSV data file, so it's not parquet-related either. > >Welcome to >    __ > / __/__  ___ _/ /__ >    _\ \/ _ \

Re[6]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-03 Thread Сергей Романов
62 3.43532109261 63 3.07742786407 64 3.03904604912 65 3.01616096497 66 3.06293702126 67 3.09386610985 68 3.27610206604 69 3.2041969299 Saturday, 3 September 2016, 15:40 +03:00, from Сергей Романов <romano...@inbox.ru.INVALID>: > >Hi, >I have narrowed down my problem to a very simple cas

Re[5]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-03 Thread Сергей Романов
Hi, I have narrowed down my problem to a very simple case. I'm sending a 27 KB parquet file as an attachment (file:///data/dump/test2 in the example). Could you please take a look at it? Why is there a performance drop after 57 sum columns? Welcome to     __ / __/__  ___ _/ /__     _\

Re[4]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-03 Thread Сергей Романов
rdpress.com > >Disclaimer:  Use it at your own risk. Any and all responsibility for any loss, >damage or destruction of data or any other property which may arise from relying on this email's  technical content is explicitly disclaimed. The author will in no case be liable for any monetary

Re[2]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-02 Thread Сергей Романов
l content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. >  >On 1 September 2016 at 16:55, Сергей Романов < romano...@inbox.ru.invalid > >wrote: >>Hi, >> >>When I run a query like

Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-01 Thread Сергей Романов
Hi, When I run a query like "SELECT field, SUM(x1), SUM(x2)... SUM(x28) FROM parquet_table WHERE partition = 1 GROUP BY field" it runs in under 2 seconds, but when I add just one more aggregate field to the query "SELECT field, SUM(x1), SUM(x2)... SUM(x28), SUM(x29) FROM parquet_table WHERE
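The comparison described in this message can be reproduced with a small script. The following is only a sketch: the helper names `build_sum_query` and `time_queries` are hypothetical and do not come from the thread, and the column names x1..xN, the table name, and the partition filter are taken from the query text quoted above.

```python
import time


def build_sum_query(n_sums, table="parquet_table", key="field", partition=1):
    """Build 'SELECT field, SUM(x1), ..., SUM(xN) FROM table WHERE ... GROUP BY field'.

    Hypothetical helper: the column names x1..xN follow the query quoted in the post.
    """
    sums = ", ".join("SUM(x{})".format(i) for i in range(1, n_sums + 1))
    return (
        "SELECT {key}, {sums} FROM {table} "
        "WHERE partition = {partition} GROUP BY {key}"
    ).format(key=key, sums=sums, table=table, partition=partition)


def time_queries(run_query, counts):
    """Run the generated query for each column count and return {count: seconds}.

    run_query is any callable that accepts a SQL string and executes it.
    """
    timings = {}
    for n in counts:
        sql = build_sum_query(n)
        start = time.perf_counter()
        run_query(sql)
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in runner so the sketch executes without a Spark installation.
    for n, seconds in sorted(time_queries(lambda sql: None, [28, 29]).items()):
        print(n, seconds)
```

With a live session, assuming a SparkSession named `spark` and a registered table, the runner could be something like `lambda sql: spark.sql(sql).collect()`, so that 28 and 29 aggregate columns are timed side by side.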