Hi, I have the following group-by query. I have tried it both with the DataFrame API and with hiveContext.sql(), but both versions shuffle a huge amount of data and are slow. I pass around 8 fields as group-by columns:
sourceFrame.select("blabla").groupBy("col1", "col2", "col3", ..., "col8").agg("bla bla");

OR

hiveContext.sql("insert into table partitions bla bla group by col1, col2, col3, ..., col8");

I have tried almost all the tuning parameters, such as Tungsten, LZ4 shuffle compression, and a larger shuffle.storage setting (around 6.0). I am using Spark 1.4.0. Please advise; thanks in advance.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-tune-unavoidable-group-by-query-tp25001.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
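The following toy model may help explain why a group-by on eight columns can stay shuffle-heavy even after compression and memory tuning. It is plain Python, not Spark code, and `shuffle_records`, `low_card` and `high_card` are hypothetical names. The sketch assumes that rows are partially aggregated within each input partition before being hash-partitioned to reducers. Spark SQL aggregations typically work this way. When the grouping key has few distinct values per partition, this combine step shrinks the shuffle dramatically. When the eight columns together are nearly unique, almost every input row still has to be shuffled.

```python
from collections import defaultdict


def shuffle_records(partitions, num_reducers, map_side_combine):
    """Toy model only (not Spark's implementation) of a hash-partitioned group-by.

    partitions: list of input partitions, each a list of (key, value) rows,
    where key is a tuple of the grouping columns (e.g. col1..col8).
    Returns (number of rows sent across the shuffle, final per-key sums).
    """
    sent = 0
    reducers = [defaultdict(int) for _ in range(num_reducers)]
    for part in partitions:
        if map_side_combine:
            # Assumed partial aggregation: sum values per key inside this partition first.
            local = defaultdict(int)
            for key, value in part:
                local[key] += value
            rows = local.items()
        else:
            rows = part
        for key, value in rows:
            # Each surviving row is hash-partitioned to one reducer, like a shuffle.
            reducers[hash(key) % num_reducers][key] += value
            sent += 1
    # Each key lands on exactly one reducer, so merging the dicts is safe.
    result = {}
    for reducer in reducers:
        result.update(reducer)
    return sent, result


# 4 partitions x 1000 rows with only 2 distinct keys (low cardinality).
low_card = [[(("a",), 1), (("b",), 1)] * 500 for _ in range(4)]
# 4 partitions x 1000 rows where the 8-column key is unique per row (high cardinality).
high_card = [[((p, i, 0, 0, 0, 0, 0, 0), 1) for i in range(1000)] for p in range(4)]

print(shuffle_records(low_card, 3, False)[0])   # 4000
print(shuffle_records(low_card, 3, True)[0])    # 8
print(shuffle_records(high_card, 3, True)[0])   # 4000
```

Under this model, shuffle volume depends mainly on how many distinct group-by keys each input partition holds, not on compression or memory settings. It may therefore be worth checking how many distinct (col1, ..., col8) combinations the data actually contains.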