Re: Spark SQL Query Plan optimization

2014-08-02 Thread Michael Armbrust
The number of partitions (which determines the number of tasks) is fixed after any shuffle and can be configured using 'spark.sql.shuffle.partitions' through SQLConf (i.e. sqlContext.set(...) or SET spark.sql.shuffle.partitions=... in SQL). It is possible we will auto-select this based on statistics.
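
For example, a minimal sketch against the Spark 1.0-era API (the exact setter may vary slightly between releases):

    // Lower the number of post-shuffle partitions for a small join.
    // Either via the SQLConf setter on the SQLContext...
    sqlContext.set("spark.sql.shuffle.partitions", "10")

    // ...or with a SET statement issued through SQL:
    sqlContext.sql("SET spark.sql.shuffle.partitions=10")

    // Subsequent joins/aggregations now shuffle into 10 partitions,
    // so the post-shuffle stage runs 10 tasks instead of the default 200.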

Spark SQL Query Plan optimization

2014-08-01 Thread N. Venkata Naga Ravi
Hi, I am trying to understand the query plan, the number of tasks, and the execution time produced for a join query. Consider the following example: create two tables, emp and sal, with 100 records in each table and a key for joining them. EmpRDDRelation.scala case class EmpRecord(key:
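
The attached example is truncated in the archive. A hypothetical reconstruction of the kind of setup being described, using the Spark 1.0-era SQL API (everything beyond the emp/sal table names, the 100-record sizes, and the `key` field is an assumption), might look like:

    // Hypothetical sketch -- field names and values are assumptions.
    import org.apache.spark.sql.SQLContext

    case class EmpRecord(key: Int, name: String)
    case class SalRecord(key: Int, salary: Int)

    val sqlContext = new SQLContext(sc)
    import sqlContext._  // brings createSchemaRDD and sql(...) into scope (Spark 1.0.x)

    // Two tables of 100 records each, sharing a join key.
    val emp = sc.parallelize(1 to 100).map(i => EmpRecord(i, "name" + i))
    val sal = sc.parallelize(1 to 100).map(i => SalRecord(i, i * 1000))

    emp.registerAsTable("emp")
    sal.registerAsTable("sal")

    // The join shuffles both sides into spark.sql.shuffle.partitions partitions.
    val joined = sql("SELECT emp.key, name, salary FROM emp JOIN sal ON emp.key = sal.key")
    joined.collect().foreach(println)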