I'm having trouble controlling the number of Spark tasks for a stage. This is on
the latest Spark 1.3.x source code build.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
sc.getConf.get("spark.default.parallelism") // set to 10
val t1 = hiveContext.sql("FROM SalesJan2009 select *")
val t2 =
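Judging from the reply below, t2 is presumably a groupBy/agg aggregation over t1; a minimal hypothetical sketch, with made-up column names:

import org.apache.spark.sql.functions._

// Hypothetical aggregation: the grouping column and the counted column
// are placeholders, not taken from the original post.
val t2 = t1.groupBy("Country").agg(count(t1("Product")))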
Hi Neal,
spark.sql.shuffle.partitions is the property that controls the number of tasks
after a shuffle (to generate t2, there is a shuffle for the aggregations
specified by groupBy and agg). You can use
sqlContext.setConf("spark.sql.shuffle.partitions", "<newNumber>") or
sqlContext.sql("SET spark.sql.shuffle.partitions=<newNumber>") to change it.
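A minimal sketch of both approaches in the spark-shell (the value 10 and the check at the end are just for illustration; hiveContext here is the HiveContext from the original post):

// Option 1: set the property through the SQLContext/HiveContext API.
hiveContext.setConf("spark.sql.shuffle.partitions", "10")

// Option 2: set the same property via a SQL statement.
hiveContext.sql("SET spark.sql.shuffle.partitions=10")

// Sanity check after re-running the aggregation that produces t2:
// the post-shuffle stage should now run with 10 tasks, and the
// result should have 10 partitions.
println(t2.rdd.partitions.length)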