Shark used to have the shark.map.tasks variable. Is there an equivalent for Spark SQL?
We are trying a scenario with heavily partitioned Hive tables. We end up with a UnionRDD that has a large number of partitions underneath, and hence far too many tasks: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L202. Is there a good way to tell Spark SQL to coalesce these? Thanks for any pointers.
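For context, one workaround we have been experimenting with is calling `coalesce` on the RDD that comes back from the query. This is only a sketch: it assumes an existing `SparkContext` named `sc`, a placeholder table name `partitioned_table`, and a placeholder target partition count, and it requires a live Spark/Hive deployment to actually run.

```scala
import org.apache.spark.sql.hive.HiveContext

// `sc` is an existing SparkContext; `partitioned_table` is a
// placeholder for the heavily partitioned Hive table.
val hiveContext = new HiveContext(sc)
val rows = hiveContext.sql("SELECT * FROM partitioned_table")

// Collapse the many underlying Hive partitions into a smaller
// number of Spark partitions, and hence fewer tasks.
// coalesce avoids a full shuffle when reducing the count;
// the target of 64 here is an arbitrary example value.
val fewer = rows.coalesce(64)
```

This only reduces the task count after the scan RDD is built, though, so we would still prefer a way to control it at the TableReader level.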