Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21498#discussion_r193263354 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1099,6 +1099,17 @@ object SQLConf { .intConf .createWithDefault(SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD.defaultValue.get) + val UNION_IN_SAME_PARTITION = + buildConf("spark.sql.unionInSamePartition") + .internal() + .doc("When true, Union operator will union children results in the same corresponding " + + "partitions if they have same partitioning. This eliminates unnecessary shuffle in later " + + "operators like aggregation. Note that because non-deterministic functions such as " + + "monotonically_increasing_id are depended on partition id. By doing this, the values of " + --- End diff -- Seems we have wanted to make sure non-deterministic functions have same values after union. Once we union children in same partitions, the values of such functions can be changed. So I added this config to control it. Default config is false.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org