[GitHub] [spark] aokolnychyi commented on a change in pull request #29066: [WIP][SPARK-23889] DataSourceV2: required sorting and clustering for writes

GitBox Tue, 24 Nov 2020 18:35:33 -0800


aokolnychyi commented on a change in pull request #29066:
URL: https://github.com/apache/spark/pull/29066#discussion_r530069698




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -186,6 +186,13 @@ abstract class Optimizer(catalogManager: CatalogManager)
     // plan may contain nodes that do not report stats. Anything that uses stats must run after
     // this batch.
     Batch("Early Filter and Projection Push-Down", Once, earlyScanPushDownRules: _*) :+
+    // This batch contains rules that should be applied to writes early. For example,
+    // we have to construct a logical write early so that we can inject needed repartition/sort
+    // operators to satisfy data source distribution and ordering requirements.
+    // Expression optimizations must be run before this batch so that we have optimal
+    // expressions when we construct writes. At the same time, rules that dedup repartition and
+    // sort operators must be run afterwards.
+    Batch("Early Writes", Once, earlyWriteRules: _*) :+

Review comment:
       Fixed.
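
       For anyone following the thread, the ordering constraint in the new
       comment can be sketched with a toy model. This is only an illustration
       under stated assumptions: `Plan`, `Rule`, `Batch`, the three rules, and
       the batch names other than "Early Writes" are hypothetical stand-ins,
       not Spark's actual classes or rules. It shows expression optimization
       running first, the write rule injecting a sort next, and the dedup rule
       cleaning up afterwards.

```scala
// Toy model only: these types and rules are hypothetical, not Spark's classes.
object EarlyWritesSketch {
  // A "plan" is just a list of operator names, top-most operator first.
  type Plan = List[String]

  final case class Rule(name: String, transform: Plan => Plan)
  final case class Batch(name: String, rules: Rule*)

  // Stand-in for expression optimization on the query feeding the write.
  val optimizeExpressions: Rule = Rule("OptimizeExpressions", plan =>
    plan.map(op => if (op == "Project(raw)") "Project(optimized)" else op))

  // Stand-in for an early write rule: injects the sort the data source requires.
  val injectRequiredSort: Rule = Rule("InjectRequiredSort", {
    case "Write" :: rest => "Write" :: "Sort" :: rest
    case other => other
  })

  // Stand-in for a rule that collapses adjacent duplicate sorts.
  val dedupSorts: Rule = Rule("DedupSorts", plan =>
    plan.foldRight(List.empty[String]) {
      case ("Sort", "Sort" :: tail) => "Sort" :: tail
      case (op, acc) => op :: acc
    })

  // Batch order mirrors the constraint in the comment: optimize expressions,
  // then construct writes, then dedup repartition/sort operators.
  val batches: Seq[Batch] = Seq(
    Batch("Expression Optimization", optimizeExpressions),
    Batch("Early Writes", injectRequiredSort),
    Batch("Dedup Repartition and Sort", dedupSorts))

  // Runs every rule of every batch once, in declaration order.
  def optimize(plan: Plan): Plan =
    batches.foldLeft(plan) { (p, batch) =>
      batch.rules.foldLeft(p)((q, rule) => rule.transform(q))
    }

  def main(args: Array[String]): Unit = {
    // The user query already sorts, so the injected sort is redundant.
    val plan = List("Write", "Sort", "Project(raw)", "Scan")
    println(optimize(plan))
    // prints List(Write, Sort, Project(optimized), Scan)
  }
}
```

       If the dedup batch ran before "Early Writes" in this sketch, the
       injected sort would be left duplicated, which is the situation the
       comment is meant to rule out.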



