GitHub user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/14619

@cloud-fan I've moved the `InsertRelationScanner` rule to `Analyzer`, so that it runs after relations and expressions are resolved. To reuse the existing analyzer and optimizer rules, I updated related rules such as `CleanupAliases`, `ColumnPruning`, `PushDownPredicate`, `InferFiltersFromConstraints`, `ConvertToLocalRelation`, and `PropagateEmptyRelation`. I also added new rules to combine and prune `Scanner` operators. In addition, I made some changes to the subquery-related rules, and recently found that they have been refactored. Only a few test cases are still failing now, and they should be easy to fix.

However, I've realized that adding a wrapper node over every relation may not be a good enough idea, for the following reasons.

Firstly, scanning a relation is not one of the basic operators in the SQL language. When we declare a relation, we already imply that it should be scanned, so it seems semantically redundant to declare a `Scanner` node over a relation or to call `relation.scanner()`. Moreover, to add this wrapper node we have to make a new assumption that no other operator is inserted between a `Scanner` and its corresponding relation, which brings in more complexity.

Secondly, a wrapper node should contain the output, the predicates that can be used in partition pruning, and the relation to be scanned. This can lead to complicated situations in some cases. For example, in `InferFiltersFromConstraints` we have to convert the expressions in filters to their alias names when we collect valid constraints, because the output may be aliases while the filters have to use the child expressions. This behavior is not needed in other operators.

Finally, I feel that adding such an operator has caused too many changes. Perhaps we should make some improvements to `PhysicalOperation` instead, until we figure out an approach that is comprehensively better than the current one.

After all, I'm passionate about this improvement and will try my best to contribute. Please correct me if I'm wrong. Thank you!
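
To make the alias concern above more concrete, here is a minimal toy sketch. It is not Spark code: every type and function in it (`Expr`, `Plan`, `Scanner`, `insertScanner`, `toOutputNames`) is a hypothetical stand-in invented for illustration, and it does not reproduce the actual changes in this PR. It only shows the general shape of a wrapper node that carries an output list, pruning predicates, and a relation, and why a predicate written against the relation's columns may need to be rewritten when the wrapper's output contains aliases.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Alias(child: Expr, name: String) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

sealed trait Plan
case class Relation(table: String, columns: Seq[Attr]) extends Plan
case class Filter(condition: Expr, child: Plan) extends Plan
// Hypothetical wrapper node: output list, pruning predicates, and the relation to scan.
case class Scanner(output: Seq[Expr], predicates: Seq[Expr], relation: Relation) extends Plan

object ScannerSketch {
  // Wrap every bare relation in a Scanner; existing Scanner nodes are left untouched.
  def insertScanner(plan: Plan): Plan = plan match {
    case r: Relation          => Scanner(r.columns, Nil, r)
    case Filter(cond, child)  => Filter(cond, insertScanner(child))
    case s: Scanner           => s
  }

  // Rewrite an expression over the relation's columns so that it refers to the
  // alias names exposed by the wrapper's output instead of the child expressions.
  def toOutputNames(e: Expr, output: Seq[Expr]): Expr = {
    val renames: Map[Expr, Attr] =
      output.collect { case Alias(child, name) => child -> Attr(name) }.toMap

    def rewrite(x: Expr): Expr = renames.get(x) match {
      case Some(renamed) => renamed
      case None => x match {
        case GreaterThan(l, r) => GreaterThan(rewrite(l), rewrite(r))
        case Alias(c, n)       => Alias(rewrite(c), n)
        case other             => other
      }
    }
    rewrite(e)
  }
}
```

In this toy model, a filter `a > 1` over a wrapper whose output is `a AS x, b` would have to become `x > 1` before it could be reported as a constraint on the wrapper's output, which is the extra conversion step described above and one that plain operators over a relation would not need.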