[GitHub] spark issue #14619: [SPARK-17031][SQL] Add `Scanner` operator to wrap the op...

jiangxb1987 Sun, 28 Aug 2016 08:25:23 -0700

Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/14619
  
    @cloud-fan I've moved the `InsertRelationScanner` rule to `Analyzer`, so it runs after 
relations and expressions are resolved. To reuse the analyzer and optimizer rules, I 
updated the related rules, such as `CleanupAliases`, `ColumnPruning`, 
`PushDownPredicate`, `InferFiltersFromConstraints`, 
`ConvertToLocalRelation`, and `PropagateEmptyRelation`. I also added new rules to 
combine and prune `Scanner` operators. In addition, I made some changes to the 
subquery-related rules, and recently found that those rules have been refactored.
    Only a few test cases are still failing now, and they should be easy to 
fix. However, I have realized that adding a wrapper node over every relation may not 
be a good enough idea, for the following reasons:
    Firstly, scanning a relation is not one of the basic operators in the SQL language. When 
we declare a relation, we imply that it should be scanned, so it seems semantically 
redundant to declare a `Scanner` node over a relation or to call 
`relation.scanner()`. Besides, to add this wrapper node, we have to make a new 
assumption that no other operator is inserted between a `Scanner` and its 
corresponding relation, which brings in more complexity.
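    To make that assumption concrete, here is a minimal sketch using a toy plan 
model. It does not use Catalyst's real classes: `Plan`, `Relation`, `Filter`, this 
simplified `Scanner`, and `ScannerInvariant` are all hypothetical names invented for 
illustration. It only shows the kind of check that every rule would have to preserve.

```scala
// Toy logical-plan model for illustration only; these are NOT Spark's Catalyst classes.
sealed trait Plan { def children: Seq[Plan] }
case class Relation(name: String) extends Plan { val children: Seq[Plan] = Nil }
case class Filter(condition: String, child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
case class Scanner(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }

object ScannerInvariant {
  // True only if every Scanner in the tree sits directly on top of a Relation.
  def holds(plan: Plan): Boolean = plan match {
    case Scanner(_: Relation) => true   // the shape the wrapper design assumes
    case Scanner(_)           => false  // some other operator slipped in between
    case other                => other.children.forall(holds)
  }
}
```

    In this toy model, `Filter("a > 1", Scanner(Relation("t")))` satisfies the check, 
while `Scanner(Filter("a > 1", Relation("t")))` does not, so any rule that pushes an 
operator below a `Scanner` would break the assumption.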
    Secondly, a wrapper node should contain the output, the predicates that can be 
used in partition pruning, and the relation to be scanned. But this can lead to 
complicated situations in some cases. For example, in `InferFiltersFromConstraints`, 
we have to convert the expressions in the filters to their alias names when we collect valid 
constraints, because the output may contain aliases while the filters have to use the child 
expressions. This behavior is not needed in other operators.
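    The alias conversion can be sketched roughly as follows. This is a simplified 
model, not the code in the pull request: `Expr`, `Attr`, `Lit`, `GreaterThan`, 
`Alias`, and `AliasRewrite.toOutputNames` are hypothetical names, and real Catalyst 
expressions carry expression IDs rather than plain strings.

```scala
// Toy expression model for illustration only; not Catalyst's Expression hierarchy.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

// An output column: the child attribute `child` is exposed under the name `name`.
case class Alias(child: Attr, name: String)

object AliasRewrite {
  // Rewrite a filter written against child attributes so that it refers to the
  // alias names exposed in the wrapper node's output. Attributes that have no
  // alias are left unchanged.
  def toOutputNames(e: Expr, output: Seq[Alias]): Expr = {
    val renames = output.map(a => a.child.name -> a.name).toMap
    def go(x: Expr): Expr = x match {
      case Attr(n)           => Attr(renames.getOrElse(n, n))
      case GreaterThan(l, r) => GreaterThan(go(l), go(r))
      case lit: Lit          => lit
    }
    go(e)
  }
}
```

    With an output of `Seq(Alias(Attr("a"), "x"))`, the filter 
`GreaterThan(Attr("a"), Lit(1))` becomes `GreaterThan(Attr("x"), Lit(1))`, which is the 
form a parent operator could use as a constraint.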
    Lastly, I feel that adding such an operator has caused too many changes. 
Perhaps we should make some improvements to `PhysicalOperation` until we figure 
out an approach that is comprehensively better than the current one.
    
    Overall, I'm passionate about this improvement and will try my best to 
contribute. Please correct me if I'm wrong. Thank you!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
