[GitHub] spark issue #14847: [SPARK-17254][SQL] Add StopAfter physical plan for the f...

ioana-delaney Wed, 19 Oct 2016 14:13:57 -0700

Github user ioana-delaney commented on the issue:

    https://github.com/apache/spark/pull/14847
  
    @viirya Hi Simon, I have some general comments/questions:
    1. It would help to include in the design document some example queries 
together with their corresponding optimized and physical plans. 
    2. I looked at the new test suite, and all the examples are simple ones 
using DataFrames. Can we include some more complex examples? Maybe some that 
reference the bucketed tables you mention in this PR. Another example could 
use more complex SQL, e.g. select * from (select t1.c1 + t2.c1 as col1 
from t1, t2 where t1.c1 = t2.c1 order by col1) v1 where v1.col1 < 10
    ```SQL
    == Physical Plan ==
    *Sort [col1#2 ASC NULLS FIRST], true, 0
    +- Exchange rangepartitioning(col1#2 ASC NULLS FIRST, 5)
       +- *Project [(c1#12 + c1#15) AS col1#2]
          +- *SortMergeJoin [c1#12], [c1#15], Inner, ((c1#12 + c1#15) < 10)
             :- *Sort [c1#12 ASC NULLS FIRST], false, 0
             :  +- Exchange hashpartitioning(c1#12, 5)
             :     +- HiveTableScan [c1#12], MetastoreRelation default, t1
             +- *Sort [c1#15 ASC NULLS FIRST], false, 0
                +- Exchange hashpartitioning(c1#15, 5)
                   +- HiveTableScan [c1#15], MetastoreRelation default, t2
    ```
    
    3. In the above plan, is the local predicate c1#12 + c1#15 < 10 applied 
after the join, or as part of the join? That is, does it stop the join 
execution once the values fall outside the range? 
    
    4. One observation: besides the above SQL query, which might already 
work today, I couldn't easily find other SQL examples that would benefit from 
this optimization, since most predicates are pushed down to the base tables, 
which in general are not ordered. I can think of some HAVING predicates, but 
they probably don't qualify.
    
    5. Your changes look very general. Are you also supporting Filters over 
Bucketed tables, which you reference in this PR? 
    
    6. Do Bucketed tables allow inserts/appends? How will that work with your 
optimization? 
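
    To make question 3 concrete, here is a minimal sketch in plain Python (not 
Spark code; the function names and sample values are hypothetical) of the 
difference between checking a range predicate on every row of sorted input and 
stopping at the first row that falls outside the range. It only illustrates 
the idea being asked about and makes no claim about how the PR implements it.

```python
def filter_all(sorted_values, bound):
    """Evaluate the predicate on every row, as a plain filter would."""
    examined = 0
    kept = []
    for value in sorted_values:
        examined += 1
        if value < bound:
            kept.append(value)
    return kept, examined


def filter_stop_early(sorted_values, bound):
    """Stop at the first failing row; valid only for ascending input."""
    examined = 0
    kept = []
    for value in sorted_values:
        examined += 1
        if not value < bound:
            break  # every later row is >= this one, so none can qualify
        kept.append(value)
    return kept, examined


# Hypothetical col1 values, already sorted ascending.
col1 = [2, 4, 6, 8, 10, 12, 14, 16]
print(filter_all(col1, 10))         # same rows, all 8 values examined
print(filter_stop_early(col1, 10))  # same rows, only 5 values examined
```

    Both functions return the same rows; the second relies on the input being 
sorted on the predicate column, which is why unordered base tables would not 
benefit.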
    
    Thanks




