GitHub user wzhfy opened a pull request:

    https://github.com/apache/spark/pull/17138

    [SPARK-17080] [SQL] join reorder

    ## What changes were proposed in this pull request?
    
    Reorder joins using a dynamic programming algorithm (from the Selinger paper):
    First, we put all items (basic joined nodes) into level 1. Then we build all 
two-way joins at level 2 from the plans at level 1 (single items), then all 
3-way joins from plans at previous levels (two-way joins and single items), 
then 4-way joins, and so on, until we have built all n-way joins and can pick 
the best plan among them.
    
    When building m-way joins, we keep only the best plan (the one with the 
lowest cost) for each set of m items. E.g., for 3-way joins, we keep only the best 
plan for items {A, B, C} among plans (A J B) J C, (A J C) J B and (B J C) J A, 
where J denotes a join. Thus, the plans maintained at each level when reordering 
four items A, B, C, D are as follows:
    ```
    level 1: p({A}), p({B}), p({C}), p({D})
    level 2: p({A, B}), p({A, C}), p({A, D}), p({B, C}), p({B, D}), p({C, D})
    level 3: p({A, B, C}), p({A, B, D}), p({A, C, D}), p({B, C, D})
    level 4: p({A, B, C, D})
    ```
    where p({A, B, C, D}) is the final output plan.
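
The level-by-level enumeration above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the code in this patch: the function name `reorder_joins`, the `cards` input, the `divisor` used to estimate join output rows, and the cost (sum of intermediate row counts) are all assumptions made for the example. It also ignores join conditions and simply tries every split of each item set into two previously planned sets.

```python
from itertools import combinations


def reorder_joins(cards, divisor=10):
    """Return the best plan for every subset of items.

    cards: dict mapping item name -> estimated row count (hypothetical input).
    divisor: stand-in selectivity; a join is assumed to output
             left_rows * right_rows // divisor rows (at least 1).
    Each plan is a tuple (cost, rows, tree).
    """
    # Level 1: every single item is its own plan, with no join cost yet.
    best = {frozenset([name]): (0, rows, name) for name, rows in cards.items()}
    items = sorted(cards)

    # Level m: build all m-way joins from plans found at previous levels.
    for level in range(2, len(items) + 1):
        for subset in combinations(items, level):
            key = frozenset(subset)
            # Try every split of the set into two disjoint planned sets.
            for k in range(1, level):
                for left_items in combinations(subset, k):
                    left = frozenset(left_items)
                    right = key - left
                    lcost, lrows, ltree = best[left]
                    rcost, rrows, rtree = best[right]
                    rows = max(1, lrows * rrows // divisor)
                    # Assumed cost: cost of both inputs plus this join's output rows.
                    cost = lcost + rcost + rows
                    # Keep only the cheapest plan for this set of items.
                    if key not in best or cost < best[key][0]:
                        best[key] = (cost, rows, (ltree, rtree))
    return best


plans = reorder_joins({"A": 1000, "B": 10, "C": 100})
final_cost, final_rows, final_tree = plans[frozenset("ABC")]
```

For four items, the returned dictionary holds 4 + 6 + 4 + 1 = 15 entries, one per set in the level listing above, and the entry for the full set is the final output plan.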
    
    For cost evaluation, since physical operator costs are not currently 
available, we use cardinalities and sizes to compute costs.
    
    ## How was this patch tested?
    Added test cases.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark joinReorder

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17138
    
----
commit 4682da4e20327bcf78f979061b9e4366dda25363
Author: wangzhenhua <wangzhen...@huawei.com>
Date:   2017-03-01T08:45:13Z

    join reorder

commit f8b19a81a6a5451150afa618488307c057bde861
Author: wangzhenhua <wangzhen...@huawei.com>
Date:   2017-03-02T14:17:43Z

    add test cases

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
