[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...

wzhfy Tue, 11 Apr 2017 00:00:03 -0700

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17546#discussion_r110826959
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -218,28 +220,48 @@ object JoinReorderDP extends PredicateHelper with Logging {
       }
     
       /**
    -   * Builds a new JoinPlan when both conditions hold:
    +   * Builds a new JoinPlan if the following conditions hold:
        * - the sets of items contained in left and right sides do not overlap.
        * - there exists at least one join condition involving references from both sides.
    +   * - if star-join filter is enabled, allow the following combinations:
    +   *         1) (oneJoinPlan U otherJoinPlan) is a subset of star-join
    +   *         2) star-join is a subset of (oneJoinPlan U otherJoinPlan)
    +   *         3) (oneJoinPlan U otherJoinPlan) is a subset of non star-join
    +   *
        * @param oneJoinPlan One side JoinPlan for building a new JoinPlan.
        * @param otherJoinPlan The other side JoinPlan for building a new join node.
        * @param conf SQLConf for statistics computation.
        * @param conditions The overall set of join conditions.
        * @param topOutput The output attributes of the final plan.
    +   * @param filters Join graph info to be used as filters by the search algorithm.
        * @return Builds and returns a new JoinPlan if both conditions hold. Otherwise, returns None.
        */
       private def buildJoin(
           oneJoinPlan: JoinPlan,
           otherJoinPlan: JoinPlan,
           conf: SQLConf,
           conditions: Set[Expression],
    -      topOutput: AttributeSet): Option[JoinPlan] = {
    +      topOutput: AttributeSet,
    +      filters: Option[JoinGraphInfo]): Option[JoinPlan] = {
     
         if (oneJoinPlan.itemIds.intersect(otherJoinPlan.itemIds).nonEmpty) {
           // Should not join two overlapping item sets.
           return None
         }
     
    +    if (conf.joinReorderDPStarFilter && filters.isDefined) {
    +      // Apply star-join filter, which ensures that tables in a star schema relationship
    +      // are planned together. The star-filter will eliminate joins among star and non-star
    +      // tables until the star joins are built. The following combinations are allowed:
    +      // 1. (oneJoinPlan U otherJoinPlan) is a subset of star-join
    +      // 2. star-join is a subset of (oneJoinPlan U otherJoinPlan)
    +      // 3. (oneJoinPlan U otherJoinPlan) is a subset of non star-join
    +      val isValidJoinCombination =
    +        JoinReorderDPFilters(conf).starJoinFilter(oneJoinPlan.itemIds, otherJoinPlan.itemIds,
    --- End diff --
    
    This will create a new `JoinReorderDPFilters` instance every time we try to build a join node.
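
    One possible way to address this is to build the filter object once and reuse it for every candidate pair. The sketch below is only an illustration under stated assumptions: `GraphInfoSketch`, `StarFilterSketch`, and `Demo` are hypothetical stand-ins, not Spark's classes, and the check implements just the three combinations listed in the diff comment (it leaves out the overlap test and the `conf` argument).

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
final case class GraphInfoSketch(starJoins: Set[Int], nonStarJoins: Set[Int])

object StarFilterSketch {
  // Counts constructions so the cost of per-call instantiation is visible.
  var instancesCreated: Int = 0
}

final class StarFilterSketch {
  StarFilterSketch.instancesCreated += 1

  // Allows a pair when one of the three combinations from the diff comment holds.
  def allows(one: Set[Int], other: Set[Int], info: GraphInfoSketch): Boolean = {
    val joined = one.union(other)
    joined.subsetOf(info.starJoins) ||   // 1) join is a subset of star-join
    info.starJoins.subsetOf(joined) ||   // 2) star-join is a subset of join
    joined.subsetOf(info.nonStarJoins)   // 3) join is a subset of non star-join
  }
}

object Demo {
  // Hypothetical item ids: 0, 1, 2 form the star join; 3, 4 are non-star tables.
  val info: GraphInfoSketch =
    GraphInfoSketch(starJoins = Set(0, 1, 2), nonStarJoins = Set(3, 4))

  val candidates: Seq[(Set[Int], Set[Int])] = Seq(
    (Set(0), Set(1)),
    (Set(0, 1), Set(2)),
    (Set(0), Set(3)),
    (Set(3), Set(4))
  )

  // Pattern flagged in the review: a new filter object for every candidate pair.
  def perCall(): Seq[Boolean] =
    candidates.map { case (a, b) => new StarFilterSketch().allows(a, b, info) }

  // Alternative: create the filter once and reuse it across all pairs.
  def hoisted(): Seq[Boolean] = {
    val filter = new StarFilterSketch()
    candidates.map { case (a, b) => filter.allows(a, b, info) }
  }
}
```

    Both variants return the same decisions; only the number of filter objects differs (one per candidate pair versus one in total). Whether that allocation matters in practice depends on how many pairs the DP search enumerates.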



