Github user sathiyapk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19451#discussion_r146384601
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceExceptWithFilter.scala
 ---
    @@ -0,0 +1,114 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import scala.annotation.tailrec
    +
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.Rule
    +
    +
    +/**
    + * If one or both of the datasets in the logical [[Except]] operator are 
purely transformed using
    + * [[Filter]], this rule will replace logical [[Except]] operator with a 
[[Filter]] operator by
    + * flipping the filter condition of the right child.
    + * {{{
    + *   SELECT a1, a2 FROM Tab1 WHERE a2 = 12 EXCEPT SELECT a1, a2 FROM Tab1 
WHERE a1 = 5
    + *   ==>  SELECT DISTINCT a1, a2 FROM Tab1 WHERE a2 = 12 AND (a1 is null 
OR a1 <> 5)
    + * }}}
    + *
    + * Note:
    + * Before flipping the filter condition of the right node, we should:
    + * 1. Combine all of its [[Filter]]s.
    + * 2. Apply the InferFiltersFromConstraints rule (to support NULL values in the condition).
    + */
    +object ReplaceExceptWithFilter extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    --- End diff --
    
    Do you mean adding something like the following to
`org.apache.spark.sql.internal.SQLConf` in catalyst?
    
    ```scala
    val REPLACE_EXCEPT_WITH_FILTER =
      buildConf("spark.sql.optimizer.replaceExceptWithFilter")
        .doc("When true, the apply function of the rule verifies whether the right node of " +
          "the except operation is of type Filter or Project followed by Filter. If so, the " +
          "rule further verifies: 1) excluding the filter operations on top of the right " +
          "node (as well as the left node, if any), whether both nodes evaluate to the same " +
          "result, compared using the project list if the top node is a Project and " +
          "node.output otherwise (note that a project list may contain a " +
          "SubqueryExpression, so the comparison must be done on the projection); and " +
          "2) the filter condition does not contain any SubqueryExpression. If all the " +
          "conditions are met, the rule replaces the except operation with a Filter by " +
          "flipping the filter condition of the right node.")
        .booleanConf
        .createWithDefault(true)
    ```
    and
    ```scala
    def replaceExceptWithFilter: Boolean = getConf(REPLACE_EXCEPT_WITH_FILTER)
    ```
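    For context, here is a sketch (not the actual implementation) of how the rule's `apply` could be guarded by such a flag, assuming the conf above is exposed from `SQLConf` as `replaceExceptWithFilter` and is reachable from the plan; `rewriteExcept` is a hypothetical helper standing in for the existing rewrite logic:
    ```scala
    object ReplaceExceptWithFilter extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = {
        // Skip the rewrite entirely when the flag is disabled
        // (how the conf is accessed here is illustrative).
        if (!plan.conf.replaceExceptWithFilter) {
          return plan
        }
        plan transform {
          case e @ Except(left, right) =>
            // Existing logic: verify eligibility, then replace the Except
            // with a Filter by flipping the right child's filter condition.
            rewriteExcept(left, right)  // hypothetical helper
        }
      }
    }
    ```
    That way the default behavior stays on, and users can fall back to the old Except planning by flipping the conf.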


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org