[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2022-02-24 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r813940508



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanDistinctKeys.scala
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeSet, 
ExpressionSet, NamedExpression}
+
+
+trait QueryPlanDistinctKeys {

Review comment:
   New pull request: https://github.com/apache/spark/pull/35651




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2022-02-23 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r812727258



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeSet, 
ExpressionSet, NamedExpression}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends LogicalPlanVisitor[Seq[AttributeSet]] 
{
+
+  private def allDistinctAttributes(
+  expressions: Seq[NamedExpression],
+  distinctAttributes: Seq[ExpressionSet]): Seq[AttributeSet] = {
+expressions.toSet.subsets(distinctAttributes.map(_.size).min).filter { s =>
+  val references = s.map {
+case a: Alias => a.child
+case ne => ne
+  }
+  distinctAttributes.exists(_.equals(ExpressionSet(references)))
+}.map(s => AttributeSet(s.map(_.toAttribute))).toSeq
+  }
+
+  override def default(p: LogicalPlan): Seq[AttributeSet] = {
+Seq.empty[AttributeSet]
+  }
+
+  override def visitAggregate(p: Aggregate): Seq[AttributeSet] = {
+val groupingExps = ExpressionSet(p.groupingExpressions) // handle group by 
a, a

Review comment:
   Does the distinct attributes related to the child of distinct 
attributes? For example:
   ```sql
   create table t(a int, b int, c int, d int, e int) using parquet;
   select a, b, c, sum(d) from (select distinct * from t) t1 group by a, b, c;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2022-02-23 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r812732998



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeSet, 
ExpressionSet, NamedExpression}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends LogicalPlanVisitor[Seq[AttributeSet]] 
{
+
+  private def allDistinctAttributes(
+  expressions: Seq[NamedExpression],
+  distinctAttributes: Seq[ExpressionSet]): Seq[AttributeSet] = {
+expressions.toSet.subsets(distinctAttributes.map(_.size).min).filter { s =>

Review comment:
   Add a filter before build subsets:
   ```scala
   val expressions = keys.flatMap(_.toSet)
   projectList.filter {
 case a: Alias => expressions.exists(_.semanticEquals(a.child))
 case ne: NamedExpression => expressions.exists(_.semanticEquals(ne))
   }.toSet.subsets(keys.map(_.size).min).filter { s =>
 val references = s.map {
   case a: Alias => a.child
   case ne => ne
 }
 keys.exists(_.equals(ExpressionSet(references)))
   }.map(s => AttributeSet(s.map(_.toAttribute))).toSet
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2022-02-23 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r812727258



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeSet, 
ExpressionSet, NamedExpression}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends LogicalPlanVisitor[Seq[AttributeSet]] 
{
+
+  private def allDistinctAttributes(
+  expressions: Seq[NamedExpression],
+  distinctAttributes: Seq[ExpressionSet]): Seq[AttributeSet] = {
+expressions.toSet.subsets(distinctAttributes.map(_.size).min).filter { s =>
+  val references = s.map {
+case a: Alias => a.child
+case ne => ne
+  }
+  distinctAttributes.exists(_.equals(ExpressionSet(references)))
+}.map(s => AttributeSet(s.map(_.toAttribute))).toSeq
+  }
+
+  override def default(p: LogicalPlan): Seq[AttributeSet] = {
+Seq.empty[AttributeSet]
+  }
+
+  override def visitAggregate(p: Aggregate): Seq[AttributeSet] = {
+val groupingExps = ExpressionSet(p.groupingExpressions) // handle group by 
a, a

Review comment:
   Does the distinct attributes related to the child of distinct 
attributes? For example:
   ```sql
   create table t(a int, b int, c int, d int, e int) using parquet;
   select a, b, c, sum(d) from (select distinct * from t) t1 group by a, b, c;
   ```

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeSet, 
ExpressionSet, NamedExpression}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends LogicalPlanVisitor[Seq[AttributeSet]] 
{
+
+  private def allDistinctAttributes(

Review comment:
   +1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2022-02-23 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r812722470



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeSet, 
ExpressionSet, NamedExpression}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends LogicalPlanVisitor[Seq[AttributeSet]] 
{
+
+  private def allDistinctAttributes(
+  expressions: Seq[NamedExpression],
+  distinctAttributes: Seq[ExpressionSet]): Seq[AttributeSet] = {
+expressions.toSet.subsets(distinctAttributes.map(_.size).min).filter { s =>
+  val references = s.map {
+case a: Alias => a.child
+case ne => ne
+  }
+  distinctAttributes.exists(_.equals(ExpressionSet(references)))
+}.map(s => AttributeSet(s.map(_.toAttribute))).toSeq
+  }
+
+  override def default(p: LogicalPlan): Seq[AttributeSet] = {
+Seq.empty[AttributeSet]
+  }
+
+  override def visitAggregate(p: Aggregate): Seq[AttributeSet] = {
+val groupingExps = ExpressionSet(p.groupingExpressions) // handle group by 
a, a

Review comment:
   Does the distinct attributes related to the child distinct attributes? 
For example:
   ```sql
   create table t(a int, b int, c int, d int, e int) using parquet;
   select a, b, c, sum(d) from (select distinct * from t) t1 group by a, b, c;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2022-02-23 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r812719768



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeSet, 
ExpressionSet, NamedExpression}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends LogicalPlanVisitor[Seq[AttributeSet]] 
{

Review comment:
   + 1 to make it a trait.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-11-02 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r740310943



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, ExpressionSet}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends 
LogicalPlanVisitor[Set[ExpressionSet]] {
+  override def default(p: LogicalPlan): Set[ExpressionSet] = {
+Set.empty[ExpressionSet]
+  }
+
+  override def visitAggregate(p: Aggregate): Set[ExpressionSet] = {

Review comment:
   I updated it to the following code to support it. 
   ```scala
 override def visitAggregate(p: Aggregate): Set[ExpressionSet] = {
   val groupingExps = ExpressionSet(p.groupingExpressions) // handle group 
by a, a
   val aggExpressions = p.aggregateExpressions.filter {
 case _: Attribute | _: Alias => true
 case _ => false
   }
   
   aggExpressions.toSet.subsets(groupingExps.size).filter { s =>
 groupingExps.subsetOf(ExpressionSet(s.map {
   case a: Alias => a.child
   case o => o
 }))
   }.map(s => ExpressionSet(s.map(_.toAttribute))).toSet
 }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-11-01 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r740310943



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, ExpressionSet}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends 
LogicalPlanVisitor[Set[ExpressionSet]] {
+  override def default(p: LogicalPlan): Set[ExpressionSet] = {
+Set.empty[ExpressionSet]
+  }
+
+  override def visitAggregate(p: Aggregate): Set[ExpressionSet] = {

Review comment:
   I updated it to the following code to support it. 
   ```scala
 override def visitAggregate(p: Aggregate): Set[ExpressionSet] = {
   val groupingExps = ExpressionSet(p.groupingExpressions) // handle group 
by a, a
   val aggExpressions = p.aggregateExpressions.filter {
 case _: Attribute | _: Alias => true
 case _ => false
   }
   
   aggExpressions.toSet.subsets(groupingExps.size).filter { s =>
 groupingExps.subsetOf(ExpressionSet(s.map {
   case a: Alias => a.child
   case o => o
 }))
   }.map(s => ExpressionSet(s.map(_.toAttribute))).toSet
 }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-11-01 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r740285441



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlanDistinctAttributes.scala
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.ExpressionSet
+
+/**
+ * A trait to add distinct attributes to [[LogicalPlan]]. For example:
+ * {{{
+ *   SELECT a, a FROM Tab1 GROUP BY a, b
+ *   // returns a
+ * }}}
+ */
+trait LogicalPlanDistinctAttributes { self: LogicalPlan =>
+  def distinctAttributes: Set[ExpressionSet] = {

Review comment:
   Moved it to `DistinctKeyVisitor`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-11-01 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r740285032



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, ExpressionSet}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends 
LogicalPlanVisitor[Set[ExpressionSet]] {
+  override def default(p: LogicalPlan): Set[ExpressionSet] = {
+Set.empty[ExpressionSet]
+  }
+
+  override def visitAggregate(p: Aggregate): Set[ExpressionSet] = {
+val groupingExps = p.groupingExpressions.toSet
+p.aggregateExpressions.toSet.subsets(groupingExps.size).filter { s =>
+  s.map {
+case a: Alias => a.child
+case o => o
+  }.equals(groupingExps)
+}.map(s => ExpressionSet(s.map(_.toAttribute))).toSet
+  }
+
+  override def visitDistinct(p: Distinct): Set[ExpressionSet] = 
Set(ExpressionSet(p.output))
+
+  override def visitExcept(p: Except): Set[ExpressionSet] =
+if (!p.isAll) Set(ExpressionSet(p.output)) else default(p)
+
+  override def visitExpand(p: Expand): Set[ExpressionSet] = default(p)
+
+  override def visitFilter(p: Filter): Set[ExpressionSet] = 
p.child.distinctAttributes
+
+  override def visitGenerate(p: Generate): Set[ExpressionSet] = default(p)
+
+  override def visitGlobalLimit(p: GlobalLimit): Set[ExpressionSet] = 
default(p)
+
+  override def visitIntersect(p: Intersect): Set[ExpressionSet] = {
+if (!p.isAll) Set(ExpressionSet(p.output)) else default(p)
+  }
+
+  override def visitJoin(p: Join): Set[ExpressionSet] = {
+p.joinType match {
+  case LeftSemi | LeftAnti => p.left.distinctAttributes
+  case _ => default(p)
+}
+  }
+
+  override def visitLocalLimit(p: LocalLimit): Set[ExpressionSet] = default(p)
+
+  override def visitPivot(p: Pivot): Set[ExpressionSet] = default(p)
+
+  override def visitProject(p: Project): Set[ExpressionSet] = {
+if (p.child.distinctAttributes.nonEmpty) {
+  val childDistinctAttributes = p.child.distinctAttributes
+  
p.projectList.toSet.subsets(childDistinctAttributes.map(_.size).min).filter { s 
=>
+val exps = s.map {
+  case a: Alias => a.child
+  case o => o
+}
+childDistinctAttributes.exists(_.equals(ExpressionSet(exps)))
+  }.map(s => ExpressionSet(s.map(_.toAttribute))).toSet
+} else {
+  default(p)
+}
+  }
+
+  override def visitRepartition(p: Repartition): Set[ExpressionSet] = 
p.child.distinctAttributes
+
+  override def visitRepartitionByExpr(p: RepartitionByExpression): 
Set[ExpressionSet] =
+p.child.distinctAttributes
+
+  override def visitSample(p: Sample): Set[ExpressionSet] = default(p)

Review comment:
   Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-11-01 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r740284813



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, ExpressionSet}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends 
LogicalPlanVisitor[Set[ExpressionSet]] {

Review comment:
   Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-10-30 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r739663634



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, ExpressionSet}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends 
LogicalPlanVisitor[Set[ExpressionSet]] {
+  override def default(p: LogicalPlan): Set[ExpressionSet] = {
+Set.empty[ExpressionSet]
+  }
+
+  override def visitAggregate(p: Aggregate): Set[ExpressionSet] = {

Review comment:
   ```scala
   spark.sql("create table t1 (a int, b int, c int) using parquet")
   spark.sql("select a, b, a as e, b as f from t1 group by a, 
b").queryExecution.analyzed.distinctKeys
   ```
   For such a query, which distinct keys do you prefer?
   ```
   Set(ExpressionSet(e#0, f#1))
   ```
   or
   ```
   Set(ExpressionSet(a#2, b#3), ExpressionSet(a#2, f#1), ExpressionSet(b#3, 
e#0), ExpressionSet(e#0, f#1))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-10-30 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r739663634



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctAttributesVisitor.scala
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, ExpressionSet}
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+
+/**
+ * A visitor pattern for traversing a [[LogicalPlan]] tree and propagate the 
distinct attributes.
+ */
+object DistinctAttributesVisitor extends 
LogicalPlanVisitor[Set[ExpressionSet]] {
+  override def default(p: LogicalPlan): Set[ExpressionSet] = {
+Set.empty[ExpressionSet]
+  }
+
+  override def visitAggregate(p: Aggregate): Set[ExpressionSet] = {

Review comment:
   ```
   spark.sql("create table t1 (a int, b int, c int) using parquet")
   spark.sql("select a, b, a as e, b as f from t1 group by a, 
b").queryExecution.analyzed.distinctKeys
   ```
   For such a query, which distinct keys do you prefer?
   ```
   Set(ExpressionSet(e#0, f#1))
   ```
   or
   ```
   Set(ExpressionSet(a#2, b#3), ExpressionSet(a#2, f#1), ExpressionSet(b#3, 
e#0), ExpressionSet(e#0, f#1))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-20 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r672388764



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesInLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the redundant aggregation from left semi/anti join if the same 
aggregation has already
+ * been done on left side.
+ */
+object RemoveRedundantAggregatesInLeftSemiAntiJoin extends Rule[LogicalPlan] {

Review comment:
   How about just add this rule to `RemoveRedundantAggregates`?
   ```patch
   Index: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
   IDEA additional info:
   Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
   <+>UTF-8
   ===
   diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
   --- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
(revision 86a828aa4e27676ec74dab28b1fb3db996f9c637)
   +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
(date 1626706341253)
   @@ -20,7 +20,8 @@
import org.apache.spark.sql.catalyst.analysis.PullOutNondeterministic
import org.apache.spark.sql.catalyst.expressions.{AliasHelper, AttributeSet}
import 
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
   -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}
   +import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
   +import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreePattern.AGGREGATE

   @@ -29,7 +30,7 @@
 * only goal is to keep distinct values, while its parent aggregate would 
ignore duplicate values.
 */
object RemoveRedundantAggregates extends Rule[LogicalPlan] with AliasHelper 
{
   -  def apply(plan: LogicalPlan): LogicalPlan = plan.transformUpWithPruning(
   +  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
_.containsPattern(AGGREGATE), ruleId) {
case upper @ Aggregate(_, _, lower: Aggregate) if 
isLowerRedundant(upper, lower) =>
  val aliasMap = getAliasMap(lower)
   @@ -47,6 +48,13 @@
  } else {
newAggregate
  }
   +
   +case agg @ Aggregate(groupingExps, aggExps,
   +  j @ Join(left: Aggregate, _, LeftSemi | LeftAnti, _, _)) if 
agg.groupOnly && left.groupOnly &&
   +aggExps.forall(e => 
left.aggregateExpressions.exists(_.semanticEquals(e))) &&
   +groupingExps.length == left.groupingExpressions.length &&
   +groupingExps.zip(left.groupingExpressions).forall(e => 
e._1.semanticEquals(e._2)) =>
   +  j
  }

  private def isLowerRedundant(upper: Aggregate, lower: Aggregate): Boolean 
= {
   
   ```

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesInLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www

[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-20 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r672388764



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesInLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the redundant aggregation from left semi/anti join if the same 
aggregation has already
+ * been done on left side.
+ */
+object RemoveRedundantAggregatesInLeftSemiAntiJoin extends Rule[LogicalPlan] {

Review comment:
   How about just add this rule to `RemoveRedundantAggregates`?
   ```patch
   Index: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
   IDEA additional info:
   Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
   <+>UTF-8
   ===
   diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
   --- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
(revision 86a828aa4e27676ec74dab28b1fb3db996f9c637)
   +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
(date 1626706341253)
   @@ -20,7 +20,8 @@
import org.apache.spark.sql.catalyst.analysis.PullOutNondeterministic
import org.apache.spark.sql.catalyst.expressions.{AliasHelper, AttributeSet}
import 
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
   -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}
   +import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
   +import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreePattern.AGGREGATE

   @@ -29,7 +30,7 @@
 * only goal is to keep distinct values, while its parent aggregate would 
ignore duplicate values.
 */
object RemoveRedundantAggregates extends Rule[LogicalPlan] with AliasHelper 
{
   -  def apply(plan: LogicalPlan): LogicalPlan = plan.transformUpWithPruning(
   +  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
_.containsPattern(AGGREGATE), ruleId) {
case upper @ Aggregate(_, _, lower: Aggregate) if 
isLowerRedundant(upper, lower) =>
  val aliasMap = getAliasMap(lower)
   @@ -47,6 +48,13 @@
  } else {
newAggregate
  }
   +
   +case agg @ Aggregate(groupingExps, aggExps,
   +  j @ Join(left: Aggregate, _, LeftSemi | LeftAnti, _, _)) if 
agg.groupOnly && left.groupOnly &&
   +aggExps.forall(e => 
left.aggregateExpressions.exists(_.semanticEquals(e))) &&
   +groupingExps.length == left.groupingExpressions.length &&
   +groupingExps.zip(left.groupingExpressions).forall(e => 
e._1.semanticEquals(e._2)) =>
   +  j
  }

  private def isLowerRedundant(upper: Aggregate, lower: Aggregate): Boolean 
= {
   
   ```

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesInLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www

[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-19 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r672388764



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesInLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the redundant aggregation from left semi/anti join if the same 
aggregation has already
+ * been done on left side.
+ */
+object RemoveRedundantAggregatesInLeftSemiAntiJoin extends Rule[LogicalPlan] {

Review comment:
   How about just add this rule to `RemoveRedundantAggregates`? This is 
because `Distinct(Join(Distinct(left), right, LeftSemi/ LeftAnti, condition, 
None))` is a very common case.
   
   ```patch
   Index: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
   IDEA additional info:
   Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
   <+>UTF-8
   ===
   diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
   --- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
(revision 86a828aa4e27676ec74dab28b1fb3db996f9c637)
   +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
(date 1626706341253)
   @@ -20,7 +20,8 @@
import org.apache.spark.sql.catalyst.analysis.PullOutNondeterministic
import org.apache.spark.sql.catalyst.expressions.{AliasHelper, AttributeSet}
import 
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
   -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}
   +import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
   +import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreePattern.AGGREGATE

   @@ -29,7 +30,7 @@
 * only goal is to keep distinct values, while its parent aggregate would 
ignore duplicate values.
 */
object RemoveRedundantAggregates extends Rule[LogicalPlan] with AliasHelper 
{
   -  def apply(plan: LogicalPlan): LogicalPlan = plan.transformUpWithPruning(
   +  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
_.containsPattern(AGGREGATE), ruleId) {
case upper @ Aggregate(_, _, lower: Aggregate) if 
isLowerRedundant(upper, lower) =>
  val aliasMap = getAliasMap(lower)
   @@ -47,6 +48,13 @@
  } else {
newAggregate
  }
   +
   +case agg @ Aggregate(groupingExps, aggExps,
   +  j @ Join(left: Aggregate, _, LeftSemi | LeftAnti, _, _)) if 
agg.groupOnly && left.groupOnly &&
   +aggExps.forall(e => 
left.aggregateExpressions.exists(_.semanticEquals(e))) &&
   +groupingExps.length == left.groupingExpressions.length &&
   +groupingExps.zip(left.groupingExpressions).forall(e => 
e._1.semanticEquals(e._2)) =>
   +  j
  }

  private def isLowerRedundant(upper: Aggregate, lower: Aggregate): Boolean 
= {
   
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-

[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-19 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r672388764



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesInLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the redundant aggregation from left semi/anti join if the same 
aggregation has already
+ * been done on left side.
+ */
+object RemoveRedundantAggregatesInLeftSemiAntiJoin extends Rule[LogicalPlan] {

Review comment:
   How about just add this rule to `RemoveRedundantAggregates`?
   ```patch
   Index: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
   IDEA additional info:
   Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
   <+>UTF-8
   ===
   diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
   --- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
(revision 86a828aa4e27676ec74dab28b1fb3db996f9c637)
   +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
(date 1626706341253)
   @@ -20,7 +20,8 @@
import org.apache.spark.sql.catalyst.analysis.PullOutNondeterministic
import org.apache.spark.sql.catalyst.expressions.{AliasHelper, AttributeSet}
import 
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
   -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}
   +import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
   +import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreePattern.AGGREGATE

   @@ -29,7 +30,7 @@
 * only goal is to keep distinct values, while its parent aggregate would 
ignore duplicate values.
 */
object RemoveRedundantAggregates extends Rule[LogicalPlan] with AliasHelper 
{
   -  def apply(plan: LogicalPlan): LogicalPlan = plan.transformUpWithPruning(
   +  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
_.containsPattern(AGGREGATE), ruleId) {
case upper @ Aggregate(_, _, lower: Aggregate) if 
isLowerRedundant(upper, lower) =>
  val aliasMap = getAliasMap(lower)
   @@ -47,6 +48,13 @@
  } else {
newAggregate
  }
   +
   +case agg @ Aggregate(groupingExps, aggExps,
   +  j @ Join(left: Aggregate, _, LeftSemi | LeftAnti, _, _)) if 
agg.groupOnly && left.groupOnly &&
   +aggExps.forall(e => 
left.aggregateExpressions.exists(_.semanticEquals(e))) &&
   +groupingExps.length == left.groupingExpressions.length &&
   +groupingExps.zip(left.groupingExpressions).forall(e => 
e._1.semanticEquals(e._2)) =>
   +  j
  }

  private def isLowerRedundant(upper: Aggregate, lower: Aggregate): Boolean 
= {
   
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-18 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r672016978



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveAggsThroughLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the aggregation from left semi/anti join if the same aggregation has 
already been done
+ * on left side.
+ */
+object RemoveAggsThroughLeftSemiAntiJoin extends Rule[LogicalPlan] {
+  // Transform down to remove more Aggregates.
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
+_.containsAllPatterns(AGGREGATE, LEFT_SEMI_OR_ANTI_JOIN), ruleId) {
+case agg @ Aggregate(grouping, aggExps, j @ Join(left: Aggregate, _, 
LeftSemi | LeftAnti, _, _))

Review comment:
   Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-18 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r672012328



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveAggsThroughLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the aggregation from left semi/anti join if the same aggregation has 
already been done
+ * on left side.
+ */
+object RemoveAggsThroughLeftSemiAntiJoin extends Rule[LogicalPlan] {
+  // Transform down to remove more Aggregates.

Review comment:
   Added it.
   
https://github.com/apache/spark/blob/d89da31d57c6c52d28c35d175cd219cf6f1f13cd/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesInLeftSemiAntiJoinSuite.scala#L64-L78




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-18 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r672012114



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveAggsThroughLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the aggregation from left semi/anti join if the same aggregation has 
already been done
+ * on left side.
+ */
+object RemoveAggsThroughLeftSemiAntiJoin extends Rule[LogicalPlan] {
+  // Transform down to remove more Aggregates.
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
+_.containsAllPatterns(AGGREGATE, LEFT_SEMI_OR_ANTI_JOIN), ruleId) {
+case agg @ Aggregate(grouping, aggExps, j @ Join(left: Aggregate, _, 
LeftSemi | LeftAnti, _, _))
+  if agg.groupOnly && left.groupOnly &&
+aggExps.forall(e => 
left.aggregateExpressions.exists(_.semanticEquals(e))) &&

Review comment:
   Added it.
   
https://github.com/apache/spark/blob/d89da31d57c6c52d28c35d175cd219cf6f1f13cd/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesInLeftSemiAntiJoinSuite.scala#L91-L100




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-18 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r672011852



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveAggsThroughLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the aggregation from left semi/anti join if the same aggregation has 
already been done
+ * on left side.
+ */
+object RemoveAggsThroughLeftSemiAntiJoin extends Rule[LogicalPlan] {

Review comment:
   Renamed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-17 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r671763303



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveAggsThroughLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the aggregation from left semi/anti join if the same aggregation has 
already been done
+ * on left side.
+ */
+object RemoveAggsThroughLeftSemiAntiJoin extends Rule[LogicalPlan] {
+  // Transform down to remove more Aggregates.
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
+_.containsAllPatterns(AGGREGATE, LEFT_SEMI_OR_ANTI_JOIN), ruleId) {
+case agg @ Aggregate(grouping, aggExps, j @ Join(left: Aggregate, _, 
LeftSemi | LeftAnti, _, _))
+  if agg.groupOnly && left.groupOnly &&
+aggExps.forall(e => 
left.aggregateExpressions.exists(_.semanticEquals(e))) &&

Review comment:
   Do you mean this condition?
   ```scala
   aggExps.forall(e => left.aggregateExpressions.exists(_.semanticEquals(e)))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-17 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r671763262



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveAggsThroughLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the aggregation from left semi/anti join if the same aggregation has 
already been done
+ * on left side.
+ */
+object RemoveAggsThroughLeftSemiAntiJoin extends Rule[LogicalPlan] {

Review comment:
   Do you have a recommended name?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-17 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r671762429



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveAggsThroughLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the aggregation from left semi/anti join if the same aggregation has 
already been done
+ * on left side.
+ */
+object RemoveAggsThroughLeftSemiAntiJoin extends Rule[LogicalPlan] {
+  // Transform down to remove more Aggregates.
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
+_.containsAllPatterns(AGGREGATE, LEFT_SEMI_OR_ANTI_JOIN), ruleId) {
+case agg @ Aggregate(grouping, aggExps, j @ Join(left: Aggregate, _, 
LeftSemi | LeftAnti, _, _))

Review comment:
   I also like `groupingExprs`, but it will exceeds 100 characters. We need 
a new line.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-17 Thread GitBox


wangyum commented on a change in pull request #33404:
URL: https://github.com/apache/spark/pull/33404#discussion_r671761972



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveAggsThroughLeftSemiAntiJoin.scala
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.plans.{LeftAnti, LeftSemi}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AGGREGATE, 
LEFT_SEMI_OR_ANTI_JOIN}
+
+/**
+ * Remove the aggregation from left semi/anti join if the same aggregation has 
already been done
+ * on left side.
+ */
+object RemoveAggsThroughLeftSemiAntiJoin extends Rule[LogicalPlan] {
+  // Transform down to remove more Aggregates.

Review comment:
   TPC-DS q38 is that case:
   
![image](https://user-images.githubusercontent.com/5399861/126052401-1d072f77-e584-45c2-939d-f87deda989e1.png)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org