Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-15 Thread via GitHub


cloud-fan closed pull request #42725: [SPARK-45009][SQL] Decorrelate predicate 
subqueries in join condition
URL: https://github.com/apache/spark/pull/42725


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-15 Thread via GitHub


cloud-fan commented on PR #42725:
URL: https://github.com/apache/spark/pull/42725#issuecomment-1763675871

   yea they are unrelated, thanks, merging to master!





Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-13 Thread via GitHub


andylam-db commented on PR #42725:
URL: https://github.com/apache/spark/pull/42725#issuecomment-1761961814

   @cloud-fan I think the build is failing because of unrelated test failures 
(timeouts and the like). Could you take a look and see if we can merge this?





Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-12 Thread via GitHub


jchen5 commented on code in PR #42725:
URL: https://github.com/apache/spark/pull/42725#discussion_r1355693885


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -4378,6 +4378,25 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_PREDICATE_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.decorrelatePredicateSubqueriesInJoinPredicate.enabled")
+      .internal()
+      .doc("Decorrelate predicate (in and exists) subqueries with correlated references in join " +
+        "predicates.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)
+
+  val OPTIMIZE_UNCORRELATED_IN_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.optimizeUncorrelatedInSubqueriesInJoinCondition.enabled")
+      .internal()
+      .doc("When true, optimize uncorrelated IN subqueries in join predicates by rewriting them " +
+        s"to joins. This interacts with ${LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR.key} because it " +
+        "can rewrite IN predicates.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(false)

Review Comment:
   We can have this true by default now - LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR 
was flipped to the new behavior by default recently.






Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-11 Thread via GitHub


andylam-db commented on code in PR #42725:
URL: https://github.com/apache/spark/pull/42725#discussion_r1355858190


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -4378,6 +4378,25 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_PREDICATE_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.decorrelatePredicateSubqueriesInJoinPredicate.enabled")
+      .internal()
+      .doc("Decorrelate predicate (in and exists) subqueries with correlated references in join " +
+        "predicates.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)
+
+  val OPTIMIZE_UNCORRELATED_IN_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.optimizeUncorrelatedInSubqueriesInJoinCondition.enabled")
+      .internal()
+      .doc("When true, optimize uncorrelated IN subqueries in join predicates by rewriting them " +
+        s"to joins. This interacts with ${LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR.key} because it " +
+        "can rewrite IN predicates.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(false)

Review Comment:
   Is it easier to just rely on LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR instead of 
creating this new flag?






Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-11 Thread via GitHub


jchen5 commented on code in PR #42725:
URL: https://github.com/apache/spark/pull/42725#discussion_r1355693885


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -4378,6 +4378,25 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_PREDICATE_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.decorrelatePredicateSubqueriesInJoinPredicate.enabled")
+      .internal()
+      .doc("Decorrelate predicate (in and exists) subqueries with correlated references in join " +
+        "predicates.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)
+
+  val OPTIMIZE_UNCORRELATED_IN_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.optimizeUncorrelatedInSubqueriesInJoinCondition.enabled")
+      .internal()
+      .doc("When true, optimize uncorrelated IN subqueries in join predicates by rewriting them " +
+        s"to joins. This interacts with ${LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR.key} because it " +
+        "can rewrite IN predicates.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(false)

Review Comment:
   We can have this true by default now - LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR 
was enabled by default recently.






Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-11 Thread via GitHub


andylam-db commented on code in PR #42725:
URL: https://github.com/apache/spark/pull/42725#discussion_r1355441657


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala:
##
@@ -159,6 +160,66 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
   Project(p.output, Filter(newCond.get, inputPlan))
   }
 
+// This case takes care of predicate subqueries in join conditions that are not pushed down
+// to the children nodes by [[PushDownPredicates]].

Review Comment:
   Yes. The pushdown rules are in `operatorOptimizationBatch`, and the 
`RewriteSubquery` batch runs after it (it is almost the last batch in the 
Optimizer).
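
A toy sketch of why batch order gives this guarantee, assuming batches simply run one after another. Every name here (`Pred`, `ToyJoin`, `pushDown`, `rewriteSubquery`, `execute`) is hypothetical and only stands in for the real Catalyst classes and rules; it is not Spark's implementation.

```scala
object BatchOrderSketch {
  // Hypothetical predicate: `singleSided` means it references only one join child.
  final case class Pred(text: String, singleSided: Boolean, hasSubquery: Boolean)

  // Hypothetical join node: predicates still in the condition, predicates pushed
  // to the children, and predicates already rewritten by the subquery rule.
  final case class ToyJoin(condition: List[Pred], pushed: List[Pred], rewritten: List[Pred])

  type Rule = ToyJoin => ToyJoin
  final case class Batch(name: String, rules: List[Rule])

  // Stand-in for predicate pushdown: single-sided predicates leave the join condition.
  val pushDown: Rule = j => {
    val (down, stay) = j.condition.partition(_.singleSided)
    j.copy(condition = stay, pushed = j.pushed ++ down)
  }

  // Stand-in for the subquery rewrite: handles only what is still in the condition.
  val rewriteSubquery: Rule = j => {
    val (sub, rest) = j.condition.partition(_.hasSubquery)
    j.copy(condition = rest, rewritten = j.rewritten ++ sub)
  }

  // Batches run strictly in order, so later batches see the output of earlier ones.
  def execute(batches: List[Batch], plan: ToyJoin): ToyJoin =
    batches.foldLeft(plan)((p, b) => b.rules.foldLeft(p)((q, r) => r(q)))

  val batches: List[Batch] = List(
    Batch("operator optimization (toy)", List(pushDown)),
    Batch("rewrite subquery (toy)", List(rewriteSubquery)))

  val input: ToyJoin = ToyJoin(
    condition = List(
      Pred("l.a IN (SELECT ...)", singleSided = true, hasSubquery = true),
      Pred("l.a + r.b IN (SELECT ...)", singleSided = false, hasSubquery = true)),
    pushed = Nil,
    rewritten = Nil)

  val result: ToyJoin = execute(batches, input)
}
```

In this model the rewrite rule only ever sees the two-sided predicate, because the single-sided one has already left the join condition by the time the later batch runs.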



##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -4378,6 +4378,25 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_PREDICATE_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.decorrelatePredicateSubqueriesInJoinPredicate.enabled")
+      .internal()
+      .doc("Decorrelate predicate (in and exists) subqueries with correlated references in join " +
+        "predicates.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)
+
+  val OPTIMIZE_UNCORRELATED_IN_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.optimizeUncorrelatedInSubqueriesInJoinCondition.enabled")
+      .internal()
+      .doc("When true, optimize uncorrelated IN subqueries in join predicates by rewriting them " +
+        s"to joins. This interacts with ${LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR.key} because it " +

Review Comment:
   This happens because with this change, IN subqueries are rewritten to joins 
and therefore force the "correct" NULL-in-empty-list behavior. This config is 
added in case users want to use the legacy behavior. This has nothing to do 
with EXISTS.
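
To make the NULL-in-empty-list difference concrete, here is a minimal sketch that models SQL three-valued logic with `Option[Boolean]`. It does not use Spark; `inList`, `legacyInList`, `not` and `keepsRow` are hypothetical helpers, and `legacyInList` is a deliberately simplified model of the legacy behavior (assumed here to return NULL whenever the probed value is NULL), which may differ in detail from what Spark actually did.

```scala
object NullInEmptyListSketch {
  // SQL three-valued boolean: Some(true), Some(false), or None for NULL.
  type SqlBool = Option[Boolean]

  // Standard semantics: IN over an empty list is false, even for a NULL value.
  def inList(value: Option[Int], list: Seq[Option[Int]]): SqlBool =
    if (list.isEmpty) Some(false)
    else value match {
      case None => None
      case Some(_) if list.contains(value) => Some(true)
      case Some(_) if list.contains(None) => None
      case Some(_) => Some(false)
    }

  // Simplified model of the legacy behavior (an assumption): a NULL value yields NULL.
  def legacyInList(value: Option[Int], list: Seq[Option[Int]]): SqlBool =
    if (value.isEmpty) None else inList(value, list)

  // SQL NOT: NULL stays NULL.
  def not(b: SqlBool): SqlBool = b.map(!_)

  // A filter or join condition keeps a row only when the predicate is true.
  def keepsRow(b: SqlBool): Boolean = b.contains(true)
}
```

The two models agree for a plain `IN` used as a condition (neither keeps the row), but they diverge under `NOT IN`: with the standard semantics `NOT (NULL IN ())` is true and the row is kept, while in the legacy model it stays NULL and the row is dropped.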






Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-10 Thread via GitHub


cloud-fan commented on code in PR #42725:
URL: https://github.com/apache/spark/pull/42725#discussion_r1353833831


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -4378,6 +4378,25 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_PREDICATE_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.decorrelatePredicateSubqueriesInJoinPredicate.enabled")
+      .internal()
+      .doc("Decorrelate predicate (in and exists) subqueries with correlated references in join " +
+        "predicates.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(true)
+
+  val OPTIMIZE_UNCORRELATED_IN_SUBQUERIES_IN_JOIN_CONDITION =
+    buildConf("spark.sql.optimizer.optimizeUncorrelatedInSubqueriesInJoinCondition.enabled")
+      .internal()
+      .doc("When true, optimize uncorrelated IN subqueries in join predicates by rewriting them " +
+        s"to joins. This interacts with ${LEGACY_NULL_IN_EMPTY_LIST_BEHAVIOR.key} because it " +

Review Comment:
   we don't do this for EXISTS because it's already fast to evaluate 
non-correlated EXISTS?






Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-10 Thread via GitHub


cloud-fan commented on code in PR #42725:
URL: https://github.com/apache/spark/pull/42725#discussion_r1353829912


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala:
##
@@ -159,6 +160,66 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
   Project(p.output, Filter(newCond.get, inputPlan))
   }
 
+// This case takes care of predicate subqueries in join conditions that are not pushed down
+// to the children nodes by [[PushDownPredicates]].

Review Comment:
   how is this guaranteed? we run this rule in a later batch after the batch 
containing pushdown rules?






Re: [PR] [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition [spark]

2023-10-04 Thread via GitHub


andylam-db commented on PR #42725:
URL: https://github.com/apache/spark/pull/42725#issuecomment-1747794562

   Pinging for reviews! @allisonwang-db @cloud-fan 

