[
https://issues.apache.org/jira/browse/SPARK-40999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fredrik Klauß updated SPARK-40999:
--
Description:
Currently, if a query specifies hints inside a subquery, as in the following
example, those hints are lost.
{code:sql}
SELECT * FROM target t WHERE EXISTS
  (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key)
{code}
This happens because hints are removed from the plan and pulled into joins at
the beginning of the optimization stage, whereas subqueries are only rewritten
into joins later, during optimization. Because any hint that is not below a
join is discarded, hints inside a subquery are removed before the subquery
becomes a join.
This worked before the refactoring that added hints as a field on joins
(SPARK-26065), so it is a regression for anyone who previously relied on hints
in subqueries.
To resolve this, we add a hint field to SubqueryExpression. During
EliminateResolvedHint, any hints inside a subquery's plan are pulled into this
field, and the hint is then passed on to the join when the subquery is
rewritten into one.
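
To illustrate the ordering problem and the proposed fix, here is a minimal toy
model in Python. It is only a sketch and not Spark's actual implementation:
the classes Relation, Hint, Exists, Filter and Join, the functions
strip_hints, eliminate_hints and rewrite_exists_as_join, and the flag
keep_on_subquery are all hypothetical stand-ins for the corresponding Catalyst
plan nodes and optimizer rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Relation:
    name: str


@dataclass
class Hint:
    """Hypothetical stand-in for a resolved hint node wrapping a child plan."""
    name: str
    child: object


@dataclass
class Exists:
    """Hypothetical stand-in for a SubqueryExpression with a hint field."""
    plan: object
    hints: List[str] = field(default_factory=list)


@dataclass
class Filter:
    condition: Exists
    child: object


@dataclass
class Join:
    left: object
    right: object
    hints: List[str] = field(default_factory=list)


def strip_hints(plan):
    """Remove leading Hint nodes; return (bare plan, names of removed hints)."""
    names = []
    while isinstance(plan, Hint):
        names.append(plan.name)
        plan = plan.child
    return plan, names


def eliminate_hints(flt, keep_on_subquery):
    """Toy analogue of hint elimination for a Filter with an EXISTS subquery."""
    bare, names = strip_hints(flt.condition.plan)
    # Without the hint field, hints that are not below a join are discarded.
    kept = names if keep_on_subquery else []
    return Filter(Exists(bare, kept), flt.child)


def rewrite_exists_as_join(flt):
    """Toy analogue of turning EXISTS into a join, passing the hints along."""
    return Join(flt.child, flt.condition.plan, flt.condition.hints)


query = Filter(Exists(Hint("BROADCAST", Relation("source"))), Relation("target"))
lost = rewrite_exists_as_join(eliminate_hints(query, keep_on_subquery=False))
kept = rewrite_exists_as_join(eliminate_hints(query, keep_on_subquery=True))
print(lost.hints)  # []
print(kept.hints)  # ['BROADCAST']
```

In both runs the hint node is gone from the subquery plan by the time the join
is created. Only the variant that stores the hint on the subquery expression
can still hand it to the join.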
was:
Currently, if a user tries to specify a query like the following, the hints on
the subquery will be lost.
{code:java}
SELECT * FROM target t WHERE EXISTS
(SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key){code}
This happens as hints are removed from the plan and pulled into joins in the
beginning of the optimization stage, but subqueries are only turned into joins
during optimization. As we remove any hints that are not below a join, we end
up removing hints that are below a subquery.
To resolve this, we add a hint field to SubqueryExpression that any hints
inside a subquery's plan can be pulled into during EliminateResolvedHint, and
then pass this hint on when the subquery is turned into a join.
> Hints on subqueries are not properly propagated
> ---
>
> Key: SPARK-40999
> URL: https://issues.apache.org/jira/browse/SPARK-40999
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, Spark Core
> Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.1.3,
>                   3.2.0, 3.2.1, 3.2.2, 3.3.0, 3.3.1, 3.4.0
> Reporter: Fredrik Klauß
> Priority: Major