Asif created SPARK-53264:
----------------------------
Summary: Conversion of correlated scalar subquery to Left Outer
Join results in nullability as false for the right side attribute
Key: SPARK-53264
URL: https://issues.apache.org/jira/browse/SPARK-53264
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0, 4.1.0
Reporter: Asif
In RewriteCorrelatedScalarSubquery, when a correlated scalar subquery gets
converted into a Left Outer Join, the Project node just above the Left Outer
Join node has nullability set to false for the attribute coming out of the
right side of the join.
Any attribute coming out of the right side of a Left Outer Join should have
nullability as true, because left rows without a match are padded with nulls
on the right side.
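The nullability rule can be sketched with a small model. This is a simplified illustration in plain Python, not Spark's code: the `Attr` class and the `join_output` function are hypothetical names introduced here, and the join types are represented as plain strings.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for an output attribute of a plan node."""
    name: str
    nullable: bool


def join_output(left: List[Attr], right: List[Attr], join_type: str) -> List[Attr]:
    """Output attributes of a join; any side that can be null-padded becomes nullable."""
    if join_type == "inner":
        return left + right
    if join_type == "left_outer":
        # Unmatched left rows get NULLs for every right-side column.
        return left + [replace(a, nullable=True) for a in right]
    if join_type == "right_outer":
        return [replace(a, nullable=True) for a in left] + right
    if join_type == "full_outer":
        return [replace(a, nullable=True) for a in left + right]
    raise ValueError(f"unsupported join type: {join_type}")


# Both inputs are non-nullable, as with the id column produced by range().
out = join_output([Attr("t1.id", False)], [Attr("c", False)], "left_outer")
```

In this model the right-side attribute `c` comes out of the Left Outer Join as nullable even though it was non-nullable in its own relation, so a Project above the join that reports it as non-nullable is inconsistent with the join's output.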
This results in the query (from SQLQueryTestSuite):
{quote}select *
from range(1, 3) t1
where (select t2.id c
from range (1, 2) t2 where t1.id = t2.id
) is not null
{quote}
eventually wrongly getting optimized into an Inner Join.
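To see why the right-side attribute must be nullable for this query, the data can be evaluated by hand. The following is a minimal sketch in plain Python, not Spark code; `left_outer_join` is a hypothetical helper that joins two lists of ids on equality, and Python's `range` is used because it is end-exclusive like the `range` table function in the query.

```python
t1 = list(range(1, 3))  # ids 1 and 2
t2 = list(range(1, 2))  # id 1 only


def left_outer_join(left, right):
    """Join on equality of ids, padding unmatched left rows with None."""
    rows = []
    for l in left:
        matches = [r for r in right if r == l]
        if matches:
            rows.extend((l, r) for r in matches)
        else:
            rows.append((l, None))  # no match: right side is NULL
    return rows


joined = left_outer_join(t1, t2)

# The "is not null" predicate from the query, applied to the right-side column c.
result = [t1_id for t1_id, c in joined if c is not None]
```

The row for `t1.id = 2` has no match in `t2`, so column `c` is null for that row. An attribute that can hold a null at runtime cannot be reported as non-nullable without misleading later optimizer rules that rely on that flag.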
But the bug remains hidden in the current code base because of an inefficiency
in the PushDownPredicates rule, which indirectly masks the issue.
If PushDownPredicates worked efficiently (i.e. combined and pushed
predicates in a single pass), the bug would be exposed.
The inefficiency in the PushDownPredicates rule is itself described in bug
[SPARK-53263|https://issues.apache.org/jira/projects/SPARK/issues/SPARK-53263].
I will be submitting a PR with a test for the bug in some time.