Zhen Chen created CALCITE-6927:
----------------------------------
Summary: Join condition remove IS NOT DISTINCT FROM
Key: CALCITE-6927
URL: https://issues.apache.org/jira/browse/CALCITE-6927
Project: Calcite
Issue Type: Improvement
Reporter: Zhen Chen
Assignee: Zhen Chen
Following Spark's conversion, `x IS NOT DISTINCT FROM y` can be rewritten as
`(coalesce(x, '') = coalesce(y, '')) and (isnull(x) = isnull(y))`.
Because the rewritten condition is a conjunction of plain equalities, a join
whose condition is IS NOT DISTINCT FROM can be implemented as a HashJoin
instead of a NestedLoopJoin when the logical plan is converted to the
physical plan.
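The equivalence behind this rewrite can be checked with a small sketch. This is plain Java, not Calcite or Spark code; the class and method names are hypothetical, and the `''` fallback assumes a string column, as in the example in this issue.

```java
import java.util.Objects;

// Hypothetical illustration class; not part of Calcite or Spark.
final class NullSafeEquality {
    /** Reference semantics of `x IS NOT DISTINCT FROM y`: NULL matches only NULL. */
    static boolean isNotDistinctFrom(String x, String y) {
        return Objects.equals(x, y);
    }

    /** SQL-style coalesce with a single fallback value. */
    static String coalesce(String value, String fallback) {
        return value != null ? value : fallback;
    }

    /** Rewritten form: (coalesce(x, '') = coalesce(y, '')) and (isnull(x) = isnull(y)). */
    static boolean rewritten(String x, String y) {
        boolean sameValue = coalesce(x, "").equals(coalesce(y, ""));
        boolean sameNullness = (x == null) == (y == null);
        return sameValue && sameNullness;
    }

    /** Returns true if both forms agree on every ordered pair of samples. */
    static boolean agreeOnAllPairs(String[] samples) {
        for (String x : samples) {
            for (String y : samples) {
                if (isNotDistinctFrom(x, y) != rewritten(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // NULL and the empty string are included on purpose: the isnull
        // comparison is what keeps them from matching each other.
        String[] samples = {null, "", "a", "b"};
        System.out.println(agreeOnAllPairs(samples));
    }
}
```

The `isnull(x) = isnull(y)` term matters because `coalesce` alone would treat NULL and `''` as equal.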
The SQL is as follows (`<=>` is Spark's null-safe equality operator,
equivalent to IS NOT DISTINCT FROM):
{code:sql}
explain
select t1.age from user_profiles as t1
join user_profiles t2
on t1.user_id <=> t2.user_id; {code}
The Spark plan is as follows:
{code:java}
AdaptiveSparkPlan isFinalPlan=false
+- Project [age#6]
   +- BroadcastHashJoin [coalesce(user_id#5, ), isnull(user_id#5)], [coalesce(user_id#29, ), isnull(user_id#29)], Inner, BuildRight, false
      :- FileScan orc default.user_profiles[user_id#5,age#6] Batched: true, Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<user_id:string,age:int>
      +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, string, true], ), isnull(input[0, string, true])),false), [plan_id=72]
         +- FileScan orc default.user_profiles[user_id#29] Batched: true, Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<user_id:string>{code}
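In the plan above, the hash join is keyed on the pair `[coalesce(user_id, ), isnull(user_id)]`. A minimal sketch of why such a pair works as a hash key is given below. It is plain Java with hypothetical names, and it does not reflect how Calcite or Spark implement their join operators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Calcite or Spark.
final class NullSafeHashJoin {
    /** Composite key mirroring [coalesce(user_id, ''), isnull(user_id)]. */
    static List<Object> key(String userId) {
        return Arrays.<Object>asList(userId != null ? userId : "", userId == null);
    }

    /** Returns {leftIndex, rightIndex} pairs for rows whose composite keys are equal. */
    static List<int[]> join(List<String> left, List<String> right) {
        // Build phase: hash the right side on the composite key.
        Map<List<Object>, List<Integer>> built = new HashMap<>();
        for (int r = 0; r < right.size(); r++) {
            built.computeIfAbsent(key(right.get(r)), k -> new ArrayList<>()).add(r);
        }
        // Probe phase: look up each left row by the same composite key.
        List<int[]> matches = new ArrayList<>();
        for (int l = 0; l < left.size(); l++) {
            List<Integer> hits =
                built.getOrDefault(key(left.get(l)), Collections.<Integer>emptyList());
            for (int r : hits) {
                matches.add(new int[] {l, r});
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Arrays.asList is used because it permits null elements.
        List<String> ids = Arrays.asList("u1", null, "");
        for (int[] m : join(ids, ids)) {
            System.out.println(m[0] + " <=> " + m[1]);
        }
    }
}
```

Because the key never contains NULL itself, ordinary hash-based equality is enough: a NULL row hashes to `("", true)` and an empty-string row to `("", false)`, so the two never match each other.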