Hi all:
i used in subquery like following in spark 2.3.1
{code}
set spark.sql.autoBroadcastJoinThreshold=-1;
explain select * from testdata where key1 not in (select key1 from testdata as
b);
= Physical Plan ==
BroadcastNestedLoopJoin BuildRight, LeftAnti, ((key1#60 = key1#62) ||
isnull((key1#60 = key1#62)))
:- HiveTableScan [key1#60, value1#61], HiveTableRelation `default`.`testdata`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key1#60, value1#61]
+- BroadcastExchange IdentityBroadcastMode
+- HiveTableScan [key1#62], HiveTableRelation `default`.`testdata`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key1#62, value1#63]
{code}
It seems that even i disable the broadcast join ( set
spark.sql.autoBroadcastJoinThreshold=-1). it still a broadcast join not sort
merged join. Is there any way to let spark sql to use sort merge join by
setting parameters besides directly modifying the sql like : explain select
testdata.* from testdata left join testdata2 where testdata.key1=testdata2.key1
and testdata2.key1 is null;
Appreciate to have some suggestions
Best Regards
Kelly Zhang