Khurram Faraaz created DRILL-5010: ------------------------------------- Summary: Equality join condition is treated as a MergeJoin and not as a HashJoin. Key: DRILL-5010 URL: https://issues.apache.org/jira/browse/DRILL-5010 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.9.0 Reporter: Khurram Faraaz
Equality join condition is treated as a MergeJoin and not as a HashJoin. Drill 1.9.0 git commit ID: 83513daf Projecting the join columns results in merge join, whereas it should be doing a HashJoin. {noformat} 0: jdbc:drill:schema=dfs.tmp> explain plan for . . . . . . . . . . . . . . > select t1.intKey, t2.intKey from `left.json` t1, `right_all_nulls.json` t2 WHERE t1.intKey = t2.intKey OR ( t1.intKey IS NULL AND t2.intKey IS NULL); +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(intKey=[$0], intKey0=[$1]) 00-02 Project(intKey=[$0], intKey0=[$1]) 00-03 MergeJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[inner]) 00-05 SelectionVectorRemover 00-07 Sort(sort0=[$0], dir0=[ASC]) 00-09 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/left.json, numFiles=1, columns=[`intKey`], files=[maprfs:///tmp/left.json]]]) 00-04 Project(intKey0=[$0]) 00-06 SelectionVectorRemover 00-08 Sort(sort0=[$0], dir0=[ASC]) 00-10 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/right_all_nulls.json, numFiles=1, columns=[`intKey`], files=[maprfs:///tmp/right_all_nulls.json]]]) {noformat} Note that HashAgg and HashJoin were enabled. {noformat} | planner.enable_hashagg | BOOLEAN | SYSTEM | DEFAULT | null | null | true | null | | planner.enable_hashjoin | BOOLEAN | SYSTEM | DEFAULT | null | null | true | null | {noformat} Doing a SELECT <star> results in a HashJoin in the query plan {noformat} 0: jdbc:drill:schema=dfs.tmp> explain plan for . . . . . . . . . . . . . . > select * from `left.json` t1, `right_all_nulls.json` t2 WHERE t1.intKey = t2.intKey OR ( t1.intKey IS NULL AND t2.intKey IS NULL); +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 ProjectAllowDup(*=[$0], *0=[$1]) 00-02 Project(T46¦¦*=[$0], T47¦¦*=[$2]) 00-03 HashJoin(condition=[IS NOT DISTINCT FROM($1, $3)], joinType=[inner]) 00-04 Project(T47¦¦*=[$0], intKey0=[$1]) 00-06 Project(T47¦¦*=[$0], intKey=[$1]) 00-08 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/right_all_nulls.json, numFiles=1, columns=[`*`], files=[maprfs:///tmp/right_all_nulls.json]]]) 00-05 Project(T46¦¦*=[$0], intKey=[$1]) 00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/left.json, numFiles=1, columns=[`*`], files=[maprfs:///tmp/left.json]]]) {noformat} Data used in above queries {noformat} [root@centos-01 null_eq_joins]# cat left.json { "intKey" : 123, "bgintKey": 1234567, "strKey": "this is a test string", "boolKey": true, "fltKey": 123.786, "dblKey": 457.984, "timKey": "18:30:45", "dtKey": "1997-10-21", "tmstmpKey": "2007-04-30 13:10:02.047", "intrvldyKey": "P9DT38833S", "intrvlyrKey": "P255M" } [root@centos-01 null_equality_joins]# [root@centos-01 null_eq_joins]# cat right_all_nulls.json { "intKey" : null, "bgintKey": null, "strKey": null, "boolKey": null, "fltKey": null, "dblKey": null, "timKey": null, "dtKey": null, "tmstmpKey": null, "intrvldyKey": null, "intrvlyrKey": null } [root@centos-01 null_eq_joins]# {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)