[jira] [Created] (DRILL-1788) Conflicting column names in join

Steven Phillips (JIRA) Thu, 27 Nov 2014 13:55:34 -0800

Steven Phillips created DRILL-1788:
--------------------------------------

             Summary: Conflicting column names in join
                 Key: DRILL-1788
                 URL: https://issues.apache.org/jira/browse/DRILL-1788
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Steven Phillips



Drill doesn't support multiple columns within a batch having the same name. 
When doing a join where there are matching column names, the planner will 
insert a Project to rename one of the columns to avoid this conflict.
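
To make the intended behavior concrete, here is a minimal sketch of the kind of rename the planner is expected to perform. It is not Drill's planner code: the class `JoinColumnRenamer`, the method `renameConflicts`, and the choice to compare names case-insensitively are all assumptions made for illustration. The numeric suffix mirrors the `n_name0` naming visible in the plans in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class JoinColumnRenamer {

    // Hypothetical helper (not part of Drill): returns the right-side column
    // names, appending a numeric suffix to any name that would collide with a
    // name already present in the joined batch.
    public static List<String> renameConflicts(List<String> left, List<String> right) {
        // Assumption for this sketch: names are compared case-insensitively.
        Set<String> used = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        used.addAll(left);

        List<String> renamed = new ArrayList<>();
        for (String name : right) {
            String candidate = name;
            int suffix = 0;
            // Keep trying name0, name1, ... until the name is unused.
            while (used.contains(candidate)) {
                candidate = name + suffix++;
            }
            used.add(candidate);
            renamed.add(candidate);
        }
        return renamed;
    }

    public static void main(String[] args) {
        // Same spelling on both sides: prints [n_name0]
        System.out.println(renameConflicts(Arrays.asList("n_name"), Arrays.asList("n_name")));
        // Different case on the right side: prints [N_name0]
        System.out.println(renameConflicts(Arrays.asList("n_name"), Arrays.asList("N_name")));
    }
}
```

With a case-insensitive comparison, both inputs of the join end up with distinct names regardless of how the identifier was spelled in the query.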

However, it appears that there is some case-sensitive matching somewhere in the 
code path, because there are some cases where this rewrite does not happen:

For example, this query does get the column rename (see the Project at 01-03):


0: jdbc:drill:> explain plan for select n3.n_name from (select n2.n_name from 
cp.`tpch/nation.parquet` n1, cp.`tpch/nation.parquet` n2 where n1.n_name = 
n2.n_name) n3 join cp.`tpch/nation.parquet` n4 on n3.n_name = n4.n_name;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionExchange
01-01        Project(n_name=[$0])
01-02          HashJoin(condition=[=($0, $1)], joinType=[inner])
01-04            HashToRandomExchange(dist0=[[$0]])
02-01              Project(n_name=[$1])
02-02                HashJoin(condition=[=($0, $1)], joinType=[inner])
02-04                  HashToRandomExchange(dist0=[[$0]])
04-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
02-03                  Project(n_name0=[$0])
02-05                    HashToRandomExchange(dist0=[[$0]])
05-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
01-03            Project(n_name0=[$0])
01-05              HashToRandomExchange(dist0=[[$0]])
03-01                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])


But if I change one of the letters in one of the identifiers to uppercase 
(n1.N_name below), the rename at 01-03 goes away:

0: jdbc:drill:> explain plan for select n3.n_name from (select n2.n_name from 
cp.`tpch/nation.parquet` n1, cp.`tpch/nation.parquet` n2 where n1.N_name = 
n2.n_name) n3 join cp.`tpch/nation.parquet` n4 on n3.n_name = n4.n_name;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionExchange
01-01        Project(n_name=[$0])
01-02          HashJoin(condition=[=($0, $1)], joinType=[inner])
01-04            HashToRandomExchange(dist0=[[$0]])
02-01              Project(n_name=[$1])
02-02                HashJoin(condition=[=($0, $1)], joinType=[inner])
02-04                  HashToRandomExchange(dist0=[[$0]])
04-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`N_name`]]])
02-03                  Project(N_name0=[$0])
02-05                    HashToRandomExchange(dist0=[[$0]])
05-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`N_name`]]])
01-03            HashToRandomExchange(dist0=[[$0]])
03-01              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`N_name`]]])

Running this query without the rewrite results in failure:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:604) ~[na:1.7.0_21]
        at java.util.ArrayList.get(ArrayList.java:382) ~[na:1.7.0_21]
        at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:252) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.getValueAccessorById(AbstractRecordBatch.java:153) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
        at org.apache.drill.exec.test.generated.HashJoinProbeGen249.doSetup(HashJoinProbeTemplate.java:46) ~[na:na]
        at org.apache.drill.exec.test.generated.HashJoinProbeGen249.setupHashJoinProbe(HashJoinProbeTemplate.java:97) ~[na:na]
        at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:226) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
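
The suspected cause can be illustrated with a small sketch. In the failing plan, the left input of the outer join produces `n_name` and the right input scans `N_name`. A case-sensitive conflict check would see these as distinct and skip the rename, while a case-insensitive check would flag them. This is only an illustration of that hypothesis: the class `ConflictCheck` and both methods are hypothetical and do not correspond to any actual Drill code.

```java
import java.util.Arrays;
import java.util.List;

public class ConflictCheck {

    // Case-sensitive check: `N_name` and `n_name` are treated as different
    // columns, so no conflict is reported.
    public static boolean conflictsCaseSensitive(List<String> left, String rightName) {
        return left.contains(rightName);
    }

    // Case-insensitive check: `N_name` and `n_name` are treated as the same
    // column, so the conflict is reported.
    public static boolean conflictsIgnoreCase(List<String> left, String rightName) {
        for (String name : left) {
            if (name.equalsIgnoreCase(rightName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("n_name");
        // prints false: a rename would be skipped
        System.out.println(conflictsCaseSensitive(left, "N_name"));
        // prints true: a rename would be inserted
        System.out.println(conflictsIgnoreCase(left, "N_name"));
    }
}
```

If the rename decision uses the first kind of comparison while the batch resolves columns with the second, the join would receive two columns it considers identical, which is consistent with the failure above.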






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
