[
https://issues.apache.org/jira/browse/DRILL-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818633#comment-13818633
]
Julian Hyde commented on DRILL-280:
-----------------------------------
Optiq's policy is to generate JoinRel operators only if you use ANSI join
syntax (a JOIN b ...). But if you write an old-style FROM clause, 'FROM a, b',
Optiq treats it as if you'd written 'a JOIN b ON true'. Then it's up to another
rule to push down the condition from the WHERE clause.
Converting conditions in the WHERE clause into join conditions may sound
trivial, but isn't. To see why it isn't trivial, consider
SELECT ...
FROM a, b, c
WHERE a.x = b.x
AND b.y = c.y
AND c.z = a.z
We need to generate a join tree, but there are 3 different ways to do it
(a-b-c, a-c-b, b-a-c). It's not obvious which is the best tree, so this is best
done within a cost-based optimizer, so you can try all of the possibilities.
In this case there's only one option, so PushFilterPastJoinRule will probably
do the trick.
> Multiple tables join may get join sequence where two tables joined do not
> have any join condition.
> ----------------------------------------------------------------------------------------------------
>
> Key: DRILL-280
> URL: https://issues.apache.org/jira/browse/DRILL-280
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Priority: Critical
>
> When we have > 2 tables joined together, if we put the join predicates in the
> WHERE clause, the query may get a join sequence where two tables joined
> together do not have any join condition. This will lead to error in Merge
> Join OP, which assume join always have a join condition.
> For example, the following query:
> SELECT S.S_ACCTBAL, S.S_NAME
> FROM
> ( SELECT _MAP['P_PARTKEY'] as P_PARTKEY,
> _MAP['P_MFGR'] as P_MFGR
> FROM "/Users/jni//work/tpc-h-parquet/part") P,
> ( SELECT _MAP['S_SUPPKEY'] AS S_SUPPKEY,
> _MAP['S_NATIONKEY'] AS S_NATIONKEY,
> _MAP['S_ACCTBAL'] AS S_ACCTBAL,
> _MAP['S_NAME'] AS S_NAME,
> _MAP['S_ADDRESS'] AS S_ADDRESS,
> _MAP['S_PHONE'] AS S_PHONE,
> _MAP['S_COMMENT'] AS S_COMMENT
> FROM "/Users/jni//work/tpc-h-parquet/supplier") S,
> (SELECT _MAP['PS_PARTKEY'] AS PS_PARTKEY,
> _MAP['PS_SUPPKEY'] AS PS_SUPPKEY
> FROM "/Users/jni//work/tpc-h-parquet/partsupp") PS
> WHERE P.P_PARTKEY = PS.PS_PARTKEY and
> S.S_SUPPKEY = PS.PS_SUPPKEY
> LIMIT 100;
> The join sequence in logical and physical plan is : P -> S -> PS. However,
> since there is no direct predicate between P and S, the Merge Join between P
> and S will have no join conditions. This lead to the following error in
> MergeJoinBatch.java:
> "Failure while setting up Foreman. < ArrayIndexOutOfBoundsException:[ 0 ]
> Since almost all TPC-H queries have multiple tables joins, it's important
> that we get this issue resolved, in order to run TPC-H queries.
>
--
This message was sent by Atlassian JIRA
(v6.1#6144)