[
https://issues.apache.org/jira/browse/DRILL-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824107#comment-13824107
]
Jinfeng Ni commented on DRILL-280:
----------------------------------
Hi Julian,
When the join conditions are in WHERE clauses, if there are multiple different
ways to generate a join tree, we could use any one as long as it does not have
Cartesian join, before we have a cost-based optimizer.
I saw that PushFilterPastJoinRule did the job to push the appropriate
conditions from FilterRel into JoinRel. However, I did not see Optiq generate
all the possible join trees. Did I miss something here?
Use a modified query from your example:
SELECT ...
FROM a, b, c
WHERE b.y = c.y
AND c.z = a.z
Optiq generate a join tree as ( a x b) x c, before it calls Prepare.optimize().
In the call of optimize, it's supposed to enumerate different join tree by
applying different rules. One rule is SwapJoinRule. I saw SwapJoinRule is
applied to (a x b), or the join between (a x b) and c. But I did not see a
join tree like (a x c ) x b, or (b x c ) x a, which are join tree having no
Cartesian join and exactly what we want to have.
To get join tree like (b x c) x a from (a x b) x c, at least 2 steps are
needed:
( a x b) x c
==> a x ( b x c ) : Associativity
==> (b x c) x a : Commutativity
Seems SwapJoinRule is for commutativity. But I did not see the rule for
associativity.
Could you please let me know if Optiq will enumerate all the different join
trees, ( including ( a x c) x b, ( b x c) x a for the above example)? If yes,
could you point me to the rule which will generate such join tree?
Thanks!
> Multiple tables join may get join sequence where two tables joined do not
> have any join condition.
> ----------------------------------------------------------------------------------------------------
>
> Key: DRILL-280
> URL: https://issues.apache.org/jira/browse/DRILL-280
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Priority: Critical
>
> When we have > 2 tables joined together, if we put the join predicates in the
> WHERE clause, the query may get a join sequence where two tables joined
> together do not have any join condition. This will lead to error in Merge
> Join OP, which assume join always have a join condition.
> For example, the following query:
> SELECT S.S_ACCTBAL, S.S_NAME
> FROM
> ( SELECT _MAP['P_PARTKEY'] as P_PARTKEY,
> _MAP['P_MFGR'] as P_MFGR
> FROM "/Users/jni//work/tpc-h-parquet/part") P,
> ( SELECT _MAP['S_SUPPKEY'] AS S_SUPPKEY,
> _MAP['S_NATIONKEY'] AS S_NATIONKEY,
> _MAP['S_ACCTBAL'] AS S_ACCTBAL,
> _MAP['S_NAME'] AS S_NAME,
> _MAP['S_ADDRESS'] AS S_ADDRESS,
> _MAP['S_PHONE'] AS S_PHONE,
> _MAP['S_COMMENT'] AS S_COMMENT
> FROM "/Users/jni//work/tpc-h-parquet/supplier") S,
> (SELECT _MAP['PS_PARTKEY'] AS PS_PARTKEY,
> _MAP['PS_SUPPKEY'] AS PS_SUPPKEY
> FROM "/Users/jni//work/tpc-h-parquet/partsupp") PS
> WHERE P.P_PARTKEY = PS.PS_PARTKEY and
> S.S_SUPPKEY = PS.PS_SUPPKEY
> LIMIT 100;
> The join sequence in logical and physical plan is : P -> S -> PS. However,
> since there is no direct predicate between P and S, the Merge Join between P
> and S will have no join conditions. This lead to the following error in
> MergeJoinBatch.java:
> "Failure while setting up Foreman. < ArrayIndexOutOfBoundsException:[ 0 ]
> Since almost all TPC-H queries have multiple tables joins, it's important
> that we get this issue resolved, in order to run TPC-H queries.
>
--
This message was sent by Atlassian JIRA
(v6.1#6144)