[ https://issues.apache.org/jira/browse/DRILL-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176905#comment-16176905 ]
ASF GitHub Bot commented on DRILL-1162: --------------------------------------- Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/905 @vvysotskyi , the example you listed (three tables a,b,c all have same values) seems to be essentially cross-join. For such cases, clearly the current rowCount estimation is way off from the real number, which would impact the estimation of hash join memory cost, and hence the proposed idea would not work. However, I feel it's not a very common case to have two tables joined like a cross join. The question is : does it make sense to modify cost estimation for seemly uncommonly use case? > 25 way join ended up with OOM > ----------------------------- > > Key: DRILL-1162 > URL: https://issues.apache.org/jira/browse/DRILL-1162 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow, Query Planning & Optimization > Reporter: Rahul Challapalli > Assignee: Volodymyr Vysotskyi > Priority: Critical > Fix For: Future > > Attachments: error.log, oom_error.log > > > git.commit.id.abbrev=e5c2da0 > The below query results in 0 results being returned > {code:sql} > select count(*) from `lineitem1.parquet` a > inner join `part.parquet` j on a.l_partkey = j.p_partkey > inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey > inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey > inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey > = m.ps_suppkey > inner join `customer.parquet` n on k.o_custkey = n.c_custkey > inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey > inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey > inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey > inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice > inner join `lineitem2.parquet` f on a.l_comment = f.l_comment > inner join `lineitem2.parquet` g on a.l_shipdate = g.l_shipdate > inner join `lineitem2.parquet` h on a.l_commitdate = h.l_commitdate > inner join `lineitem2.parquet` i on a.l_receiptdate = i.l_receiptdate > inner join `lineitem2.parquet` o on a.l_receiptdate = o.l_receiptdate > inner join `lineitem2.parquet` p on a.l_receiptdate = p.l_receiptdate > inner join `lineitem2.parquet` q on a.l_receiptdate = q.l_receiptdate > inner join `lineitem2.parquet` r on a.l_receiptdate = r.l_receiptdate > inner join `lineitem2.parquet` s on a.l_receiptdate = s.l_receiptdate > inner join `lineitem2.parquet` t on a.l_receiptdate = t.l_receiptdate > inner join `lineitem2.parquet` u on a.l_receiptdate = u.l_receiptdate > inner join `lineitem2.parquet` v on a.l_receiptdate = v.l_receiptdate > inner join `lineitem2.parquet` w on a.l_receiptdate = w.l_receiptdate > inner join `lineitem2.parquet` x on a.l_receiptdate = x.l_receiptdate; > {code} > However when we remove the last 'inner join' and run the query it returns > '716372534'. Since the last inner join is similar to the one's before it, it > should match some records and return the data appropriately. > The logs indicated that it actually returned 0 results. Attached the log file. -- This message was sent by Atlassian JIRA (v6.4.14#64029)