[ https://issues.apache.org/jira/browse/DRILL-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121442#comment-16121442 ]
Volodymyr Vysotskyi commented on DRILL-1162: -------------------------------------------- I was not able to receive out of heap memory for the query, so I could not determine its reasons. As for the direct memory, as I wrote in the earlier comments, that the main reason is wrong row count estimation when joining by PR. Let's consider memory using for hash join operator. Information about direct memory using was taken from the query profiles. For example for queries similar to this query {code:sql} select count(*) from k inner join l on k.l_partkey=l.l_partkey inner join m on m.l_partkey=l.l_partkey inner join n on m.l_partkey=n.l_partkey inner join o on n.l_partkey=o.l_partkey inner join p on p.l_partkey=o.l_partkey; {code} Hash join operator uses such memory size: ||join||build side row count||Max Peak Memory for HASH_JOIN|| |k and l|10000|1MB| |m and (k and l)|109,920|2MB| |n and (m and (k and l))|1,867,062|16MB| |o and (n and (m and (k and l)))|43,037,076|338MB| |o and (n and (m1 and (k and l)))|49,446,827|388MB| |o and (n1 and (m1 and (k and l)))|54,949,346|431MB| |q and (p and (m and (k and l)))|61,771,300|484MB| |p and (o and (n and (k and m)))|99,483,263|778MB| |s and (q and (p and (k and o)))|330,512,646|2.52GB| m1, n1, q and s tables are created by queries {code:sql} create table n1 as select * from `lineitem1.parquet` limit 45020; create table m1 as select * from `lineitem1.parquet` limit 35010; create table q as (select l_partkey from `lineitem1.parquet` union all select l_partkey from `lineitem1.parquet` limit 70050); create table s as (select l_partkey from `lineitem1.parquet` union all select l_partkey from `lineitem1.parquet` limit 90050); {code} As you can see from the table, hash join operator uses direct memory in proportion to the rows count of the build side. For the query, similar to the query from Jira description, the ratio of memory used by the hash join operator to build side row count almost the same for the simplified query (it is executed without OOM) {code:sql} select count(*) from `lineitem1.parquet` a inner join `part.parquet` j on a.l_partkey = j.p_partkey inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey = m.ps_suppkey inner join `customer.parquet` n on k.o_custkey = n.c_custkey inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice; {code} and for query that fails: {code:sql} select count(*) from `lineitem1.parquet` a inner join `part.parquet` j on a.l_partkey = j.p_partkey inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey = m.ps_suppkey inner join `customer.parquet` n on k.o_custkey = n.c_custkey inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice; {code} 2.08GB / 181,675,567 and 9.02GB / 787,992,000 respectively. So the size of memory that is used by hash join operator is proportional to the row count of build side. That's why the actual issue is the row count estimation that causes plan in which right input of hash join operator actually has larger rows number than probe side. Creating or modifying existing planning rules may change plan more significantly, so I don't think that we should do this for the current bug. But swapping the join inputs only for hash join when the best physical plan was chosen should not cause regressions. Besides that this swapping which I am proposed will be happening very rare since all conditions that I described in my previous comment should be satisfied. There is not enough information to make a strict decision about swapping, so we only check that the worst case may happen. I was looking for a better decision but taking into account the shortage of information about rows count I chose this fix. > 25 way join ended up with OOM > ----------------------------- > > Key: DRILL-1162 > URL: https://issues.apache.org/jira/browse/DRILL-1162 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow, Query Planning & Optimization > Reporter: Rahul Challapalli > Assignee: Volodymyr Vysotskyi > Priority: Critical > Fix For: Future > > Attachments: error.log, oom_error.log > > > git.commit.id.abbrev=e5c2da0 > The below query results in 0 results being returned > {code:sql} > select count(*) from `lineitem1.parquet` a > inner join `part.parquet` j on a.l_partkey = j.p_partkey > inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey > inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey > inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey > = m.ps_suppkey > inner join `customer.parquet` n on k.o_custkey = n.c_custkey > inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey > inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey > inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey > inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice > inner join `lineitem2.parquet` f on a.l_comment = f.l_comment > inner join `lineitem2.parquet` g on a.l_shipdate = g.l_shipdate > inner join `lineitem2.parquet` h on a.l_commitdate = h.l_commitdate > inner join `lineitem2.parquet` i on a.l_receiptdate = i.l_receiptdate > inner join `lineitem2.parquet` o on a.l_receiptdate = o.l_receiptdate > inner join `lineitem2.parquet` p on a.l_receiptdate = p.l_receiptdate > inner join `lineitem2.parquet` q on a.l_receiptdate = q.l_receiptdate > inner join `lineitem2.parquet` r on a.l_receiptdate = r.l_receiptdate > inner join `lineitem2.parquet` s on a.l_receiptdate = s.l_receiptdate > inner join `lineitem2.parquet` t on a.l_receiptdate = t.l_receiptdate > inner join `lineitem2.parquet` u on a.l_receiptdate = u.l_receiptdate > inner join `lineitem2.parquet` v on a.l_receiptdate = v.l_receiptdate > inner join `lineitem2.parquet` w on a.l_receiptdate = w.l_receiptdate > inner join `lineitem2.parquet` x on a.l_receiptdate = x.l_receiptdate; > {code} > However when we remove the last 'inner join' and run the query it returns > '716372534'. Since the last inner join is similar to the one's before it, it > should match some records and return the data appropriately. > The logs indicated that it actually returned 0 results. Attached the log file. -- This message was sent by Atlassian JIRA (v6.4.14#64029)