[ https://issues.apache.org/jira/browse/DRILL-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176587#comment-16176587 ]

ASF GitHub Bot commented on DRILL-1162:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/905#discussion_r140417433
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdMaxRowCount.java ---
    @@ -0,0 +1,71 @@
    +/*
    +* Licensed to the Apache Software Foundation (ASF) under one or more
    +* contributor license agreements.  See the NOTICE file distributed with
    +* this work for additional information regarding copyright ownership.
    +* The ASF licenses this file to you under the Apache License, Version 2.0
    +* (the "License"); you may not use this file except in compliance with
    +* the License.  You may obtain a copy of the License at
    +*
    +* http://www.apache.org/licenses/LICENSE-2.0
    +*
    +* Unless required by applicable law or agreed to in writing, software
    +* distributed under the License is distributed on an "AS IS" BASIS,
    +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +* See the License for the specific language governing permissions and
    +* limitations under the License.
    +*/
    +package org.apache.drill.exec.planner.cost;
    +
    +import org.apache.calcite.plan.volcano.RelSubset;
    +import org.apache.calcite.rel.SingleRel;
    +import org.apache.calcite.rel.core.TableScan;
    +import org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider;
    +import org.apache.calcite.rel.metadata.RelMdMaxRowCount;
    +import org.apache.calcite.rel.metadata.RelMetadataProvider;
    +import org.apache.calcite.rel.metadata.RelMetadataQuery;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.drill.exec.planner.physical.AbstractPrel;
    +import org.apache.drill.exec.planner.physical.ScanPrel;
    +
    +/**
    + * DrillRelMdMaxRowCount supplies a specific implementation of
    + * {@link RelMetadataQuery#getMaxRowCount} for Drill.
    + */
    +public class DrillRelMdMaxRowCount extends RelMdMaxRowCount {
    +
    +  private static final DrillRelMdMaxRowCount INSTANCE = new DrillRelMdMaxRowCount();
    +
    +  public static final RelMetadataProvider SOURCE = ReflectiveRelMetadataProvider.reflectiveSource(BuiltInMethod.MAX_ROW_COUNT.method, INSTANCE);
    +
    +  public Double getMaxRowCount(ScanPrel rel, RelMetadataQuery mq) {
    +    // the actual row count is known so returns its value
    +    return rel.estimateRowCount(mq);
    --- End diff --
    
    Returning the 'estimated' row count means the value is only an estimate,
    not the actual row count, which could be higher. Looking at the
    implementation of estimateRowCount() for several of the storage/format
    plugins, several of them use NO_EXACT_ROW_COUNT; for instance, see [1] for
    the text format plugin. So I feel that overloading getMaxRowCount() to
    return an estimate may cause problems. If you look at the semantics of
    getMaxRowCount in Calcite's RelMdMaxRowCount, it is intended only for cases
    where **_during planning time_** we can guarantee that the row count will
    never exceed the returned value, for example an Aggregate with no group-by
    clause, or a LIMIT.
    
    
    [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java#L186
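
To make the distinction above concrete, here is a minimal, self-contained sketch of what a guaranteed upper bound means, as opposed to an estimate. The class and method names are hypothetical and are not Calcite or Drill API. Using positive infinity to mean that no bound can be guaranteed is an assumption of this sketch.

```java
// Hypothetical sketch: none of these names are Calcite or Drill classes.
final class MaxRowCountSketch {

  /** Assumption of this sketch: infinity means no bound can be guaranteed. */
  static final double UNBOUNDED = Double.POSITIVE_INFINITY;

  /** Scan: only an exact row count may serve as an upper bound. */
  static double scanMaxRowCount(double rowCount, boolean exact) {
    // An estimate can be lower than the real count, so it is not a bound.
    return exact ? rowCount : UNBOUNDED;
  }

  /** LIMIT n: the output can never exceed n, whatever the input produces. */
  static double limitMaxRowCount(double inputMax, long fetch) {
    return Math.min(inputMax, (double) fetch);
  }

  /** Aggregate with no group-by clause: at most one output row. */
  static double globalAggregateMaxRowCount() {
    return 1.0;
  }

  public static void main(String[] args) {
    // A scan that only has an estimate (exact == false) stays unbounded.
    double textScanMax = scanMaxRowCount(1_000.0, false);
    System.out.println(textScanMax);
    // A LIMIT 10 on top of it is bounded regardless of the scan.
    System.out.println(limitMaxRowCount(textScanMax, 10));
    System.out.println(globalAggregateMaxRowCount());
  }
}
```

In this sketch a scan that only has an estimate leaves the bound open, while a LIMIT or an aggregate without group-by yields a bound that holds no matter how many rows the scan actually produces.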


> 25 way join ended up with OOM
> -----------------------------
>
>                 Key: DRILL-1162
>                 URL: https://issues.apache.org/jira/browse/DRILL-1162
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow, Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Volodymyr Vysotskyi
>            Priority: Critical
>             Fix For: Future
>
>         Attachments: error.log, oom_error.log
>
>
> git.commit.id.abbrev=e5c2da0
> The query below returns 0 results:
> {code:sql}
> select count(*) from `lineitem1.parquet` a 
> inner join `part.parquet` j on a.l_partkey = j.p_partkey 
> inner join `orders.parquet` k on a.l_orderkey = k.o_orderkey 
> inner join `supplier.parquet` l on a.l_suppkey = l.s_suppkey 
> inner join `partsupp.parquet` m on j.p_partkey = m.ps_partkey and l.s_suppkey = m.ps_suppkey 
> inner join `customer.parquet` n on k.o_custkey = n.c_custkey 
> inner join `lineitem2.parquet` b on a.l_orderkey = b.l_orderkey 
> inner join `lineitem2.parquet` c on a.l_partkey = c.l_partkey 
> inner join `lineitem2.parquet` d on a.l_suppkey = d.l_suppkey 
> inner join `lineitem2.parquet` e on a.l_extendedprice = e.l_extendedprice 
> inner join `lineitem2.parquet` f on a.l_comment = f.l_comment 
> inner join `lineitem2.parquet` g on a.l_shipdate = g.l_shipdate 
> inner join `lineitem2.parquet` h on a.l_commitdate = h.l_commitdate 
> inner join `lineitem2.parquet` i on a.l_receiptdate = i.l_receiptdate 
> inner join `lineitem2.parquet` o on a.l_receiptdate = o.l_receiptdate 
> inner join `lineitem2.parquet` p on a.l_receiptdate = p.l_receiptdate 
> inner join `lineitem2.parquet` q on a.l_receiptdate = q.l_receiptdate 
> inner join `lineitem2.parquet` r on a.l_receiptdate = r.l_receiptdate 
> inner join `lineitem2.parquet` s on a.l_receiptdate = s.l_receiptdate 
> inner join `lineitem2.parquet` t on a.l_receiptdate = t.l_receiptdate 
> inner join `lineitem2.parquet` u on a.l_receiptdate = u.l_receiptdate 
> inner join `lineitem2.parquet` v on a.l_receiptdate = v.l_receiptdate 
> inner join `lineitem2.parquet` w on a.l_receiptdate = w.l_receiptdate 
> inner join `lineitem2.parquet` x on a.l_receiptdate = x.l_receiptdate;
> {code}
> However, when we remove the last 'inner join' and run the query, it returns 
> '716372534'. Since the last inner join is similar to the ones before it, it 
> should match some records and return the data appropriately.
> The logs indicate that the query actually returned 0 results. The log file is attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
