[ 
https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831193#comment-16831193
 ] 

ASF GitHub Bot commented on DRILL-7227:
---------------------------------------

gparai commented on pull request #1775: DRILL-7227: Fix predicate check in 
DrillRelOptUtil.analyzeSimpleEquiJoin
URL: https://github.com/apache/drill/pull/1775#discussion_r280181002
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
 ##########
 @@ -155,18 +155,18 @@ private Double getDistinctRowCountInternal(TableScan 
scan, RelMetadataQuery mq,
     }
 
     double s = 1.0;
-    boolean allCols = true;
+    boolean allColsHaveNDV = true;
     for (int i = 0; i < groupKey.length(); i++) {
       final String colName = type.getFieldNames().get(i);
-      // Skip NDV, if not available
       if (!groupKey.get(i)) {
-        allCols = false;
-        break;
+        continue;
       }
       ColumnStatistics columnStatistics = tableMetadata != null ?
           tableMetadata.getColumnStatistics(SchemaPath.getSimplePath(colName)) 
: null;
       Double ndv = columnStatistics != null ? (Double) 
columnStatistics.getStatistic(ColumnStatisticsKind.NDV) : null;
+      // Skip NDV, if not available
       if (ndv == null) {
+        allColsHaveNDV = false;
 
 Review comment:
   Done.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
> ---------------------------------------------------------------------
>
>                 Key: DRILL-7227
>                 URL: https://issues.apache.org/jira/browse/DRILL-7227
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.16.0
>            Reporter: Robert Hou
>            Assignee: Gautam Parai
>            Priority: Major
>             Fix For: 1.17.0
>
>         Attachments: 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.sys.drill, 
> 2338ae93-155b-356d-382e-0da949c6f439.sys.drill
>
>
> Here is query 78:
> {noformat}
> WITH ws 
>      AS (SELECT d_year                 AS ws_sold_year, 
>                 ws_item_sk, 
>                 ws_bill_customer_sk    ws_customer_sk, 
>                 Sum(ws_quantity)       ws_qty, 
>                 Sum(ws_wholesale_cost) ws_wc, 
>                 Sum(ws_sales_price)    ws_sp 
>          FROM   web_sales 
>                 LEFT JOIN web_returns 
>                        ON wr_order_number = ws_order_number 
>                           AND ws_item_sk = wr_item_sk 
>                 JOIN date_dim 
>                   ON ws_sold_date_sk = d_date_sk 
>          WHERE  wr_order_number IS NULL 
>          GROUP  BY d_year, 
>                    ws_item_sk, 
>                    ws_bill_customer_sk), 
>      cs 
>      AS (SELECT d_year                 AS cs_sold_year, 
>                 cs_item_sk, 
>                 cs_bill_customer_sk    cs_customer_sk, 
>                 Sum(cs_quantity)       cs_qty, 
>                 Sum(cs_wholesale_cost) cs_wc, 
>                 Sum(cs_sales_price)    cs_sp 
>          FROM   catalog_sales 
>                 LEFT JOIN catalog_returns 
>                        ON cr_order_number = cs_order_number 
>                           AND cs_item_sk = cr_item_sk 
>                 JOIN date_dim 
>                   ON cs_sold_date_sk = d_date_sk 
>          WHERE  cr_order_number IS NULL 
>          GROUP  BY d_year, 
>                    cs_item_sk, 
>                    cs_bill_customer_sk), 
>      ss 
>      AS (SELECT d_year                 AS ss_sold_year, 
>                 ss_item_sk, 
>                 ss_customer_sk, 
>                 Sum(ss_quantity)       ss_qty, 
>                 Sum(ss_wholesale_cost) ss_wc, 
>                 Sum(ss_sales_price)    ss_sp 
>          FROM   store_sales 
>                 LEFT JOIN store_returns 
>                        ON sr_ticket_number = ss_ticket_number 
>                           AND ss_item_sk = sr_item_sk 
>                 JOIN date_dim 
>                   ON ss_sold_date_sk = d_date_sk 
>          WHERE  sr_ticket_number IS NULL 
>          GROUP  BY d_year, 
>                    ss_item_sk, 
>                    ss_customer_sk) 
> SELECT ss_item_sk, 
>                Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) ratio, 
>                ss_qty                                              store_qty, 
>                ss_wc 
>                store_wholesale_cost, 
>                ss_sp 
>                store_sales_price, 
>                COALESCE(ws_qty, 0) + COALESCE(cs_qty, 0) 
>                other_chan_qty, 
>                COALESCE(ws_wc, 0) + COALESCE(cs_wc, 0) 
>                other_chan_wholesale_cost, 
>                COALESCE(ws_sp, 0) + COALESCE(cs_sp, 0) 
>                other_chan_sales_price 
> FROM   ss 
>        LEFT JOIN ws 
>               ON ( ws_sold_year = ss_sold_year 
>                    AND ws_item_sk = ss_item_sk 
>                    AND ws_customer_sk = ss_customer_sk ) 
>        LEFT JOIN cs 
>               ON ( cs_sold_year = ss_sold_year 
>                    AND cs_item_sk = cs_item_sk 
>                    AND cs_customer_sk = ss_customer_sk ) 
> WHERE  COALESCE(ws_qty, 0) > 0 
>        AND COALESCE(cs_qty, 0) > 0 
>        AND ss_sold_year = 1999 
> ORDER  BY ss_item_sk, 
>           ss_qty DESC, 
>           ss_wc DESC, 
>           ss_sp DESC, 
>           other_chan_qty, 
>           other_chan_wholesale_cost, 
>           other_chan_sales_price, 
>           Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2)
> LIMIT 100; 
> {noformat}
> The profile for the new plan is 2338ae93-155b-356d-382e-0da949c6f439.  Hash 
> partition sender operator (10-00) takes 10-15 minutes.  I am not sure why it 
> takes so long.  It has 10 minor fragments sending to receiver (06-05), which 
> has 62 minor fragments.  But hash partition sender (16-00) has 10 minor 
> fragments sending to receiver (12-06), which has 220 minor fragments, and 
> there is no performance issue.
> The profile for the old plan is 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.  Both 
> plans use the same commit.  The old plan is created by disabling statistics.
> I have not included the plans in the Jira because Jira has a max of 32K.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to