[ https://issues.apache.org/jira/browse/DRILL-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831193#comment-16831193 ]
ASF GitHub Bot commented on DRILL-7227: --------------------------------------- gparai commented on pull request #1775: DRILL-7227: Fix predicate check in DrillRelOptUtil.analyzeSimpleEquiJoin URL: https://github.com/apache/drill/pull/1775#discussion_r280181002 ########## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java ########## @@ -155,18 +155,18 @@ private Double getDistinctRowCountInternal(TableScan scan, RelMetadataQuery mq, } double s = 1.0; - boolean allCols = true; + boolean allColsHaveNDV = true; for (int i = 0; i < groupKey.length(); i++) { final String colName = type.getFieldNames().get(i); - // Skip NDV, if not available if (!groupKey.get(i)) { - allCols = false; - break; + continue; } ColumnStatistics columnStatistics = tableMetadata != null ? tableMetadata.getColumnStatistics(SchemaPath.getSimplePath(colName)) : null; Double ndv = columnStatistics != null ? (Double) columnStatistics.getStatistic(ColumnStatisticsKind.NDV) : null; + // Skip NDV, if not available if (ndv == null) { + allColsHaveNDV = false; Review comment: Done. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100 > --------------------------------------------------------------------- > > Key: DRILL-7227 > URL: https://issues.apache.org/jira/browse/DRILL-7227 > Project: Apache Drill > Issue Type: Bug > Components: Metadata > Affects Versions: 1.16.0 > Reporter: Robert Hou > Assignee: Gautam Parai > Priority: Major > Fix For: 1.17.0 > > Attachments: 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.sys.drill, > 2338ae93-155b-356d-382e-0da949c6f439.sys.drill > > > Here is query 78: > {noformat} > WITH ws > AS (SELECT d_year AS ws_sold_year, > ws_item_sk, > ws_bill_customer_sk ws_customer_sk, > Sum(ws_quantity) ws_qty, > Sum(ws_wholesale_cost) ws_wc, > Sum(ws_sales_price) ws_sp > FROM web_sales > LEFT JOIN web_returns > ON wr_order_number = ws_order_number > AND ws_item_sk = wr_item_sk > JOIN date_dim > ON ws_sold_date_sk = d_date_sk > WHERE wr_order_number IS NULL > GROUP BY d_year, > ws_item_sk, > ws_bill_customer_sk), > cs > AS (SELECT d_year AS cs_sold_year, > cs_item_sk, > cs_bill_customer_sk cs_customer_sk, > Sum(cs_quantity) cs_qty, > Sum(cs_wholesale_cost) cs_wc, > Sum(cs_sales_price) cs_sp > FROM catalog_sales > LEFT JOIN catalog_returns > ON cr_order_number = cs_order_number > AND cs_item_sk = cr_item_sk > JOIN date_dim > ON cs_sold_date_sk = d_date_sk > WHERE cr_order_number IS NULL > GROUP BY d_year, > cs_item_sk, > cs_bill_customer_sk), > ss > AS (SELECT d_year AS ss_sold_year, > ss_item_sk, > ss_customer_sk, > Sum(ss_quantity) ss_qty, > Sum(ss_wholesale_cost) ss_wc, > Sum(ss_sales_price) ss_sp > FROM store_sales > LEFT JOIN store_returns > ON sr_ticket_number = ss_ticket_number > AND ss_item_sk = sr_item_sk > JOIN date_dim > ON ss_sold_date_sk = d_date_sk > WHERE sr_ticket_number IS NULL > GROUP BY d_year, > ss_item_sk, > ss_customer_sk) > SELECT ss_item_sk, > Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) ratio, > ss_qty store_qty, > ss_wc > store_wholesale_cost, > ss_sp > store_sales_price, > COALESCE(ws_qty, 0) + COALESCE(cs_qty, 0) > other_chan_qty, > COALESCE(ws_wc, 0) + COALESCE(cs_wc, 0) > other_chan_wholesale_cost, > COALESCE(ws_sp, 0) + COALESCE(cs_sp, 0) > other_chan_sales_price > FROM ss > LEFT JOIN ws > ON ( ws_sold_year = ss_sold_year > AND ws_item_sk = ss_item_sk > AND ws_customer_sk = ss_customer_sk ) > LEFT JOIN cs > ON ( cs_sold_year = ss_sold_year > AND cs_item_sk = cs_item_sk > AND cs_customer_sk = ss_customer_sk ) > WHERE COALESCE(ws_qty, 0) > 0 > AND COALESCE(cs_qty, 0) > 0 > AND ss_sold_year = 1999 > ORDER BY ss_item_sk, > ss_qty DESC, > ss_wc DESC, > ss_sp DESC, > other_chan_qty, > other_chan_wholesale_cost, > other_chan_sales_price, > Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) > LIMIT 100; > {noformat} > The profile for the new plan is 2338ae93-155b-356d-382e-0da949c6f439. Hash > partition sender operator (10-00) takes 10-15 minutes. I am not sure why it > takes so long. It has 10 minor fragments sending to receiver (06-05), which > has 62 minor fragments. But hash partition sender (16-00) has 10 minor > fragments sending to receiver (12-06), which has 220 minor fragments, and > there is no performance issue. > The profile for the old plan is 23387ab0-cb1c-cd5e-449a-c9bcefc901c1. Both > plans use the same commit. The old plan is created by disabling statistics. > I have not included the plans in the Jira because Jira has a max of 32K. -- This message was sent by Atlassian JIRA (v7.6.3#76005)