[ https://issues.apache.org/jira/browse/IMPALA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420630#comment-17420630 ]
ASF subversion and git services commented on IMPALA-7560:
---------------------------------------------------------

Commit 8862719d87ac5dc214985025463f002d41b15672 in impala's branch refs/heads/branch-4.0.1 from liuyao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8862719 ]

IMPALA-7560: Set selectivity of Not-equal

Calculate binary predicate selectivity if one of the children is a slotref
and the other children are all constant, e.g. something like "col != 5",
but not "2 * col != 10".

selectivity = 1 - 1/ndv

Testing:
Modify the function testNeSelectivity() of ExprCardinalityTest.java,
changing -1 to the correct value.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Reviewed-on: http://gerrit.cloudera.org:8080/17344
Reviewed-by: Qifan Chen <qc...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <borokna...@cloudera.com>


> Better selectivity estimate for != (not equals) binary predicate
> ----------------------------------------------------------------
>
>                 Key: IMPALA-7560
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7560
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>   Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.12.0, Impala 2.13.0
>            Reporter: Bharath Vissapragada
>            Assignee: liuyao
>            Priority: Major
>             Fix For: Impala 4.1.0
>
>
> Currently we use the default selectivity estimate for any binary predicate
> with an op other than EQ / NOT_DISTINCT.
> {noformat}
>     // Determine selectivity
>     // TODO: Compute selectivity for nested predicates.
>     // TODO: Improve estimation using histograms.
>     Reference<SlotRef> slotRefRef = new Reference<SlotRef>();
>     if ((op_ == Operator.EQ || op_ == Operator.NOT_DISTINCT)
>         && isSingleColumnPredicate(slotRefRef, null)) {
>       long distinctValues = slotRefRef.getRef().getNumDistinctValues();
>       if (distinctValues > 0) {
>         selectivity_ = 1.0 / distinctValues;
>         selectivity_ = Math.max(0, Math.min(1, selectivity_));
>       }
>     }
> {noformat}
> This can give very conservative estimates. For example:
> {noformat}
> [localhost:21000] tpch> select * from nation where n_regionkey != 1;
> [localhost:21000] tpch> summary;
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail      |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
> | 00:SCAN HDFS | 1      | 3.32ms   | 3.32ms   | 20    | 3          | 143.00 KB | 16.00 MB      | tpch.nation |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
> [localhost:21000] tpch>
> {noformat}
> Ideally we could have inverted the selectivity to 4/5 (= 1 - 1/5), which
> would give a better estimate.
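
The rule described in the commit message (selectivity = 1 - 1/ndv for "col != constant") can be sketched in isolation as follows. This is a minimal illustration, not the code from the Impala commit: the class NeSelectivitySketch, its Op enum, and both methods are hypothetical names, the -1 return for unknown statistics is a convention of this sketch, and the 25-row input assumes the standard TPC-H nation table with 5 distinct n_regionkey values.

```java
/** Standalone sketch of NDV-based selectivity; all names here are hypothetical. */
public final class NeSelectivitySketch {
  /** Simplified stand-in for the operators discussed in the issue. */
  public enum Op { EQ, NOT_DISTINCT, NE }

  /**
   * Returns the estimated selectivity of "col <op> constant" given the
   * column's number of distinct values (ndv), or -1 when ndv is unknown.
   */
  public static double selectivity(Op op, long ndv) {
    if (ndv <= 0) return -1.0;            // no usable column statistics
    double eq = 1.0 / ndv;                // assumes values are uniformly distributed
    double sel = (op == Op.NE) ? 1.0 - eq : eq;
    return Math.max(0.0, Math.min(1.0, sel));
  }

  /** Applies a selectivity to an input row count, rounding to the nearest row. */
  public static long estimateRows(long inputRows, double selectivity) {
    return Math.round(inputRows * selectivity);
  }

  public static void main(String[] args) {
    // Assumed inputs: 25 rows in tpch.nation, 5 distinct region keys.
    double sel = selectivity(Op.NE, 5);
    System.out.println("selectivity=" + sel + " rows=" + estimateRows(25, sel));
  }
}
```

Under these assumptions the estimate for "n_regionkey != 1" comes out at 4/5 of the input rows, which matches the 20 rows actually returned in the summary above rather than the 3 produced by the default selectivity.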