Aman Sinha created IMPALA-10615:
-----------------------------------

             Summary: Cardinality estimates for some scalar functions could be improved
                 Key: IMPALA-10615
                 URL: https://issues.apache.org/jira/browse/IMPALA-10615
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Aman Sinha


The default 10% selectivity applied to predicates involving most scalar 
functions can significantly underestimate cardinality.  Consider the 
following estimate for a predicate on UPPER():
{noformat}
[localhost:21050] tpch> explain select * from nation where upper(n_name) is not null;

| 00:SCAN HDFS [tpch.nation]                                 |
|    HDFS partitions=1/1 files=1 size=2.15KB                 |
|    predicates: upper(n_name) IS NOT NULL                   |
|    row-size=109B cardinality=3                             |
+------------------------------------------------------------+
{noformat}

Since n_name contains no NULL values, the actual cardinality is 25, which is 
also what the planner estimates when the predicate is on the column itself:
{noformat}
[localhost:21050] tpch> explain select * from nation where n_name is not null;

| 00:SCAN HDFS [tpch.nation]                                 |
|    HDFS partitions=1/1 files=1 size=2.15KB                 |
|    predicates: n_name IS NOT NULL                          |
|    row-size=109B cardinality=25                            |
+------------------------------------------------------------+
{noformat}
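
As a rough check of where cardinality=3 comes from: the nation table has 25 
rows, and 10% of 25 is 2.5, which matches the plan output if the result is 
rounded half-up.  The rounding mode and the helper name default_estimate are 
assumptions made for illustration, not taken from the Impala source.

```python
import math

DEFAULT_SELECTIVITY = 0.1  # the 10% default described in this report
NATION_ROWS = 25           # row count of tpch.nation


def default_estimate(num_rows: int, selectivity: float = DEFAULT_SELECTIVITY) -> int:
    """Apply a selectivity to a row count, rounding half-up (assumed)."""
    return int(math.floor(num_rows * selectivity + 0.5))


print(default_estimate(NATION_ROWS))  # prints 3
```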

In general, if a scalar function cannot change the nullability of its input, 
an IS [NOT] NULL predicate on the function's result should get the same 
selectivity as the same predicate on the input itself.
Note that an explicit CAST is already handled correctly:
{noformat}
[localhost:21050] tpch> explain select * from nation where cast(n_name as varchar(10)) is not null;

| 00:SCAN HDFS [tpch.nation]                                 |
|    HDFS partitions=1/1 files=1 size=2.15KB                 |
|    predicates: CAST(n_name AS VARCHAR(10)) IS NOT NULL     |
|    row-size=109B cardinality=25                            |
{noformat}
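
The rule proposed above could be sketched roughly as follows.  This is a toy 
model, not Impala's frontend code: the Column and FunctionCall classes, the 
NULL_PRESERVING set, and the half-up rounding are all hypothetical and exist 
only to illustrate the idea of looking through a null-preserving function to 
the null statistics of its argument.

```python
import math
from dataclasses import dataclass
from typing import Union

DEFAULT_SELECTIVITY = 0.1  # the 10% default described in this report

# Hypothetical set: functions assumed to return NULL exactly when their
# input is NULL, so they cannot change the nullability of the argument.
NULL_PRESERVING = {"upper", "lower"}


@dataclass
class Column:
    name: str
    num_nulls: int  # NULL count taken from column statistics


@dataclass
class FunctionCall:
    name: str
    arg: "Expr"


Expr = Union[Column, FunctionCall]


def is_not_null_selectivity(expr: Expr, num_rows: int) -> float:
    """Selectivity of `expr IS NOT NULL` in this toy model."""
    if num_rows <= 0:
        return DEFAULT_SELECTIVITY
    if isinstance(expr, Column):
        return (num_rows - expr.num_nulls) / num_rows
    if isinstance(expr, FunctionCall) and expr.name.lower() in NULL_PRESERVING:
        # Nullability is unchanged, so reuse the argument's selectivity.
        return is_not_null_selectivity(expr.arg, num_rows)
    # Unknown function: fall back to the default.
    return DEFAULT_SELECTIVITY


def estimate(expr: Expr, num_rows: int) -> int:
    """Estimated output rows, rounded half-up (assumed)."""
    return int(math.floor(num_rows * is_not_null_selectivity(expr, num_rows) + 0.5))


n_name = Column("n_name", num_nulls=0)
print(estimate(FunctionCall("upper", n_name), 25))     # prints 25
print(estimate(FunctionCall("some_udf", n_name), 25))  # prints 3
```

With upper in the null-preserving set, the toy estimate matches the 25 rows 
reported for the bare column, while an unrecognized function still falls back 
to the 10% default.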



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
