Alessandro Solimando created CALCITE-7173:
---------------------------------------------

             Summary: Improve RelMdDistinctRowCount estimation for lossless casts
                 Key: CALCITE-7173
                 URL: https://issues.apache.org/jira/browse/CALCITE-7173
             Project: Calcite
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.40.0
            Reporter: Alessandro Solimando
            Assignee: Alessandro Solimando


Consider the following test for _RelMetadataTest_:
{code:java}
  @Test
  void testAggregateDistinctRowCountLosslessCast() {
    final String values = "values ('b', 10), ('b', 20), ('b', 30)";
    final String sql =
        "select name, cast(sal as varchar(11)) from (" + values + ") t(name, sal) " +
            "group by name, cast(sal as varchar(11))";

    sql(sql).assertThatDistinctRowCount(bitSetOf(1), is(3d));
  }
{code}

The test currently fails as follows:

{noformat}
Expected: is <3.0>
     but: was <1.6439107033725735>
{noformat}

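For lossless casts (and, more generally, for any injective function), one would expect "NDV(CAST($i)) = NDV($i)" to hold, where NDV is the number of distinct values: an injective function maps distinct inputs to distinct outputs, so it cannot change that count.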
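
The NDV identity can be checked on the three "sal" values from the test with a small standalone sketch. It is plain Java and does not use Calcite; the class name NdvDemo and the helper ndv are made up for illustration. It compares an injective mapping (integer to string, standing in for the lossless cast) with a non-injective one (integer division):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Illustration only: hypothetical helper, not part of Calcite. */
final class NdvDemo {
  private NdvDemo() {}

  /** Counts the distinct values obtained by applying {@code f} to each input. */
  static <T, R> long ndv(List<T> values, Function<T, R> f) {
    return values.stream().map(f).distinct().count();
  }

  public static void main(String[] args) {
    List<Integer> sal = Arrays.asList(10, 20, 30);

    long base = ndv(sal, s -> s);                  // the column itself
    long cast = ndv(sal, s -> String.valueOf(s));  // injective, like a lossless cast
    long lossy = ndv(sal, s -> s / 20);            // not injective: 20 and 30 both give 1

    System.out.println(base + " " + cast + " " + lossy);  // prints "3 3 2"
  }
}
```

Only the injective mapping keeps the count at 3, which is the value the test above expects for the cast column.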

A minimal fix would extend [RelMdUtil.java#L596|https://github.com/apache/calcite/blob/calcite-1.40.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMdUtil.java#L596] to treat lossless casts as references to input fields, since that code is only used in [RelMdDistinctRowCount|https://github.com/apache/calcite/blob/calcite-1.40.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMdDistinctRowCount.java#L258] and, in exactly the same spirit, in [RelMdPopulationSize|https://github.com/apache/calcite/blob/calcite-1.40.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMdPopulationSize.java#L138].
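
The idea of "treating a lossless cast as a reference to an input field" can be sketched on a toy expression model. Everything below is hypothetical (the Expr, InputRef and Cast types, the lossless flag, and the referencedField helper); it is not Calcite's Rex API and not the actual patch, only an illustration of looking through lossless casts before deciding whether an expression is a plain field reference:

```java
import java.util.OptionalInt;

/** Toy model for illustration; none of these types exist in Calcite. */
final class LosslessCastSketch {
  private LosslessCastSketch() {}

  interface Expr {}

  /** Reference to input field number {@code index}. */
  static final class InputRef implements Expr {
    final int index;

    InputRef(int index) {
      this.index = index;
    }
  }

  /** Cast of {@code operand}; {@code lossless} means distinct inputs stay distinct. */
  static final class Cast implements Expr {
    final Expr operand;
    final boolean lossless;

    Cast(Expr operand, boolean lossless) {
      this.operand = operand;
      this.lossless = lossless;
    }
  }

  /** Returns the input field an expression stands for, looking through lossless casts. */
  static OptionalInt referencedField(Expr e) {
    // Peel off any chain of lossless casts; stop at the first lossy one.
    while (e instanceof Cast && ((Cast) e).lossless) {
      e = ((Cast) e).operand;
    }
    return e instanceof InputRef
        ? OptionalInt.of(((InputRef) e).index)
        : OptionalInt.empty();
  }
}
```

With this model, referencedField(new Cast(new InputRef(1), true)) yields field 1, so the NDV of that field could be reused, while a lossy cast yields an empty result and would fall back to the existing estimate.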



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
