Alessandro Solimando created CALCITE-7173:
---------------------------------------------
Summary: Improve RelMdDistinctRowCount estimation for lossless casts
Key: CALCITE-7173
URL: https://issues.apache.org/jira/browse/CALCITE-7173
Project: Calcite
Issue Type: Improvement
Components: core
Affects Versions: 1.40.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando
Consider the following test for _RelMetadataTest_:
{code:java}
@Test
void testAggregateDistinctRowCountLosslessCast() {
  final String values = "values ('b', 10), ('b', 20), ('b', 30)";
  final String sql =
      "select name, cast(sal as varchar(11)) from (" + values + ") t(name, sal) "
          + "group by name, cast(sal as varchar(11))";
  // Column 1 is cast(sal as varchar(11)); the three sal values are distinct,
  // so the expected number of distinct values is 3.
  sql(sql).assertThatDistinctRowCount(bitSetOf(1), is(3d));
}
{code}
The test currently fails as follows:
{noformat}
Expected: is <3.0>
but: was <1.6439107033725735>
{noformat}
For lossless casts (and, more generally, for injective functions), one would
expect "NDV(CAST($i)) = NDV($i)" to hold, because an injective function maps
distinct inputs to distinct outputs.
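The NDV identity can be checked on the test's own data with a small standalone sketch. It does not use Calcite; the class and method names (NdvInjectiveDemo, ndv) are made up for illustration. It compares an injective mapping (integer to string, standing in for the lossless cast) with a non-injective one (integer division).

```java
import java.util.List;
import java.util.function.Function;

// Standalone illustration; NdvInjectiveDemo and ndv are hypothetical names,
// not part of Calcite.
class NdvInjectiveDemo {
  /** Counts the distinct values obtained by applying fn to every input value. */
  public static <T, R> long ndv(List<T> values, Function<T, R> fn) {
    return values.stream().map(fn).distinct().count();
  }

  public static void main(String[] args) {
    // Same "sal" values as in the test above.
    List<Integer> sal = List.of(10, 20, 30);

    // Identity: NDV of the raw column.
    System.out.println(ndv(sal, v -> v));                   // prints 3
    // Injective mapping (stand-in for a lossless cast): NDV is preserved.
    System.out.println(ndv(sal, v -> Integer.toString(v))); // prints 3
    // Non-injective mapping: distinct inputs collapse, so NDV can shrink.
    System.out.println(ndv(sal, v -> v / 100));             // prints 1
  }
}
```

Only the injective case guarantees equality; for arbitrary expressions the input NDV is merely an upper bound.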
A minimal fix would enhance
[RelMdUtil.java#L596|https://github.com/apache/calcite/blob/calcite-1.40.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMdUtil.java#L596]
to treat lossless casts as references to input fields. The impact of this
change would be limited, since that code is only used in
[RelMdDistinctRowCount|https://github.com/apache/calcite/blob/calcite-1.40.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMdDistinctRowCount.java#L258]
and, in exactly the same spirit, in
[RelMdPopulationSize|https://github.com/apache/calcite/blob/calcite-1.40.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMdPopulationSize.java#L138].
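The idea of looking through lossless casts can be sketched with a toy model. Everything below is hypothetical: Expr, InputRef, Cast, asInputRef and estimateNdv are stand-ins invented for this illustration, not Calcite classes or the actual patch. In Calcite itself, the lossless check would presumably rely on something like RexUtil.isLosslessCast rather than a boolean flag.

```java
import java.util.Map;

// Toy model only; none of these types are Calcite's.
class LosslessCastNdvSketch {
  /** Toy expression node, a stand-in for a Calcite RexNode. */
  interface Expr {}

  /** Reference to the input field at position index. */
  static final class InputRef implements Expr {
    final int index;
    InputRef(int index) { this.index = index; }
  }

  /** CAST over an operand; lossless means no two distinct inputs collapse. */
  static final class Cast implements Expr {
    final Expr operand;
    final boolean lossless;
    Cast(Expr operand, boolean lossless) {
      this.operand = operand;
      this.lossless = lossless;
    }
  }

  /** Strips lossless casts; returns the underlying input ref, or null if none. */
  static InputRef asInputRef(Expr e) {
    while (e instanceof Cast && ((Cast) e).lossless) {
      e = ((Cast) e).operand;
    }
    return e instanceof InputRef ? (InputRef) e : null;
  }

  /**
   * Uses the input field's NDV when e is an input ref, possibly wrapped in
   * lossless casts; otherwise returns the caller-supplied fallback estimate.
   */
  static double estimateNdv(Expr e, Map<Integer, Double> inputNdv, double fallback) {
    InputRef ref = asInputRef(e);
    return ref != null ? inputNdv.getOrDefault(ref.index, fallback) : fallback;
  }
}
```

With an input NDV of 3 for field 1, a lossless cast over that field yields 3, while a lossy cast falls back to whatever generic estimate the caller supplies.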