Stamatis Zampetakis created CALCITE-6640:
--------------------------------------------
Summary: RelMdUniqueKeys grows exponentially when key columns are
repeated in projections
Key: CALCITE-6640
URL: https://issues.apache.org/jira/browse/CALCITE-6640
Project: Calcite
Issue Type: Bug
Components: core
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Consider the following table where empno is a unique key column.
{code:sql}
CREATE TABLE emp (
empno INT,
ename VARCHAR,
job VARCHAR,
PRIMARY KEY (empno));
{code}
The results of RelMetadataQuery#getUniqueKeys for the following queries are as
follows:
{code:sql}
SELECT empno FROM emp;
{0}
SELECT ename, empno FROM emp;
{1}
SELECT empno, ename, empno FROM emp;
{0}, {2}, {0, 2}
SELECT empno, ename, empno, empno FROM emp;
{0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}
{code}
When key columns are repeated in the project, the result grows exponentially:
k copies of a key column produce 2^k - 1 unique keys (1, 3, and 7 in the
examples above).
This makes the unique key computation very expensive when there are many keys
or when keys are repeated multiple times. The problem can lead to OOM errors
and queries/rules hanging forever while trying to extract the keys.
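The growth can be reproduced with a small standalone model. This is only a sketch of the reported behavior, not the RelMdUniqueKeys code: the function name derived_unique_keys and the list-of-column-names representation of a projection are assumptions made for illustration.

```python
from itertools import combinations


def derived_unique_keys(projection, key_column):
    """Model of the reported output: every non-empty combination of the
    output positions that reference key_column is returned as a key."""
    # Output positions (0-based) where the key column is projected.
    positions = [i for i, col in enumerate(projection) if col == key_column]
    keys = []
    for size in range(1, len(positions) + 1):
        for combo in combinations(positions, size):
            keys.append(frozenset(combo))
    return keys


if __name__ == "__main__":
    for projection in (
        ["empno"],
        ["ename", "empno"],
        ["empno", "ename", "empno"],
        ["empno", "ename", "empno", "empno"],
    ):
        keys = derived_unique_keys(projection, "empno")
        print(projection, len(keys), [sorted(k) for k in keys])
```

For the four queries in this issue the model yields 1, 1, 3, and 7 keys. With 20 copies of the key column it would already produce more than a million key sets.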
Observe that the results above are not minimal, so we are currently creating
and returning a lot of redundant information.
{noformat}
{0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}
{noformat}
If we know that \{0\}, \{2\}, and \{3\} are unique keys individually, then any
superset of those is also a unique key, so it is sufficient to return just those.
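The reduction to minimal keys can be sketched as follows. This is an illustration under assumptions, not a proposed patch: minimal_keys is a hypothetical helper, keys are modeled as sets of column ordinals, and the pairwise comparison is quadratic in the number of keys, so it only shows what the result should be, not how to avoid generating the supersets in the first place.

```python
def minimal_keys(keys):
    """Keep only the keys that have no proper subset among the other keys."""
    unique = {frozenset(k) for k in keys}
    # frozenset's `<` operator tests for a proper subset.
    return {k for k in unique if not any(other < k for other in unique)}


if __name__ == "__main__":
    reported = [{0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}]
    print(sorted(sorted(k) for k in minimal_keys(reported)))
```

Applied to the seven keys reported for the last query, only the three single-column keys remain.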