Stamatis Zampetakis created CALCITE-6640:
--------------------------------------------

             Summary: RelMdUniqueKeys grows exponentially when key columns are 
repeated in projections
                 Key: CALCITE-6640
                 URL: https://issues.apache.org/jira/browse/CALCITE-6640
             Project: Calcite
          Issue Type: Bug
          Components: core
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Consider the following table where empno is a unique key column.
{code:sql}
CREATE TABLE emp (
 empno INT, 
 ename VARCHAR, 
 job VARCHAR,
 PRIMARY KEY (empno));
{code}

The results of RelMetadataQuery#getUniqueKeys for the following queries are as 
follows:

{code:sql}
SELECT empno FROM emp;
{0}
SELECT ename, empno FROM emp;
{1} 
SELECT empno, ename, empno FROM emp;
{0}, {2}, {0, 2}
SELECT empno, ename, empno, empno FROM emp;
{0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}
{code}
When key columns are repeated in the project the result grows exponentially. 
This makes the unique key computation very expensive when there are many keys 
or when keys are repeated multiple times. The problem can lead to OOM errors 
and queries/rules hanging forever while trying to extract the keys.
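
The growth pattern in the results above (1, 3, and 7 keys for one, two, and three copies of empno) matches the number of non-empty subsets of the repeated positions, i.e. 2^n - 1 for n copies. The following is a minimal sketch of that counting argument only. It does not reproduce the code in RelMdUniqueKeys; the class and method names are hypothetical, and java.util.BitSet stands in for Calcite's own bit set type.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration, not Calcite code.
final class UniqueKeyGrowth {

  /** Enumerates every non-empty subset of the given output positions. */
  static List<BitSet> nonEmptySubsets(int[] positions) {
    if (positions.length > 20) {
      // Guard: the result size doubles with every extra position.
      throw new IllegalArgumentException("too many positions to enumerate");
    }
    List<BitSet> result = new ArrayList<>();
    // Each mask from 1 to 2^n - 1 selects one non-empty subset.
    for (int mask = 1; mask < (1 << positions.length); mask++) {
      BitSet key = new BitSet();
      for (int i = 0; i < positions.length; i++) {
        if ((mask & (1 << i)) != 0) {
          key.set(positions[i]);
        }
      }
      result.add(key);
    }
    return result;
  }

  public static void main(String[] args) {
    // empno projected at positions 0, 2 and 3, as in the last query above.
    List<BitSet> keys = nonEmptySubsets(new int[] {0, 2, 3});
    System.out.println(keys.size() + " keys: " + keys);
  }
}
```

With ten copies of a key column this enumeration already yields 1023 sets, and with twenty copies more than a million, which is consistent with the memory pressure described above.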

Observe that the results above are not minimal, so we are currently creating 
and returning a lot of redundant information.

{noformat}
{0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}
{noformat}

If we know that \{0\}, \{2\}, and \{3\} are unique keys individually, then any 
superset of them is also a unique key, so it is sufficient to return just those.
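
One way to picture the reduction to minimal keys is a filter that drops every key containing a smaller key that has already been kept. The sketch below is an assumption about how such pruning could look, not the fix adopted in Calcite; MinimalKeys, minimal, isSubset and key are hypothetical names, and java.util.BitSet is used in place of Calcite's bit set type.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not Calcite code.
final class MinimalKeys {

  /** Returns true if every bit set in sub is also set in sup. */
  static boolean isSubset(BitSet sub, BitSet sup) {
    BitSet copy = (BitSet) sub.clone();
    copy.andNot(sup); // clears the bits that sup also has
    return copy.isEmpty();
  }

  /** Keeps only keys that have no other key as a subset. */
  static List<BitSet> minimal(List<BitSet> keys) {
    // Visit smaller keys first so a kept key is never a superset of a later one.
    List<BitSet> sorted = new ArrayList<>(keys);
    sorted.sort(Comparator.comparingInt(BitSet::cardinality));
    List<BitSet> result = new ArrayList<>();
    for (BitSet candidate : sorted) {
      boolean redundant = false;
      for (BitSet kept : result) {
        if (isSubset(kept, candidate)) {
          redundant = true; // candidate is a superset (or duplicate) of a kept key
          break;
        }
      }
      if (!redundant) {
        result.add(candidate);
      }
    }
    return result;
  }

  /** Convenience helper that builds a key from column ordinals. */
  static BitSet key(int... columns) {
    BitSet bits = new BitSet();
    for (int c : columns) {
      bits.set(c);
    }
    return bits;
  }

  public static void main(String[] args) {
    List<BitSet> keys = List.of(
        key(0), key(2), key(3), key(0, 2), key(0, 3), key(2, 3), key(0, 2, 3));
    System.out.println(minimal(keys)); // prints [{0}, {2}, {3}]
  }
}
```

Note that pruning after the fact still pays the cost of building the full set first. Avoiding the blow-up would require not generating the supersets at all, which this sketch does not attempt.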




