[ 
https://issues.apache.org/jira/browse/CALCITE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752179#comment-17752179
 ] 

Julian Hyde commented on CALCITE-5894:
--------------------------------------

We can generalize a little. If we have "ORDER BY w, x, y, z" then we can remove 
"z" if "z" is functionally dependent (FD) on (w, x, y).

Clearly if w, x or y are individually unique then z is FD on (w, x, y).

But if (w, x, y) or (w, x) or (w, y) are unique (i.e. is a composite key) then 
z can be removed also.

And if z may be FD for other reasons. Suppose we know that z == x + y. Then z 
is FD on (x, y).

{{SortRemoveConstantKeysRule}} (added in CALCITE-873) is a case of this (a 
constant is functionally dependent on anything).

{{Sort}} already ensures that the sort keys are distinct. "ORDER BY x, y, x" 
would be a violation of the FD principle, because x is functionally dependent 
on x.

If you can implement this in terms of FD, rather than unique keys per se, then 
you will be able to handle cases such as "SELECT deptno FROM emp ORDER BY 
deptno, deptno / 2". FD might require a new kind of metadata (similar to 
{{RelMdUniqueKeys}} but going further) and therefore could be handled in a 
separate Jira case.

> Add SortRemoveRedundantRule to remove redundant sort fields if sort fields 
> contains unique key
> ----------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5894
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5894
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: JingDas
>            Assignee: JingDas
>            Priority: Minor
>
> In some scene, Sort fields can be reduct, if sort fields contain unique key
> For example
> {code:java}
> SELECT ename, salary FROM Emp
> order by empno, ename{code}
> where `empno` is a key,  `ename` is redundant since `empno` alone is 
> sufficient to determine the order of any two records.
> So the SQL can be optimized as following:
> {code:java}
> SELECT name, Emp.salary FROM Emp
> order by empno{code}
> For another example:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM
> (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.c {code}
> Although `e_agg.ename` is not a key but field `ename` is unique and not null, 
> it can be optimized as following:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename{code}
> Sorting is an expensive operation, however. Therefore, it is imperative that 
> sorting
> is optimized to avoid unnecessary sort field.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to