[jira] [Commented] (CALCITE-5894) Add SortRemoveRedundantRule to remove redundant sort fields if sort fields contains unique key

Julian Hyde (Jira) Tue, 08 Aug 2023 13:41:04 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752177#comment-17752177
 ]


Julian Hyde commented on CALCITE-5894:
--------------------------------------

[~libenchao], There is a significant benefit to removing redundant fields. If 
you are doing an external merge sort, those redundant fields will be added to 
the partition file. Even if those fields are never used, they incur an IO cost 
when the file is written and read; also, assuming that sort memory is fixed, 
the number of initial partitions in a merge-sort will be larger (because each 
partition contains fewer rows), and therefore the tree depth (the 'log N' in 'N 
log N') may be larger.
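
To make that arithmetic concrete, here is a rough sketch in plain Java. It is 
not Calcite code; the class name, method names, row widths, memory size and 
merge fan-in are all assumptions chosen for illustration. It counts the initial 
sorted runs (partitions) that fit in a fixed sort memory and the merge passes 
needed to combine them.

```java
final class ExternalSortCostSketch {
    /** Initial sorted runs when each run holds as many rows as fit in memory. */
    static long initialRuns(long rowCount, long rowBytes, long memoryBytes) {
        long rowsPerRun = Math.max(1, memoryBytes / rowBytes);
        return (rowCount + rowsPerRun - 1) / rowsPerRun; // ceiling division
    }

    /** Merge passes needed to reduce the runs to one, merging fanIn runs at a time. */
    static int mergePasses(long runs, int fanIn) {
        int passes = 0;
        while (runs > 1) {
            runs = (runs + fanIn - 1) / fanIn; // each pass merges groups of fanIn runs
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        long memory = 1_000_000L; // bytes of sort memory (assumed)
        int fanIn = 10;           // runs merged per pass (assumed)

        // Hypothetical widths: 8 bytes for the key alone, 16 with a redundant field.
        long wideRuns = initialRuns(rows, 16, memory);
        long narrowRuns = initialRuns(rows, 8, memory);

        System.out.println("with redundant field: " + wideRuns + " runs, "
            + mergePasses(wideRuns, fanIn) + " merge passes");
        System.out.println("key only: " + narrowRuns + " runs, "
            + mergePasses(narrowRuns, fanIn) + " merge passes");
    }
}
```

With these assumed numbers the wider rows produce 16 runs and need two merge 
passes, while the narrower rows produce 8 runs and need one. Real figures 
depend on the engine and the data.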

> Add SortRemoveRedundantRule to remove redundant sort fields if sort fields 
> contains unique key
> ----------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5894
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5894
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: JingDas
>            Assignee: JingDas
>            Priority: Minor
>
> In some cases, sort fields can be reduced if they contain a unique key.
> For example:
> {code:java}
> SELECT ename, salary FROM Emp
> order by empno, ename{code}
> Here `empno` is a key, so `ename` is redundant: `empno` alone is 
> sufficient to determine the order of any two records.
> So the SQL can be optimized as follows:
> {code:java}
> SELECT ename, salary FROM Emp
> order by empno{code}
> For another example:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM
> (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.c {code}
> Although `e_agg.ename` is not a declared key, if field `ename` is unique and 
> not null the query can be optimized as follows:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename{code}
> Sorting is an expensive operation, so it is worth optimizing sorts to avoid 
> unnecessary sort fields.
>  
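
The trimming described in the issue can be sketched as follows. This is plain 
Java, not the proposed SortRemoveRedundantRule and not Calcite's API: the class 
name, the `trim` method and the representation of fields and keys as strings 
are all hypothetical. The idea is to keep sort fields, in order, until the 
fields kept so far cover a unique key, and to drop everything after that point. 
It assumes the key columns are NOT NULL.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RedundantSortFieldSketch {
    /**
     * Returns the shortest prefix of sortFields that covers one of the unique
     * keys, or all of sortFields if no key is covered.
     */
    static List<String> trim(List<String> sortFields, List<Set<String>> uniqueKeys) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String field : sortFields) {
            kept.add(field);
            seen.add(field);
            for (Set<String> key : uniqueKeys) {
                // Once a whole key is in the prefix, later fields cannot
                // change the relative order of any two rows.
                if (!key.isEmpty() && seen.containsAll(key)) {
                    return kept;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // ORDER BY empno, ename with key {empno}
        System.out.println(trim(List.of("empno", "ename"), List.of(Set.of("empno"))));
        // ORDER BY ename, c with key {ename, job}: nothing can be dropped
        System.out.println(trim(List.of("ename", "c"), List.of(Set.of("ename", "job"))));
    }
}
```

In this sketch the second call leaves the sort unchanged, because the only key 
supplied is the pair (ename, job). Passing {ename} as a key instead would trim 
the sort to `ename` alone.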



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
