[ https://issues.apache.org/jira/browse/CALCITE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751007#comment-17751007 ]
JingDas edited comment on CALCITE-5894 at 8/4/23 8:04 AM: ---------------------------------------------------------- [~libenchao] Thanks for your comment. Acctually I get the idea from the paper [Fundamental Techniques for Order Optimization|[https://sigmodrecord.org/publications/sigmodRecord/9606/pdfs/235968.233320.pdf]] For example, dept table with data as following: dept_no|dept_name|dept_address|dept_decmail| -----{-}{{-}}+{+}{{-}}{-}-----{-}{{-}}{{-}}{-}--------{-}{{-}}{+}+{{-}}{-}---------- d008 | | 8| | d009 | | 9| | d001 |a | 1| | d004 |a | 4| | d006 |a | 6| | d002 |b | 2| | d005 |b | 5| | d003 |c | 3| | d007 |c | 7| | If we run the SQL as following: {code:java} SELECT * FROM dept order by dept_name, dept_address;{code} The response is: dept_no|dept_name|dept_address|dept_decmail| -----{-}{{-}}+{+}{{-}}{-}-----{-}{{-}}{{-}}{-}--------{-}{{-}}{+}+{{-}}{-}---------- d008 | | 8| | d009 | | 9| | d001 |a | 1| | d004 |a | 4| | d006 |a | 6| | d002 |b | 2| | d005 |b | 5| | d003 |c | 3| | d007 |c | 7| | It orders by dept_name firstly, then order by dept_address in same dept_name secondly. If we run SQL as following: {code:java} SELECT * FROM dept ORDER BY dept_no, dept_address;{code} To be precise, when this rule found firstly order field `dept_no` is unique and not null(may be it is out of order), it only order by the first order field `dept_no` sufficiently and doesn't need to sort by second order field `dept_address`. At this time, it works a bit like `SortRemoveConstantKeysRule` but essentially different. This rule also supports other scenes. I add other scene in description. WDYT? was (Author: JIRAUSER292370): [~libenchao] Thanks for your comment. Acctually I get the idea from the papaer [Fundamental Techniques for Order Optimization | [https://sigmodrecord.org/publications/sigmodRecord/9606/pdfs/235968.233320.pdf]] For example, dept table with data as following: dept_no|dept_name|dept_address|dept_decmail| ------{-}++{-}-------{-}{-}----------{-}++{-}----------- d008 | | 8| | d009 | | 9| | d001 |a | 1| | d004 |a | 4| | d006 |a | 6| | d002 |b | 2| | d005 |b | 5| | d003 |c | 3| | d007 |c | 7| | If we run the SQL as following: {code:java} SELECT * FROM dept order by dept_name, dept_address;{code} The response is: dept_no|dept_name|dept_address|dept_decmail| ------{-}++{-}-------{-}{-}----------{-}++{-}----------- d008 | | 8| | d009 | | 9| | d001 |a | 1| | d004 |a | 4| | d006 |a | 6| | d002 |b | 2| | d005 |b | 5| | d003 |c | 3| | d007 |c | 7| | It orders by dept_name firstly, then order by dept_address in same dept_name secondly. If we run SQL as following: {code:java} SELECT * FROM dept ORDER BY dept_no, dept_address;{code} To be precise, when this rule found firstly order field `dept_no` is unique and not null(may be it is out of order), it only order by the first order field `dept_no` sufficiently and doesn't need to sort by second order field `dept_address`. At this time, it works a bit like `SortRemoveConstantKeysRule`. This rule also supports other scenes. I add other scene in description. WDYT? > Add SortRemoveRedundantRule to remove redundant sort fields if sort fields > contains unique key > ---------------------------------------------------------------------------------------------- > > Key: CALCITE-5894 > URL: https://issues.apache.org/jira/browse/CALCITE-5894 > Project: Calcite > Issue Type: New Feature > Reporter: JingDas > Assignee: JingDas > Priority: Minor > > In some scene, Sort fields can be reduct, if sort fields contain unique key > For example > {code:java} > SELECT ename, salary FROM Emp > order by empno, ename{code} > where `empno` is a key, `ename` is redundant since `empno` alone is > sufficient to determine the order of any two records. > So the SQL can be optimized as following: > {code:java} > SELECT name, Emp.salary FROM Emp > order by empno{code} > For another example: > {code:java} > SELECT e_agg.c, e_agg.ename > FROM > (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg > ORDER BY e_agg.ename, e_agg.c {code} > Although `e_agg.ename` is not a key but field `ename` is unique and not null, > it can be optimized as following: > {code:java} > SELECT e_agg.c, e_agg.ename > FROM (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg > ORDER BY e_agg.ename{code} > Sorting is an expensive operation, however. Therefore, it is imperative that > sorting > is optimized to avoid unnecessary sort field. > -- This message was sent by Atlassian Jira (v8.20.10#820010)