[jira] [Comment Edited] (CALCITE-5894) Add SortRemoveRedundantRule to remove redundant sort fields if sort fields contains unique key

JingDas (Jira) Fri, 04 Aug 2023 01:05:06 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751007#comment-17751007
 ]


JingDas edited comment on CALCITE-5894 at 8/4/23 8:04 AM:
----------------------------------------------------------

[~libenchao] Thanks for your comment. Acctually I get the idea from the paper

[Fundamental Techniques for Order 
Optimization|[https://sigmodrecord.org/publications/sigmodRecord/9606/pdfs/235968.233320.pdf]]

For example, dept table with data as following:

dept_no|dept_name|dept_address|dept_decmail|
-----{-}{{-}}+{+}{{-}}{-}-----{-}{{-}}{{-}}{-}--------{-}{{-}}{+}+{{-}}{-}----------
d008   |         |           8|            |
d009   |         |           9|            |
d001   |a        |           1|            |
d004   |a        |           4|            |
d006   |a        |           6|            |
d002   |b        |           2|            |
d005   |b        |           5|            |
d003   |c        |           3|            |
d007   |c        |           7|            |

If we run the SQL as following:
{code:java}
SELECT * FROM dept order by dept_name, dept_address;{code}
The response is:

dept_no|dept_name|dept_address|dept_decmail|
-----{-}{{-}}+{+}{{-}}{-}-----{-}{{-}}{{-}}{-}--------{-}{{-}}{+}+{{-}}{-}----------
d008   |         |           8|            |
d009   |         |           9|            |
d001   |a        |           1|            |
d004   |a        |           4|            |
d006   |a        |           6|            |
d002   |b        |           2|            |
d005   |b        |           5|            |
d003   |c        |           3|            |
d007   |c        |           7|            |

It orders by dept_name firstly, then order by dept_address in same dept_name 
secondly.

If we run SQL as following:
{code:java}
SELECT * FROM dept ORDER BY dept_no, dept_address;{code}
To be precise, when this rule found firstly order field `dept_no` is unique and 
not null(may be it is out of order), it only order by the first order field 
`dept_no` sufficiently and doesn't need to sort by second order field 
`dept_address`. At this time, it works a bit like `SortRemoveConstantKeysRule` 
but essentially different. This rule also supports other scenes. I add other 
scene in description.

WDYT?


was (Author: JIRAUSER292370):
[~libenchao] Thanks for your comment. Acctually I get the idea from the papaer

[Fundamental Techniques for Order Optimization | 
[https://sigmodrecord.org/publications/sigmodRecord/9606/pdfs/235968.233320.pdf]]

For example, dept table with data as following:

dept_no|dept_name|dept_address|dept_decmail|
------{-}++{-}-------{-}{-}----------{-}++{-}-----------
d008   |         |           8|            |
d009   |         |           9|            |
d001   |a        |           1|            |
d004   |a        |           4|            |
d006   |a        |           6|            |
d002   |b        |           2|            |
d005   |b        |           5|            |
d003   |c        |           3|            |
d007   |c        |           7|            |

If we run the SQL as following:

 
{code:java}
SELECT * FROM dept order by dept_name, dept_address;{code}
The response is:

 

dept_no|dept_name|dept_address|dept_decmail|
------{-}++{-}-------{-}{-}----------{-}++{-}-----------
d008   |         |           8|            |
d009   |         |           9|            |
d001   |a        |           1|            |
d004   |a        |           4|            |
d006   |a        |           6|            |
d002   |b        |           2|            |
d005   |b        |           5|            |
d003   |c        |           3|            |
d007   |c        |           7|            |

It orders by dept_name firstly, then order by dept_address in same dept_name 
secondly.

If we run SQL as following:
{code:java}
SELECT * FROM dept ORDER BY dept_no, dept_address;{code}
To be precise, when this rule found firstly order field `dept_no` is unique and 
not null(may be it is out of order), it only order by the first order field 
`dept_no` sufficiently and doesn't need to sort by second order field 
`dept_address`. At this time, it works a bit like `SortRemoveConstantKeysRule`. 
This rule also supports other scenes. I add other scene in description.

WDYT?

> Add SortRemoveRedundantRule to remove redundant sort fields if sort fields 
> contains unique key
> ----------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5894
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5894
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: JingDas
>            Assignee: JingDas
>            Priority: Minor
>
> In some scene, Sort fields can be reduct, if sort fields contain unique key
> For example
> {code:java}
> SELECT ename, salary FROM Emp
> order by empno, ename{code}
> where `empno` is a key,  `ename` is redundant since `empno` alone is 
> sufficient to determine the order of any two records.
> So the SQL can be optimized as following:
> {code:java}
> SELECT name, Emp.salary FROM Emp
> order by empno{code}
> For another example:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM
> (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.c {code}
> Although `e_agg.ename` is not a key but field `ename` is unique and not null, 
> it can be optimized as following:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename{code}
> Sorting is an expensive operation, however. Therefore, it is imperative that 
> sorting
> is optimized to avoid unnecessary sort field.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5894) Add SortRemoveRedundantRule to remove redundant sort fields if sort fields contains unique key

Reply via email to