[
https://issues.apache.org/jira/browse/CALCITE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751007#comment-17751007
]
JingDas edited comment on CALCITE-5894 at 8/4/23 8:04 AM:
----------------------------------------------------------
[~libenchao] Thanks for your comment. Acctually I get the idea from the paper
[Fundamental Techniques for Order
Optimization|[https://sigmodrecord.org/publications/sigmodRecord/9606/pdfs/235968.233320.pdf]]
For example, dept table with data as following:
dept_no|dept_name|dept_address|dept_decmail|
-----{-}{{-}}+{+}{{-}}{-}-----{-}{{-}}{{-}}{-}--------{-}{{-}}{+}+{{-}}{-}----------
d008 | | 8| |
d009 | | 9| |
d001 |a | 1| |
d004 |a | 4| |
d006 |a | 6| |
d002 |b | 2| |
d005 |b | 5| |
d003 |c | 3| |
d007 |c | 7| |
If we run the SQL as following:
{code:java}
SELECT * FROM dept order by dept_name, dept_address;{code}
The response is:
dept_no|dept_name|dept_address|dept_decmail|
-----{-}{{-}}+{+}{{-}}{-}-----{-}{{-}}{{-}}{-}--------{-}{{-}}{+}+{{-}}{-}----------
d008 | | 8| |
d009 | | 9| |
d001 |a | 1| |
d004 |a | 4| |
d006 |a | 6| |
d002 |b | 2| |
d005 |b | 5| |
d003 |c | 3| |
d007 |c | 7| |
It orders by dept_name firstly, then order by dept_address in same dept_name
secondly.
If we run SQL as following:
{code:java}
SELECT * FROM dept ORDER BY dept_no, dept_address;{code}
To be precise, when this rule found firstly order field `dept_no` is unique and
not null(may be it is out of order), it only order by the first order field
`dept_no` sufficiently and doesn't need to sort by second order field
`dept_address`. At this time, it works a bit like `SortRemoveConstantKeysRule`
but essentially different. This rule also supports other scenes. I add other
scene in description.
WDYT?
was (Author: JIRAUSER292370):
[~libenchao] Thanks for your comment. Acctually I get the idea from the papaer
[Fundamental Techniques for Order Optimization |
[https://sigmodrecord.org/publications/sigmodRecord/9606/pdfs/235968.233320.pdf]]
For example, dept table with data as following:
dept_no|dept_name|dept_address|dept_decmail|
------{-}++{-}-------{-}{-}----------{-}++{-}-----------
d008 | | 8| |
d009 | | 9| |
d001 |a | 1| |
d004 |a | 4| |
d006 |a | 6| |
d002 |b | 2| |
d005 |b | 5| |
d003 |c | 3| |
d007 |c | 7| |
If we run the SQL as following:
{code:java}
SELECT * FROM dept order by dept_name, dept_address;{code}
The response is:
dept_no|dept_name|dept_address|dept_decmail|
------{-}++{-}-------{-}{-}----------{-}++{-}-----------
d008 | | 8| |
d009 | | 9| |
d001 |a | 1| |
d004 |a | 4| |
d006 |a | 6| |
d002 |b | 2| |
d005 |b | 5| |
d003 |c | 3| |
d007 |c | 7| |
It orders by dept_name firstly, then order by dept_address in same dept_name
secondly.
If we run SQL as following:
{code:java}
SELECT * FROM dept ORDER BY dept_no, dept_address;{code}
To be precise, when this rule found firstly order field `dept_no` is unique and
not null(may be it is out of order), it only order by the first order field
`dept_no` sufficiently and doesn't need to sort by second order field
`dept_address`. At this time, it works a bit like `SortRemoveConstantKeysRule`.
This rule also supports other scenes. I add other scene in description.
WDYT?
> Add SortRemoveRedundantRule to remove redundant sort fields if sort fields
> contains unique key
> ----------------------------------------------------------------------------------------------
>
> Key: CALCITE-5894
> URL: https://issues.apache.org/jira/browse/CALCITE-5894
> Project: Calcite
> Issue Type: New Feature
> Reporter: JingDas
> Assignee: JingDas
> Priority: Minor
>
> In some scene, Sort fields can be reduct, if sort fields contain unique key
> For example
> {code:java}
> SELECT ename, salary FROM Emp
> order by empno, ename{code}
> where `empno` is a key, `ename` is redundant since `empno` alone is
> sufficient to determine the order of any two records.
> So the SQL can be optimized as following:
> {code:java}
> SELECT name, Emp.salary FROM Emp
> order by empno{code}
> For another example:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM
> (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename, e_agg.c {code}
> Although `e_agg.ename` is not a key but field `ename` is unique and not null,
> it can be optimized as following:
> {code:java}
> SELECT e_agg.c, e_agg.ename
> FROM (SELECT count(*) as c, ename, job FROM Emp GROUP BY ename, job) AS e_agg
> ORDER BY e_agg.ename{code}
> Sorting is an expensive operation, however. Therefore, it is imperative that
> sorting
> is optimized to avoid unnecessary sort field.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)