[jira] [Comment Edited] (CALCITE-4522) Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Julian Hyde (Jira) Tue, 09 Mar 2021 12:04:06 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298297#comment-17298297
 ]


Julian Hyde edited comment on CALCITE-4522 at 3/9/21, 8:03 PM:
---------------------------------------------------------------

All good sort algorithms have the same basic cost function, {{M log(M)}} (where 
M is the number of input rows). When we're looking for the top N but still have 
to sort, the formula becomes {{M log(N)}} - we still have to process all M 
input rows, but the depth of the decision tree, which is determined by N, 
reduces.

You should multiply by {{constant + row_bytes}}, not {{row_bytes}}. You cannot 
sort 4-byte rows 25 types faster than 100-byte rows. There is a per-row 
overhead.

Your formula doesn't deal with the case where we're doing LIMIT/OFFSET but we 
don't need to sort.


was (Author: julianhyde):
Your formula doesn't deal with the case where we're doing LIMIT/OFFSET but we 
don't need to sort.

> Sort operator returns the same cpu cost no matter the RelCollation is empty 
> or not
> ----------------------------------------------------------------------------------
>
>                 Key: CALCITE-4522
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4522
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: hqx
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The old method to compute the cost of sort has some problem.
>  # When the RelCollation is empty, there is no need to sort, but it still 
> compute the cpu cost of sort.
>  # use n * log\(n) * row_byte to estimate the cpu cost may be inaccurate, 
> where n means the output row count of the sort operator, and row_byte means 
> the average bytes of one row .
> Instead, I give follow suggestion.
>  # the cpu cost is zero if the RelCollation is empty.
>  # use min(offset + fetch, input_count) * log\(input_count)* row_byte to 
> compute the cpu cost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (CALCITE-4522) Sort operator returns the same cpu cost no matter the RelCollation is empty or not

Reply via email to