[
https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298297#comment-17298297
]
Julian Hyde edited comment on CALCITE-4522 at 3/9/21, 8:03 PM:
---------------------------------------------------------------
All good sort algorithms have the same basic cost function, {{M log(M)}} (where
M is the number of input rows). When we're looking for the top N but still have
to sort, the formula becomes {{M log(N)}} - we still have to process all M
input rows, but the depth of the decision tree, which is determined by N,
reduces.
You should multiply by {{constant + row_bytes}}, not {{row_bytes}}. You cannot
sort 4-byte rows 25 types faster than 100-byte rows. There is a per-row
overhead.
Your formula doesn't deal with the case where we're doing LIMIT/OFFSET but we
don't need to sort.
was (Author: julianhyde):
Your formula doesn't deal with the case where we're doing LIMIT/OFFSET but we
don't need to sort.
> Sort operator returns the same cpu cost no matter the RelCollation is empty
> or not
> ----------------------------------------------------------------------------------
>
> Key: CALCITE-4522
> URL: https://issues.apache.org/jira/browse/CALCITE-4522
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: hqx
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The old method to compute the cost of sort has some problem.
> # When the RelCollation is empty, there is no need to sort, but it still
> compute the cpu cost of sort.
> # use n * log\(n) * row_byte to estimate the cpu cost may be inaccurate,
> where n means the output row count of the sort operator, and row_byte means
> the average bytes of one row .
> Instead, I give follow suggestion.
> # the cpu cost is zero if the RelCollation is empty.
> # use min(offset + fetch, input_count) * log\(input_count)* row_byte to
> compute the cpu cost.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)