[ 
https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316148#comment-17316148
 ] 

Ruben Q L commented on CALCITE-4522:
------------------------------------

I think this change introduces a regression in EnumerableLimitSort cost 
computation, specifically in the rowCount part (issue detected by a test suite 
in a downstream project).

EnumerableLimitSort cost formula used to be:
{code:java}
// cpu is nLogM * bytesPerRow
planner.getCostFactory().makeCost(inputRowCount, cpu, 0);
{code}
After this change, for a Sort with fetch (e.g. an EnumerableLimitSort), the 
first parameter (rowCount) of this formula is no longer inputRowCount but 
{{readCount = Math.min(inCount, offsetValue + fetchValue)}}, which in practice 
will usually be just {{offsetValue + fetchValue}}:
{code:java}
// cpu is nLogM * bytesPerRow
planner.getCostFactory().makeCost(offsetValue + fetchValue, cpu, 0);
{code}
In my understanding this is wrong: a Sort operator, even with fetch (such as 
EnumerableLimitSort), still needs to read and process all inputRowCount rows, 
even though it only has to keep offsetValue + fetchValue rows sorted. So I 
think the new formula underestimates the cost of a Sort with fetch, and its 
first parameter should still be inputRowCount in all cases. Should I create a 
ticket to address this issue?
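
To give a feeling for the size of the gap, here is a rough standalone sketch. 
It is not Calcite code: the class name, the helper method and the sample 
numbers are made up for illustration, and only the {{Math.min}} expression is 
taken from the formula quoted above.

```java
public class LimitSortRowCount {

  /** rowCount reported after the change: min(inCount, offsetValue + fetchValue). */
  static double readCount(double inCount, double offsetValue, double fetchValue) {
    return Math.min(inCount, offsetValue + fetchValue);
  }

  public static void main(String[] args) {
    // Hypothetical numbers: a large input with a small LIMIT.
    double inputRowCount = 1_000_000d;
    double offsetValue = 0d;
    double fetchValue = 10d;

    // Before the change the first makeCost argument was the input row count.
    System.out.println("rowCount before the change: " + inputRowCount);
    // After the change it is capped at offsetValue + fetchValue.
    System.out.println("rowCount after the change:  "
        + readCount(inputRowCount, offsetValue, fetchValue));
  }
}
```

With these sample values the reported rowCount is capped at offsetValue + 
fetchValue, although the operator still has to consume every input row.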

> CPU cost of Sort should be lower if sort keys are empty
> -------------------------------------------------------
>
>                 Key: CALCITE-4522
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4522
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: hqx
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.27.0
>
>          Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> The old method of computing the cost of a sort has some problems:
>  # When the RelCollation is empty, there is no need to sort, but the CPU 
> cost of sorting is still computed.
>  # Using n * log\(n) * row_byte to estimate the CPU cost may be inaccurate, 
> where n is the output row count of the sort operator and row_byte is the 
> average number of bytes per row.
> Instead, I suggest the following:
>  # The CPU cost is zero if the RelCollation is empty.
>  # Let heap_size be min(offset + fetch, input_count), and use input_count * 
> max(1, log(heap_size)) * row_byte to compute the CPU cost.
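
The suggestion in the description can be sketched as below. This is only an 
illustration under assumptions, not the code that was merged: the class and 
parameter names are hypothetical, the logarithm is taken as the natural log 
because the description does not state a base, and a sort without fetch is 
modelled by passing an infinite offset + fetch so that heap_size falls back to 
input_count.

```java
public class SortCpuCostSketch {

  /**
   * CPU cost as proposed in the issue description:
   * zero for an empty collation, otherwise
   * input_count * max(1, log(heap_size)) * row_byte.
   */
  static double cpuCost(boolean collationEmpty, double inputCount,
      double offsetPlusFetch, double rowBytes) {
    if (collationEmpty) {
      // No sort keys, so nothing has to be sorted.
      return 0d;
    }
    // heap_size = min(offset + fetch, input_count)
    double heapSize = Math.min(offsetPlusFetch, inputCount);
    // Natural log is an assumption; the description only says "log".
    return inputCount * Math.max(1d, Math.log(heapSize)) * rowBytes;
  }

  public static void main(String[] args) {
    // Hypothetical values: 1000 input rows of 8 bytes each.
    System.out.println(cpuCost(true, 1000d, 10d, 8d));
    System.out.println(cpuCost(false, 1000d, 10d, 8d));
    System.out.println(cpuCost(false, 1000d, Double.POSITIVE_INFINITY, 8d));
  }
}
```

Note that the CPU term still scales with input_count even when a fetch is 
present; only the logarithmic factor shrinks with heap_size.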



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
