[ https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316148#comment-17316148 ]
Ruben Q L commented on CALCITE-4522:
------------------------------------

I think this change introduces a regression in the EnumerableLimitSort cost computation, specifically in the rowCount part (the issue was detected by a test suite in a downstream project).

The EnumerableLimitSort cost formula used to be:
{code:java}
planner.getCostFactory().makeCost(inputRowCount, cpu, 0); // cpu is the nLogM * bytesPerRow
{code}

After this change, for a Sort with fetch (e.g. an EnumerableLimitSort), the first parameter (rowCount) of this formula is no longer inputRowCount but {{readCount=Math.min(inCount, offsetValue + fetchValue);}}, which in practice is usually just {{offsetValue + fetchValue}}:
{code:java}
planner.getCostFactory().makeCost(offsetValue + fetchValue, cpu, 0); // cpu is the nLogM * bytesPerRow
{code}

In my understanding this is wrong. A Sort operator, even with fetch (such as EnumerableLimitSort), still needs to read and process all inputRowCount rows, even though it only needs to keep offsetValue + fetchValue rows sorted. So I think the new formula underestimates the cost of a Sort with fetch, and its first parameter should still be inputRowCount in all cases.

Should I create a ticket to address this issue?

> CPU cost of Sort should be lower if sort keys are empty
> -------------------------------------------------------
>
> Key: CALCITE-4522
> URL: https://issues.apache.org/jira/browse/CALCITE-4522
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: hqx
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.27.0
>
> Time Spent: 9h 50m
> Remaining Estimate: 0h
>
> The old method of computing the cost of a sort has some problems.
> # When the RelCollation is empty, there is no need to sort, but it still
> computes the CPU cost of a sort.
> # Using n * log\(n) * row_byte to estimate the CPU cost may be inaccurate,
> where n is the output row count of the sort operator and row_byte is
> the average number of bytes in a row.
>
> Instead, I suggest the following:
> # The CPU cost is zero if the RelCollation is empty.
> # Let heap_size be min(offset + fetch, input_count), and use
> input_count * max(1, log(heap_size)) * row_byte to compute the CPU cost.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
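The CPU estimate proposed in the quoted issue description can be sketched in Java as follows. This is a minimal, hypothetical illustration, not Calcite's actual implementation. The class SortCpuCostSketch, the method proposedCpu and its parameters are invented for this example. It also assumes that log is the natural logarithm (Math.log), that a null fetch means "no limit" (so the heap must hold every input row), and that a null offset means 0.

```java
/**
 * Hypothetical sketch of the Sort CPU-cost estimate proposed in CALCITE-4522.
 * Class, method and parameter names are illustrative, not Calcite's API.
 */
public final class SortCpuCostSketch {

  /**
   * Returns 0 when there are no sort keys; otherwise
   * inputCount * max(1, ln(heapSize)) * bytesPerRow, where
   * heapSize = min(offset + fetch, inputCount).
   * Assumption: a null fetch means "no limit" and a null offset means 0.
   */
  public static double proposedCpu(double inputCount, Double offset, Double fetch,
      boolean collationEmpty, double bytesPerRow) {
    if (collationEmpty) {
      return 0d; // no sort keys: nothing to sort
    }
    double heapSize = inputCount; // without a fetch, every row stays in the heap
    if (fetch != null) {
      double skip = offset == null ? 0d : offset;
      heapSize = Math.min(skip + fetch, inputCount);
    }
    // max(1, ...) keeps the per-row factor from dropping below 1 for tiny heaps
    return inputCount * Math.max(1d, Math.log(heapSize)) * bytesPerRow;
  }

  public static void main(String[] args) {
    double rows = 1_000_000;
    double bytesPerRow = 16;
    System.out.println(proposedCpu(rows, null, null, true, bytesPerRow)); // 0.0
    // Top-10: every input row is read, but the heap only holds 10 rows.
    System.out.println(proposedCpu(rows, 0d, 10d, false, bytesPerRow));
    // Full sort: the heap holds all input rows.
    System.out.println(proposedCpu(rows, null, null, false, bytesPerRow));
  }
}
```

For a top-10 over one million rows, the logarithmic factor becomes ln(10) instead of ln(1,000,000), while the input_count factor still charges for reading every row.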
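The rowCount concern raised in Ruben's comment can be illustrated with a small, hypothetical Java sketch. Cost here is a stand-in for the (rowCount, cpu, io) triple passed to makeCost, not Calcite's RelOptCost. The method names costBefore and costAfter are invented for this example. The cpu value is an arbitrary placeholder held equal in both variants, so that only the first argument differs.

```java
/**
 * Hypothetical sketch contrasting the rowCount argument of
 * makeCost(rowCount, cpu, io) for a Sort with fetch (such as
 * EnumerableLimitSort) before and after the CALCITE-4522 change.
 * Cost is a stand-in class, not Calcite's RelOptCost.
 */
public final class LimitSortRowCountSketch {

  /** Minimal stand-in for the (rowCount, cpu, io) triple. */
  public static final class Cost {
    public final double rowCount;
    public final double cpu;
    public final double io;

    public Cost(double rowCount, double cpu, double io) {
      this.rowCount = rowCount;
      this.cpu = cpu;
      this.io = io;
    }
  }

  /** Previous formula: makeCost(inputRowCount, cpu, 0). */
  public static Cost costBefore(double inCount, double cpu) {
    return new Cost(inCount, cpu, 0);
  }

  /** Formula after the change: rowCount is readCount = min(inCount, offset + fetch). */
  public static Cost costAfter(double inCount, double offsetValue, double fetchValue, double cpu) {
    double readCount = Math.min(inCount, offsetValue + fetchValue);
    return new Cost(readCount, cpu, 0);
  }

  public static void main(String[] args) {
    double inCount = 1_000_000; // rows the Sort still has to read
    double cpu = 42;            // arbitrary placeholder, identical in both variants
    Cost before = costBefore(inCount, cpu);
    Cost after = costAfter(inCount, 0, 10, cpu);
    // Only the rowCount component differs.
    System.out.println(before.rowCount + " vs " + after.rowCount); // 1000000.0 vs 10.0
  }
}
```

With a fetch of 10 over one million input rows, the rowCount component drops from 1,000,000 to 10 even though the operator still reads every input row. This is the underestimation the comment describes.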