[ https://issues.apache.org/jira/browse/IMPALA-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Riza Suminto resolved IMPALA-12657.
-----------------------------------
    Resolution: Fixed

> Improve ProcessingCost of ScanNode and NonGroupingAggregator
> ------------------------------------------------------------
>
>                 Key: IMPALA-12657
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12657
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 4.3.0
>            Reporter: Riza Suminto
>            Assignee: David Rorke
>            Priority: Major
>             Fix For: Impala 4.4.0
>
>         Attachments: profile_1f4d7a679a3e12d5_4223115700000000.txt
>
>
> Several benchmark runs measuring Impala scan performance indicate
> opportunities to improve costing around ScanNode and NonGroupingAggregator.
> [^profile_1f4d7a679a3e12d5_4223115700000000.txt] shows an example of a simple
> count query.
> Key takeaways:
> # There is a strong correlation between total materialized bytes (row size *
> cardinality) and total tuple materialization time per fragment. Row
> materialization cost should be based on row size instead of an equal cost
> per scan range.
> # NonGroupingAggregator should have a much lower cost than GroupingAggregator.
> In the example above, the cost of NonGroupingAggregator dominates the scan
> fragment even though it only does simple counting rather than hash table
> operations.
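The two takeaways in the issue description can be illustrated with a small sketch. This is not Impala's planner code: every function name and coefficient below is hypothetical, chosen only to contrast a flat per-scan-range cost with a cost proportional to materialized bytes (row size * cardinality), and a cheap non-grouping aggregation with a hash-based grouping one.

```python
# Hypothetical illustration only; not Impala's actual costing code.
# All coefficients are made-up placeholders, not measured values.
COST_PER_SCAN_RANGE = 1000.0        # assumed flat cost per scan range
COST_PER_MATERIALIZED_BYTE = 0.5    # assumed cost per materialized byte
COST_PER_ROW_GROUPING = 10.0        # assumed per-row cost with hash table work
COST_PER_ROW_NON_GROUPING = 1.0     # assumed per-row cost for simple counting


def per_range_cost(num_scan_ranges: int) -> float:
    """Flat estimate: every scan range costs the same."""
    return num_scan_ranges * COST_PER_SCAN_RANGE


def materialization_cost(row_size_bytes: float, cardinality: int) -> float:
    """Estimate proportional to total materialized bytes."""
    return row_size_bytes * cardinality * COST_PER_MATERIALIZED_BYTE


def aggregation_cost(input_rows: int, grouping: bool) -> float:
    """Per-row aggregation estimate; non-grouping is assumed much cheaper."""
    per_row = COST_PER_ROW_GROUPING if grouping else COST_PER_ROW_NON_GROUPING
    return input_rows * per_row


# Same cardinality, different row widths: the byte-based estimate separates
# them, while a per-range estimate would treat both scans identically.
narrow = materialization_cost(row_size_bytes=8, cardinality=1_000_000)
wide = materialization_cost(row_size_bytes=200, cardinality=1_000_000)
```

Under these assumed coefficients, the wide scan is estimated at 25 times the cost of the narrow one, whereas `per_range_cost` would return the same value for both if they covered the same number of scan ranges.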