[ https://issues.apache.org/jira/browse/IMPALA-12765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zoltán Borók-Nagy reassigned IMPALA-12765:
------------------------------------------

    Assignee: Zoltán Borók-Nagy

> Balance consecutive partitions better for Iceberg tables
> --------------------------------------------------------
>
>                 Key: IMPALA-12765
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12765
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> During scheduling, Impala does the following:
> * Non-Iceberg tables
> ** The scheduler processes the scan ranges in partition key order
> ** The scheduler selects N replicas as candidates
> ** The scheduler chooses the executor from the candidates based on the
> minimum number of assigned bytes
> ** So consecutive partitions are more likely to be assigned to different
> executors
> * Iceberg tables
> ** The scheduler processes the scan ranges in random order
> ** The scheduler selects N replicas as candidates
> ** The scheduler chooses the executor from the candidates based on the
> minimum number of assigned bytes
> ** So consecutive partitions (by partition key order) are assigned randomly,
> i.e. there is a higher chance of clustering on the same executor
> If the IcebergScanNode ordered its file descriptors by path, scheduling
> would be better balanced for consecutive partitions. Queries that operate
> on a range of partitions are quite common, so it makes sense to optimize
> that case.
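The effect described in the issue can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Impala's scheduler code: the function names (`schedule`, `adjacent_collisions`), the executor names, the file paths and the 128 MiB file size are all hypothetical, and every executor is treated as a candidate for every scan range instead of selecting N replicas. It shows that a greedy "fewest assigned bytes" choice spreads neighbouring partitions across executors when the ranges arrive in path order, while the same greedy choice applied to a shuffled input gives no such guarantee.

```python
import random


def schedule(scan_ranges, executors):
    """Assign each (path, size_bytes) range, in the order given, to the
    candidate executor that has the fewest bytes assigned so far."""
    assigned_bytes = {e: 0 for e in executors}
    assignment = {}
    for path, size_bytes in scan_ranges:
        # Simplifying assumption: all executors are candidates.
        candidates = executors
        # min() returns the first executor among those tied for the minimum.
        chosen = min(candidates, key=lambda e: assigned_bytes[e])
        assigned_bytes[chosen] += size_bytes
        assignment[path] = chosen
    return assignment, assigned_bytes


def adjacent_collisions(assignment):
    """Count pairs of neighbouring files (in path order) on the same executor."""
    paths = sorted(assignment)
    return sum(assignment[a] == assignment[b] for a, b in zip(paths, paths[1:]))


executors = ["exec-0", "exec-1", "exec-2"]
# Hypothetical layout: one 128 MiB data file per daily partition.
files = [
    (f"/tbl/data/day=2024-01-{d:02d}/f.parquet", 128 * 2**20)
    for d in range(1, 31)
]

shuffled = files[:]
random.Random(0).shuffle(shuffled)  # stands in for the "random order" case

by_path, bytes_by_path = schedule(sorted(files), executors)
by_random, bytes_random = schedule(shuffled, executors)

print("collisions, path order:  ", adjacent_collisions(by_path))
print("collisions, random order:", adjacent_collisions(by_random))
```

In both runs the total bytes per executor end up equal, so the difference is not in overall load but in which partitions land together. That is the case the issue cares about: a query over a contiguous range of partitions only benefits from the balance if neighbouring partitions are on different executors.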