[ https://issues.apache.org/jira/browse/SPARK-41220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-41220:
------------------------------------

    Assignee: Apache Spark

> Range partitioner sample supports column pruning
> ------------------------------------------------
>
>                 Key: SPARK-41220
>                 URL: https://issues.apache.org/jira/browse/SPARK-41220
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: XiDuo You
>            Assignee: Apache Spark
>            Priority: Major
>
> When doing a global sort, we first sample the data to compute range bounds, then use
> the range partitioner to perform the shuffle exchange.
> The issue is that the sample plan is coupled with the shuffle plan, which prevents us
> from optimizing the sample plan. The sample plan only needs the sort-order columns, but
> the shuffle plan contains all data columns. So, at a minimum, we can apply column pruning
> to the sample plan so that it fetches only the ordering columns.
> A common example is: `OPTIMIZE table ZORDER BY columns`

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
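The flow described in the issue (sample to find range bounds, then range-partition the full rows during the shuffle) can be modeled in plain Python to show why the sample side only needs the ordering columns. This is a toy sketch, not Spark's implementation: `project`, `compute_range_bounds` and `range_partition` are hypothetical helpers, rows are plain dicts, and sampling is a simple uniform `random.sample` rather than Spark's own sampling logic.

```python
import bisect
import random
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def project(rows: Sequence[Row], columns: Sequence[str]) -> List[Key]:
    """Column pruning (hypothetical helper): keep only the sort-order columns."""
    return [tuple(row[c] for c in columns) for row in rows]


def compute_range_bounds(rows: Sequence[Row], order_cols: Sequence[str],
                         num_partitions: int, sample_size: int,
                         seed: int = 0) -> List[Key]:
    """Sample plan: works on pruned keys only and picks up to num_partitions - 1 bounds."""
    keys = project(rows, order_cols)  # non-ordering columns never enter the sample
    if not keys or num_partitions < 2:
        return []
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    bounds: List[Key] = []
    for i in range(1, num_partitions):
        candidate = sample[int(i * step)]
        # Keep bounds strictly increasing so duplicate samples do not create empty ranges.
        if not bounds or candidate > bounds[-1]:
            bounds.append(candidate)
    return bounds


def range_partition(rows: Sequence[Row], order_cols: Sequence[str],
                    bounds: List[Key]) -> List[List[Row]]:
    """Shuffle plan: routes full rows (all columns); a key <= bounds[i] goes to partition i."""
    parts: List[List[Row]] = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        key = tuple(row[c] for c in order_cols)
        parts[bisect.bisect_left(bounds, key)].append(row)
    return parts


if __name__ == "__main__":
    # A wide "payload" column that the sample step does not need to read.
    data = [{"a": i % 7, "b": i, "payload": "x" * 100} for i in range(100)]
    bounds = compute_range_bounds(data, ["a", "b"], num_partitions=4, sample_size=20)
    parts = range_partition(data, ["a", "b"], bounds)
    print(bounds, [len(p) for p in parts])
```

In this model only `range_partition` touches the wide `payload` column; the bounds are derived entirely from the projected `(a, b)` keys, which is the decoupling the issue asks for between the sample plan and the shuffle plan.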