[ https://issues.apache.org/jira/browse/SPARK-47398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raza Jafri updated SPARK-47398:
-------------------------------
    Description: 
As part of SPARK-42101 we added support to AQE for handling InMemoryTableScanExec.

This change directly references `InMemoryTableScanExec`, which prevents users from extending the caching functionality that was added as part of SPARK-32274.

In AdaptiveSparkPlanExec we are wrapping InMemoryTableScanExec in TableCacheQueryStageExec. To accomplish this we currently match on the Exec. I propose that we match on a trait instead, just as we do for Exchange by matching against ShuffleExchangeLike and BroadcastExchangeLike.

Looking at the current code, I propose the following trait:
{code:java}
trait InMemoryTableScanLike extends LeafExecNode {
  /**
   * Returns whether the cache buffer is loaded
   */
  def isMaterialized: Boolean

  /**
   * Returns the actual cached RDD without filters and serialization of row/columnar.
   */
  def baseCacheRDD(): RDD[CachedBatch]

  /**
   * Returns the runtime statistics after shuffle materialization.
   */
  def runtimeStatistics: Statistics
}
{code}
This is just based on what I know about how AQE is using it.

  was:
As part of SPARK-42101 we added support to AQE for handling InMemoryTableScanExec.

This change directly references `InMemoryTableScanExec`, which prevents users from extending the caching functionality that was added as part of SPARK-32274.

In `AdaptiveSparkPlanExec` we are wrapping `InMemoryTableScanExec` in `TableCacheQueryStageExec`. To accomplish this we currently match on the Exec. I propose that we match on a trait instead, just as we do for `Exchange` by matching against `ShuffleExchangeLike` and `BroadcastExchangeLike`.

Looking at the current code, I propose the following trait:
```
trait InMemoryTableScanLike extends LeafExecNode {
  /**
   * Returns whether the cache buffer is loaded
   */
  def isMaterialized: Boolean

  /**
   * Returns the actual cached RDD without filters and serialization of row/columnar.
   */
  def baseCacheRDD(): RDD[CachedBatch]

  /**
   * Returns the runtime statistics after shuffle materialization.
   */
  def runtimeStatistics: Statistics
}
```
This is just based on what I know about how AQE is using it.


> AQE doesn't allow for extension of InMemoryTableScanExec
> --------------------------------------------------------
>
>                 Key: SPARK-47398
>                 URL: https://issues.apache.org/jira/browse/SPARK-47398
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0, 3.5.1
>            Reporter: Raza Jafri
>            Priority: Major
>
> As part of SPARK-42101 we added support to AQE for handling
> InMemoryTableScanExec.
> This change directly references `InMemoryTableScanExec`, which prevents users
> from extending the caching functionality that was added as part of
> SPARK-32274.
> In AdaptiveSparkPlanExec we are wrapping InMemoryTableScanExec in
> TableCacheQueryStageExec. To accomplish this we currently match on the
> Exec. I propose that we match on a trait instead, just as we do for
> Exchange by matching against ShuffleExchangeLike and
> BroadcastExchangeLike.
>
> Looking at the current code, I propose the following trait:
> {code:java}
> trait InMemoryTableScanLike extends LeafExecNode {
>   /**
>    * Returns whether the cache buffer is loaded
>    */
>   def isMaterialized: Boolean
>   /**
>    * Returns the actual cached RDD without filters and serialization of row/columnar.
>    */
>   def baseCacheRDD(): RDD[CachedBatch]
>   /**
>    * Returns the runtime statistics after shuffle materialization.
>    */
>   def runtimeStatistics: Statistics
> }
> {code}
> This is just based on what I know about how AQE is using it.
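
To illustrate the proposal, the difference between matching on the concrete exec and matching on a trait can be sketched with stand-in types. This is only a minimal sketch, not Spark's actual code: `Plan`, `BuiltInScan`, `CustomScan`, `Other`, `TableCacheStage`, `wrapByClass` and `wrapByTrait` are hypothetical names invented for the example, and the trait is reduced to a single member so the snippet runs without a Spark dependency.

```scala
// Stand-in types only; none of these are Spark's real classes.
object TraitMatchSketch {
  trait Plan

  // Simplified, hypothetical counterpart of the proposed trait.
  trait InMemoryTableScanLike extends Plan {
    def isMaterialized: Boolean
  }

  // Plays the role of the built-in InMemoryTableScanExec.
  final case class BuiltInScan(isMaterialized: Boolean) extends InMemoryTableScanLike

  // Plays the role of a user-provided cache scan node.
  final case class CustomScan(isMaterialized: Boolean) extends InMemoryTableScanLike

  // Any other plan node.
  final case class Other(name: String) extends Plan

  // Plays the role of TableCacheQueryStageExec.
  final case class TableCacheStage(scan: InMemoryTableScanLike) extends Plan

  // Matching on the concrete class: only the built-in node gets wrapped.
  def wrapByClass(plan: Plan): Plan = plan match {
    case s: BuiltInScan => TableCacheStage(s)
    case other          => other
  }

  // Matching on the trait: any implementation gets wrapped.
  def wrapByTrait(plan: Plan): Plan = plan match {
    case s: InMemoryTableScanLike => TableCacheStage(s)
    case other                    => other
  }
}
```

In this sketch `wrapByClass` leaves a `CustomScan` unwrapped, while `wrapByTrait` wraps both scan implementations and still passes unrelated nodes through unchanged. That is the behavior the trait-based match is meant to enable.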