[ 
https://issues.apache.org/jira/browse/SPARK-47398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827206#comment-17827206
 ] 

Raza Jafri commented on SPARK-47398:
------------------------------------

In `AdaptiveSparkPlanExec` we wrap `InMemoryTableScanExec` in 
`TableCacheQueryStageExec`. To do this we currently match on the concrete 
Exec. I propose that we match on a trait instead, just as we do for 
`Exchange` by matching against `ShuffleExchangeLike` and 
`BroadcastExchangeLike`. In the RAPIDS Accelerator for Apache Spark, we replace 
`InMemoryTableScanExec` with our own version, which does some optimizations. 
This can cause a problem: the benefits of SPARK-42101 might be lost or, in the 
worst case, we could look for that specific Exec and throw an 
exception.
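
To illustrate the difference, here is a minimal sketch that does not depend on Spark. All names in it (`PlanNode`, `CacheScanLike`, `BuiltInCacheScan`, `PluginCacheScan`) are hypothetical stand-ins, not real Spark classes. It only shows why a match on the concrete class misses a replacement node while a match on a shared trait does not.

```scala
// Toy stand-ins for physical plan nodes; none of these are real Spark classes.
trait PlanNode

// Hypothetical trait playing the role of the proposed abstraction.
trait CacheScanLike extends PlanNode

// Stand-in for the built-in scan and for a plugin-provided replacement.
case class BuiltInCacheScan(table: String) extends CacheScanLike
case class PluginCacheScan(table: String) extends CacheScanLike
case class OtherNode(name: String) extends PlanNode

object TraitMatchDemo {
  // Matching on the concrete class: the plugin's node is not recognized.
  def wrapsByClass(node: PlanNode): Boolean = node match {
    case _: BuiltInCacheScan => true
    case _                   => false
  }

  // Matching on the trait: every implementation is recognized.
  def wrapsByTrait(node: PlanNode): Boolean = node match {
    case _: CacheScanLike => true
    case _                => false
  }

  def main(args: Array[String]): Unit = {
    val nodes: Seq[PlanNode] =
      Seq(BuiltInCacheScan("t"), PluginCacheScan("t"), OtherNode("filter"))
    nodes.foreach { n =>
      println(s"$n byClass=${wrapsByClass(n)} byTrait=${wrapsByTrait(n)}")
    }
  }
}
```

With the class-based match, `PluginCacheScan` falls through to the default case, which corresponds to the lost-optimization scenario described above.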

Looking at the current code, I propose the following trait:
{code:java}
trait InMemoryTableScanLike extends LeafExecNode {  
  /**
   * Returns whether the cache buffer is loaded
   */
  def isMaterialized: Boolean  

  /**
   * Returns the actual cached RDD without filters and serialization of row/columnar.
   */
  def baseCacheRDD(): RDD[CachedBatch]  

  /**
   * Returns the runtime statistics after shuffle materialization.  
   */
  def runtimeStatistics: Statistics
}
{code}
This is just based on what I know about how AQE is using it. 
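
As a rough sketch of how a caller might rely only on these three members, the following uses simplified stand-in types so that it compiles without Spark. `Stats`, `Batch`, `ScanLike`, `PluginScan` and `CacheStage` are all hypothetical, and `Seq[Batch]` stands in for `RDD[CachedBatch]`. It is not the actual AQE code, only an illustration of the contract.

```scala
// Stand-in types so the sketch compiles without Spark on the classpath.
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt])
final case class Batch(rows: Int, bytes: Long)

// Simplified mirror of the proposed trait (assumption: Seq[Batch] replaces RDD[CachedBatch]).
trait ScanLike {
  def isMaterialized: Boolean
  def baseCacheRDD(): Seq[Batch]
  def runtimeStatistics: Stats
}

// Hypothetical third-party scan that implements the trait.
final class PluginScan(batches: Seq[Batch]) extends ScanLike {
  private var loaded = false

  def isMaterialized: Boolean = loaded

  // In this toy version, asking for the cached data marks it as loaded.
  def baseCacheRDD(): Seq[Batch] = { loaded = true; batches }

  def runtimeStatistics: Stats =
    Stats(BigInt(batches.map(_.bytes).sum), Some(BigInt(batches.map(_.rows).sum)))
}

// Hypothetical stage wrapper that depends only on the trait, never on a concrete class.
final class CacheStage(scan: ScanLike) {
  def materialize(): Unit = {
    if (!scan.isMaterialized) scan.baseCacheRDD()
    ()
  }

  // Statistics are only reported once the cache has been loaded.
  def stats: Option[Stats] =
    if (scan.isMaterialized) Some(scan.runtimeStatistics) else None
}

object CacheStageDemo {
  def main(args: Array[String]): Unit = {
    val stage = new CacheStage(new PluginScan(Seq(Batch(10, 100L), Batch(5, 50L))))
    println(stage.stats)   // no statistics before materialization
    stage.materialize()
    println(stage.stats)
  }
}
```

Because `CacheStage` accepts any `ScanLike`, a replacement scan can participate without the wrapper knowing its concrete type.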

> AQE doesn't allow for extension of InMemoryTableScanExec
> --------------------------------------------------------
>
>                 Key: SPARK-47398
>                 URL: https://issues.apache.org/jira/browse/SPARK-47398
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0, 3.5.1
>            Reporter: Raza Jafri
>            Priority: Major
>              Labels: pull-request-available
>
> As part of SPARK-42101, we added support to AQE for handling 
> `InMemoryTableScanExec`. 
> This change directly references `InMemoryTableScanExec`, which prevents users 
> from extending the caching functionality that was added as part of 
> SPARK-32274.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

