[jira] [Updated] (SPARK-47398) AQE doesn't allow for extension of InMemoryTableScanExec

Raza Jafri (Jira) Thu, 14 Mar 2024 11:23:15 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-47398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Raza Jafri updated SPARK-47398:
-------------------------------
    Description: 
As part of SPARK-42101 we added support to AQE for handling 
InMemoryTableScanExec. 

That change references `InMemoryTableScanExec` directly, which prevents users 
from extending the caching functionality that was added as part of SPARK-32274. 

In AdaptiveSparkPlanExec we wrap InMemoryTableScanExec in 
TableCacheQueryStageExec. To do this we currently match on the concrete 
Exec. I propose that we match on a trait instead, just as we do for Exchange 
by matching against ShuffleExchangeLike and BroadcastExchangeLike. 

 

Based on the current code, I propose the following trait: 
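{code:scala}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.plans.logical.Statistics
import org.apache.spark.sql.columnar.CachedBatch
import org.apache.spark.sql.execution.LeafExecNode

trait InMemoryTableScanLike extends LeafExecNode {
  /**
   * Returns whether the cache buffer is loaded.
   */
  def isMaterialized: Boolean

  /**
   * Returns the actual cached RDD without filters and serialization of row/columnar.
   */
  def baseCacheRDD(): RDD[CachedBatch]

  /**
   * Returns the runtime statistics after shuffle materialization.
   */
  def runtimeStatistics: Statistics
}
{code}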
This is just based on what I know about how AQE is using it. 
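
To illustrate why matching on a trait matters, here is a minimal, self-contained sketch. It does not use Spark's real classes: `Plan`, `BuiltInScan`, `CustomScan`, `Project`, `CacheStage`, `wrapByClass` and `wrapByTrait` are hypothetical stand-ins invented for this example, and the trait is reduced to a single member. It only shows that a rule matching on the concrete class skips a third-party scan, while a rule matching on the trait wraps it.

```scala
object TraitMatchSketch {
  // Toy plan node, NOT Spark's SparkPlan.
  trait Plan { def children: Seq[Plan] }

  // Simplified, hypothetical version of the proposed trait (leaf node).
  trait InMemoryTableScanLike extends Plan {
    def isMaterialized: Boolean
    override def children: Seq[Plan] = Nil
  }

  // Stand-in for the built-in InMemoryTableScanExec.
  case class BuiltInScan(isMaterialized: Boolean) extends InMemoryTableScanLike

  // Stand-in for a scan supplied by a third-party caching extension.
  case class CustomScan(isMaterialized: Boolean) extends InMemoryTableScanLike

  // Stand-in for any non-leaf operator.
  case class Project(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // Stand-in for TableCacheQueryStageExec.
  case class CacheStage(scan: InMemoryTableScanLike) extends Plan {
    def children: Seq[Plan] = Seq(scan)
  }

  // Matching on the concrete class: only the built-in scan is wrapped.
  def wrapByClass(plan: Plan): Plan = plan match {
    case s: BuiltInScan => CacheStage(s)
    case Project(c)     => Project(wrapByClass(c))
    case other          => other
  }

  // Matching on the trait: every implementation is wrapped.
  def wrapByTrait(plan: Plan): Plan = plan match {
    case s: InMemoryTableScanLike => CacheStage(s)
    case Project(c)               => Project(wrapByTrait(c))
    case other                    => other
  }
}
```

With these definitions, `wrapByClass(Project(CustomScan(true)))` returns the plan unchanged, whereas `wrapByTrait` returns `Project(CacheStage(CustomScan(true)))`.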

  was:
As part of SPARK-42101 we added support to AQE for handling 
InMemoryTableScanExec. 

This change directly references `InMemoryTableScanExec` which limits users from 
extending the caching functionality that was added as part of SPARK-32274 

In `AdaptiveSparkPlanExec` we are wrapping `InMemoryTableScanExec` in 
`TableCacheQueryStageExec`. To accomplish this we are currently matching on the 
Exec, I am proposing that we should match on a trait instead just like how we 
do it for `Exchange` by matching against `ShuffleExchangeLike` and 
`BroadcastExchangeLike`. 

 

Looking at the current code, I propose the trait to be as 

 

```

trait InMemoryTableScanLike extends LeafExecNode {

  /**
   * Returns whether the cache buffer is loaded
   */
  def isMaterialized: Boolean

  /**
   * Returns the actual cached RDD without filters and serialization of row/columnar.
   */
  def baseCacheRDD(): RDD[CachedBatch]

  /**
   * Returns the runtime statistics after shuffle materialization.
   */
  def runtimeStatistics: Statistics
}

```

This is just based on what I know about how AQE is using it. 


> AQE doesn't allow for extension of InMemoryTableScanExec
> --------------------------------------------------------
>
>                 Key: SPARK-47398
>                 URL: https://issues.apache.org/jira/browse/SPARK-47398
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0, 3.5.1
>            Reporter: Raza Jafri
>            Priority: Major


