[ https://issues.apache.org/jira/browse/SPARK-49409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877366#comment-17877366 ]

Changgyoo Park commented on SPARK-49409:
----------------------------------------

Yes, because there are cases where 5 is insufficient: unrelated data frames 
created between two very complicated, mutually dependent data frames can evict 
the latter from the cache. I'm pretty sure that just increasing the default 
value is not the best idea; ideally, the analysed plan should be stored on the 
client side (this will be super difficult, I know that), removing the plan 
cache completely. But until then, increasing it to ~16 would cover many more 
cases.
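
In the meantime, a possible workaround sketch: the cache size is a server-side 
setting. Assuming the constant CONNECT_SESSION_PLAN_CACHE_SIZE is exposed as 
the static conf key `spark.connect.session.planCache.maxSize` (key name 
inferred from the constant; verify against your Spark build), the Connect 
server can be started with a larger cache:

```
# Assumption: CONNECT_SESSION_PLAN_CACHE_SIZE maps to the static conf key
# "spark.connect.session.planCache.maxSize"; verify against your Spark build.
# Start the Connect server with a larger plan cache, e.g.:
#   ./sbin/start-connect-server.sh --conf spark.connect.session.planCache.maxSize=16
from pyspark.sql import SparkSession

# Clients connect as usual; the larger cache size should then apply to
# sessions on that server.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
```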

> CONNECT_SESSION_PLAN_CACHE_SIZE is too small for certain programming patterns
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-49409
>                 URL: https://issues.apache.org/jira/browse/SPARK-49409
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 4.0.0
>            Reporter: Changgyoo Park
>            Priority: Major
>
> Example:
>  
> ```
> from pyspark.sql.functions import col
>
> df_1 = df_a.filter(col('X').isNotNull())
> df_2 = df_b.filter(col('SAFE_SU_Conv').isNotNull())
> ...
> df_x = ...
> # each iteration builds a new plan and occupies a plan-cache slot
> for _ in range(0, 5):
>     df_x = df_x.select(...)
> ...
> df_3 = df_1.join(df_2, ...)
> ```
> => the df_x loop alone completely invalidates all the cached entries, so 
> df_1 and df_2 must be re-analysed for the join.
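
A minimal sketch in plain Python (not Spark internals), assuming the plan 
cache behaves as a 5-entry LRU: the loop alone inserts 5 fresh plans, pushing 
df_1 and df_2 out before the join re-uses them.

```
from collections import OrderedDict

# Toy LRU cache standing in for the Spark Connect plan cache (illustrative only).
class LruCache:
    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()

    def put(self, key):
        self.entries[key] = True
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict the least recently used plan

cache = LruCache(max_size=5)
cache.put("df_1")
cache.put("df_2")
for i in range(5):              # the df_x.select(...) loop: one new plan per pass
    cache.put(f"df_x_{i}")
print(list(cache.entries))      # df_1 and df_2 are gone; only df_x plans remain

# With max_size=16 the same access pattern would leave df_1 and df_2 cached.
```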


