[ https://issues.apache.org/jira/browse/SPARK-49409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877366#comment-17877366 ]
Changgyoo Park commented on SPARK-49409:
----------------------------------------

Yes, because there is a case where 5 is insufficient: unrelated data frames created in between very complicated, mutually dependent data frames evict the cached entries those dependent frames need. I'm pretty sure that just increasing the default value is not the best idea; ideally, the analysed plan should be stored on the client side (this will be super difficult, I know that), removing the plan cache completely. Until then, increasing the default to ~16 would cover many more cases.

> CONNECT_SESSION_PLAN_CACHE_SIZE is too small for certain programming patterns
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-49409
>                 URL: https://issues.apache.org/jira/browse/SPARK-49409
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 4.0.0
>            Reporter: Changgyoo Park
>            Priority: Major
>
> Example:
>
> ```
> df_1 = df_a.filter(col('X').isNotNull())
> df_2 = df_b.filter(col('SAFE_SU_Conv').isNotNull())
> ....
> df_x = ...
> for _ in range(0, 5):
>     df_x = df_x.select(...)
> ...
> df_3 = df_1.join(df_2, ...)
> ```
> => The five unrelated df_x plans evict every cached entry, so df_1 and df_2 must be re-analysed before the join.
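For reference, a minimal self-contained PySpark sketch of the pattern in the example above, under stated assumptions: the Connect endpoint, table names, column names, and join key are placeholders, and the sketch assumes that each .schema access issues one analysis request whose analysed plan lands in the server-side LRU plan cache (5 entries by default, per this ticket).

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Placeholder Spark Connect endpoint.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Placeholder input tables.
df_a = spark.table("table_a")
df_b = spark.table("table_b")

# Two dependent frames whose analysed plans we want to stay cached.
df_1 = df_a.filter(col("X").isNotNull())
df_2 = df_b.filter(col("SAFE_SU_Conv").isNotNull())
_ = df_1.schema  # analysis request: caches df_1's analysed plan
_ = df_2.schema  # analysis request: caches df_2's analysed plan

# Five unrelated intermediate plans; with a 5-entry LRU cache, these
# analysis requests alone evict the entries for df_1 and df_2.
df_x = df_a
for _ in range(0, 5):
    df_x = df_x.select("*")
    _ = df_x.schema

# Cache misses: df_1 and df_2 are re-analysed from scratch here.
df_3 = df_1.join(df_2, on="X")  # join key is a placeholder
df_3.show()
```

If the cache size is backed by the conf spark.connect.session.planCache.maxSize (an assumption on my part about the name behind CONNECT_SESSION_PLAN_CACHE_SIZE), a workaround until a better default lands would be raising it at server start, e.g. --conf spark.connect.session.planCache.maxSize=16.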