[ https://issues.apache.org/jira/browse/SPARK-44900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758376#comment-17758376 ]
Kent Yao commented on SPARK-44900: ---------------------------------- Please note that there is a storage limit in place. Adding items without removing any may lead to cache evictions and recomputations. In such cases, caching may not be as effective as direct computation, as extra write paths are introduced. Probably, you should optimize your program. > Cached DataFrame keeps growing > ------------------------------ > > Key: SPARK-44900 > URL: https://issues.apache.org/jira/browse/SPARK-44900 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.4.0 > Reporter: Varun Nalla > Priority: Blocker > > Scenario : > We have a kafka streaming application where the data lookups are happening by > joining another DF which is cached, and the caching strategy is > MEMORY_AND_DISK. > However the size of the cached DataFrame keeps on growing for every micro > batch the streaming application process and that's being visible under > storage tab. > A similar stack overflow thread was already raised. > https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org