[ https://issues.apache.org/jira/browse/SPARK-44900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758376#comment-17758376 ]

Kent Yao commented on SPARK-44900:
----------------------------------

Please note that there is a storage limit in place: caching more and more data 
without ever unpersisting any will eventually trigger cache evictions and 
recomputations. In that case, caching can be less effective than direct 
computation, since it introduces extra write paths. You should probably rework 
your program to release cached data it no longer needs.
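
A minimal sketch of that idea, assuming a Structured Streaming job with 
foreachBatch and hypothetical names (lookup path /data/lookup, topic events, 
broker broker:9092): unpersist the old cached copy before caching a refreshed 
one, so the Storage tab keeps a single entry instead of accumulating blocks 
across micro-batches. This is one possible mitigation, not necessarily the fix 
for the underlying report.

{code:scala}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object RefreshedLookupJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("lookup-join").getOrCreate()

    // Load and cache the lookup DataFrame with the same strategy as the report.
    def loadLookup(): DataFrame =
      spark.read.parquet("/data/lookup")        // hypothetical lookup source
        .persist(StorageLevel.MEMORY_AND_DISK)

    var lookup = loadLookup()

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
      .option("subscribe", "events")                    // hypothetical topic
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Typed function value avoids the Scala 2.12 foreachBatch overload ambiguity.
    val processBatch: (DataFrame, Long) => Unit = (batch, batchId) => {
      // Periodically refresh the lookup: release the old cached blocks first,
      // so storage holds one copy rather than growing with every micro-batch.
      if (batchId > 0 && batchId % 100 == 0) {
        lookup.unpersist()
        lookup = loadLookup()
      }
      batch.join(lookup, "key").count() // stand-in action; a real job would write out
      ()
    }

    events.writeStream.foreachBatch(processBatch).start().awaitTermination()
  }
}
{code}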

> Cached DataFrame keeps growing
> ------------------------------
>
>                 Key: SPARK-44900
>                 URL: https://issues.apache.org/jira/browse/SPARK-44900
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Varun Nalla
>            Priority: Blocker
>
> Scenario:
> We have a Kafka streaming application in which data lookups are performed by 
> joining against another DataFrame that is cached, with the MEMORY_AND_DISK 
> caching strategy.
> However, the size of the cached DataFrame keeps growing with every 
> micro-batch the streaming application processes, which is visible under the 
> Storage tab.
> A similar Stack Overflow thread was already raised:
> https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing


