[ https://issues.apache.org/jira/browse/SPARK-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156125#comment-14156125 ]
Milan Straka commented on SPARK-3731:
-------------------------------------

I will get to it later today and attach a dataset and a program which exhibit this behaviour locally. I believe I will find it, because I have seen this behaviour in many local runs.

> RDD caching stops working in pyspark after some time
> ----------------------------------------------------
>
>                 Key: SPARK-3731
>                 URL: https://issues.apache.org/jira/browse/SPARK-3731
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 1.1.0
>         Environment: Linux, 32bit, both in local mode and in standalone cluster mode
>            Reporter: Milan Straka
>         Attachments: worker.log
>
> Consider a file F which, when loaded with sc.textFile and cached, takes up slightly more than half of the free memory for the RDD cache.
> When the following is executed in PySpark:
> 1) a = sc.textFile(F)
> 2) a.cache().count()
> 3) b = sc.textFile(F)
> 4) b.cache().count()
> and then the following is repeated (for example 10 times):
> a) a.unpersist().cache().count()
> b) b.unpersist().cache().count()
> after some time there are no RDDs cached in memory.
> Also, from that point on, no other RDD ever gets cached: the worker always reports something like "WARN CacheManager: Not enough space to cache partition rdd_23_5 in memory! Free memory is 277478190 bytes.", even if rdd_23_5 is only ~50MB. The Executors tab of the Application Detail UI shows that all executors have 0MB memory used, which is consistent with the CacheManager warning.
> When doing the same in Scala, everything works perfectly.
> I understand that this is a vague description, but I do not know how to describe the problem better.
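For reference, the reproduction steps from the quoted description amount to roughly the following PySpark session. This is a minimal sketch, not the reporter's attached program: the local SparkContext setup, the path to the file F, and the iteration count are assumptions filled in for illustration.

    # Minimal sketch of the reproduction from the issue description.
    # Assumes a local SparkContext and a text file at `path` whose cached
    # form takes slightly more than half of the memory free for RDD caching.
    from pyspark import SparkContext

    sc = SparkContext("local", "SPARK-3731-repro")
    path = "/path/to/F"  # hypothetical placeholder for the file F

    a = sc.textFile(path)
    a.cache().count()
    b = sc.textFile(path)
    b.cache().count()

    # unpersist() returns the RDD itself, so the calls chain as in the report.
    for _ in range(10):
        a.unpersist().cache().count()
        b.unpersist().cache().count()
        # Per the report, after some iterations the worker logs
        # "WARN CacheManager: Not enough space to cache partition ..."
        # and the Executors tab shows 0MB memory used.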