[ https://issues.apache.org/jira/browse/SPARK-50639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-50639:
-----------------------------------
    Labels: pull-request-available  (was: )

> Improve warning logging in CacheManager
> ---------------------------------------
>
>                 Key: SPARK-50639
>                 URL: https://issues.apache.org/jira/browse/SPARK-50639
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Vlad Rozov
>            Priority: Minor
>              Labels: pull-request-available
>
> {{CacheManager}} currently logs a warning when there is an attempt to add a
> dataframe whose logical plan is already present in the cache. No warning
> is logged when there is an attempt to remove a dataframe from the cache
> and its logical plan is not present in the cache. The request is to:
> # Add information about which logical plan is being added to or removed from the cache
> # Add the missing warning message for the removal case above
> While it is possible to enable detailed debug logging for the
> {{CacheManager}}, such logging is not enabled by default because it logs
> a large amount of data.
> Consider the following code, which leads to memory leaks:
> {noformat}
> Dataset<Row> dataset = ...
> Dataset<Row> dataset1 = dataset.withColumn(...);
> Dataset<Row> dataset2 = dataset1.withColumn(...);
> dataset.persist(); // OK
> dataset1.persist(); // OK
> dataset.persist(); // currently logs warning without logical plan details
> dataset.unpersist(); // OK
> dataset.unpersist(); // no warning
> dataset2.unpersist(); // no warning, the actual call should be on dataset1
> {noformat}
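
The requested behavior could be sketched roughly as follows. This is a toy model under stated assumptions, not Spark's {{CacheManager}}: the class {{PlanCacheSketch}}, its methods, and the warning texts are all hypothetical, and logical plans are stood in for by plain strings. It only illustrates both warnings (duplicate add, remove of an absent plan) carrying the plan they refer to.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for illustration only; this is NOT Spark's CacheManager.
final class PlanCacheSketch {
    // Assumption: a plan is modelled as a plain string. Spark compares logical plans instead.
    private final Set<String> cachedPlans = new HashSet<>();

    // Caches the plan; returns a warning that names the plan if it was already cached.
    Optional<String> add(String plan) {
        if (!cachedPlans.add(plan)) {
            return Optional.of("Asked to cache a plan that is already cached: " + plan);
        }
        return Optional.empty();
    }

    // Uncaches the plan; returns a warning that names the plan if it was not cached.
    Optional<String> remove(String plan) {
        if (!cachedPlans.remove(plan)) {
            return Optional.of("Asked to uncache a plan that is not cached: " + plan);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PlanCacheSketch cache = new PlanCacheSketch();
        // Mirrors the persist/unpersist sequence from the issue description.
        cache.add("dataset").ifPresent(System.err::println);     // first add, no warning
        cache.add("dataset1").ifPresent(System.err::println);    // first add, no warning
        cache.add("dataset").ifPresent(System.err::println);     // duplicate add, warning with plan
        cache.remove("dataset").ifPresent(System.err::println);  // cached, no warning
        cache.remove("dataset").ifPresent(System.err::println);  // already removed, warning with plan
        cache.remove("dataset2").ifPresent(System.err::println); // never cached, warning with plan
    }
}
```

In this sketch "dataset1" is still cached after the sequence ends, which corresponds to the leak described in the issue; the warning on "dataset2" is what would point at the mistaken call.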