Peter Andrew created SPARK-50569:
------------------------------------

             Summary: Clearing dataframes/views persisted in isolated Connect 
session
                 Key: SPARK-50569
                 URL: https://issues.apache.org/jira/browse/SPARK-50569
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.3
            Reporter: Peter Andrew


With Spark Connect, `sparkSession.catalog.clearCache` clears all dataframes 
that have been persisted in the Spark Connect server, including those persisted 
by different isolated sessions.
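A minimal sketch of the behavior, assuming a Spark Connect server reachable at `sc://localhost` (the endpoint is an assumption for illustration; this requires a running server and PySpark 3.5+, where `builder.create()` forces a new isolated session):

```
from pyspark.sql import SparkSession

# Two isolated Connect sessions against the same server.
s1 = SparkSession.builder.remote("sc://localhost").create()
s2 = SparkSession.builder.remote("sc://localhost").create()

df1 = s1.range(10).persist()
df2 = s2.range(10).persist()
df1.count()  # materialize the caches
df2.count()

# Clearing the cache from session 1 also unpersists df2,
# even though df2 was persisted by a different isolated session.
s1.catalog.clearCache()
```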

Similarly, views created by `DataFrame.createOrReplaceTempView` are not removed 
after an isolated session is terminated, even though the documentation could be 
read as implying they should be:


> The lifetime of this temporary table is tied to the 
> [{{SparkSession}}|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.html#pyspark.sql.SparkSession]
>  that was used to create this 
> [{{DataFrame}}|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html#pyspark.sql.DataFrame].

It'd be very useful to be able to do the following:

- When calling `clearCache`, a way to control whether to clear only the 
dataframes persisted in the current isolated session, or all dataframes.
- A configuration option for the Spark Connect server to clean up all 
dataframes/views persisted by an isolated session when that session is 
terminated.
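For illustration only, the two requests above could look like the following. Both the `scope` parameter and the configuration key are hypothetical names invented here; neither exists in Spark today:

```
# Hypothetical API sketch -- neither option exists in Spark today.
spark.catalog.clearCache()                  # current behavior: clears everything
spark.catalog.clearCache(scope="session")   # proposed: current isolated session only

# Proposed server-side config (name hypothetical):
# spark.connect.session.cleanupCacheOnTermination=true
```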



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
