[ 
https://issues.apache.org/jira/browse/SPARK-54367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-54367:
-----------------------------------
    Labels: pull-request-available  (was: )

> Propagate close() from SessionState to CatalogPlugins to prevent resource 
> leaks
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-54367
>                 URL: https://issues.apache.org/jira/browse/SPARK-54367
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 4.0.1
>            Reporter: Yong Zheng
>            Priority: Major
>              Labels: pull-request-available
>
> Spark Connect server is leaking `SparkSession` objects each time a client 
> connects and disconnects when dealing with Apache Iceberg ([Apache Iceberg 
> PR|https://github.com/apache/iceberg/pull/14590]). 
> The `SessionHolder.close()` method in Spark Connect is responsible for 
> cleaning up a session. It does perform some cleanup such as artifacts and 
> streaming queries but it doesn't perform cleanup on the main `SessionState`. 
> This is where the `CatalogManager` lives which holds reference to 
> `RESTCatalog` and `S3FileIO`. Since the `SessionState` is never closed, these 
> `Closeable` catalogs are never closed and their threads leak.
> To fix this, I added the following changes:
> 1. [Apache Iceberg PR|https://github.com/apache/iceberg/pull/14590] to 
> implement closable catalog
> 2. Make `CatalogPlugin` interface extends `java.io.Closeable`
> 3. Implement a `close()` method in `CatalogManager` that iterates through all 
> registered catalogs and calls their `close()` method
> 4. Make `SessionState` implements `Closeable` and calls 
> `catalogManager.close()` from its `close()` method.
> 5. Invoke `session.sessionState.close()` from `SessionHolder.close()` when a 
> Spark Connect session is stopped
> Above changes create a clean lifecycle for catalogs when a session ended, a 
> `close()` call is propagated down the chain which allow each catalog to 
> release its resources.
> PR: https://github.com/apache/spark/pull/53078



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to