Yong Zheng created SPARK-54367:
----------------------------------

             Summary: Propagate close() from SessionState to CatalogPlugins to 
prevent resource leaks
                 Key: SPARK-54367
                 URL: https://issues.apache.org/jira/browse/SPARK-54367
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 4.0.1
            Reporter: Yong Zheng


Spark Connect server is leaking `SparkSession` objects each time a client 
connects and disconnects when dealing with Apache Iceberg ([Apache Iceberg 
PR|https://github.com/apache/iceberg/pull/14590]). 

The `SessionHolder.close()` method in Spark Connect is responsible for cleaning 
up a session. It does perform some cleanup such as artifacts and streaming 
queries but it doesn't perform cleanup on the main `SessionState`. This is 
where the `CatalogManager` lives which holds reference to `RESTCatalog` and 
`S3FileIO`. Since the `SessionState` is never closed, these `Closeable` 
catalogs are never closed and their threads leak.

To fix this, I added the following changes:
1. [Apache Iceberg PR|https://github.com/apache/iceberg/pull/14590] to 
implement closable catalog
2. Make `CatalogPlugin` interface extends `java.io.Closeable`
3. Implement a `close()` method in `CatalogManager` that iterates through all 
registered catalogs and calls their `close()` method
4. Make `SessionState` implements `Closeable` and calls 
`catalogManager.close()` from its `close()` method.
5. Invoke `session.sessionState.close()` from `SessionHolder.close()` when a 
Spark Connect session is stopped

Above changes create a clean lifecycle for catalogs when a session ended, a 
`close()` call is propagated down the chain which allow each catalog to release 
its resources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to