Bobby Wang created SPARK-50168:
----------------------------------
Summary: Connect session is not released if not calling spark.stop() explicitly
Key: SPARK-50168
URL: https://issues.apache.org/jira/browse/SPARK-50168
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.0.0
Reporter: Bobby Wang
Hi,
I found that the Spark Connect session is not released if spark.stop() is not called explicitly.
h2. *Repro:*
I have a Python file, test.py, with the following code:
{code:python}
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(10).show(){code}
After running it with
{code:bash}
python test.py{code}
I found that the corresponding Connect session was still alive in the Spark web UI; see session ID 96260131-a22c-4342-92df-8dc7ace5d1de in the table below.
But if spark.stop() is called explicitly in the Python file:
{code:python}
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(10).show()
spark.stop(){code}
the Connect session is released; see session ID 4e8ffb4e-7684-4fa7-b750-814f9a23f2d0 in the table below.
||User||Session ID||Start Time||Finish Time||Duration||Total Execute||
|xxx|[4e8ffb4e-7684-4fa7-b750-814f9a23f2d0|http://localhost:4040/connect/session/?id=4e8ffb4e-7684-4fa7-b750-814f9a23f2d0]|2024/10/30 11:05:25|2024/10/30 11:05:25|78 ms|1|
|xxx|[96260131-a22c-4342-92df-8dc7ace5d1de|http://localhost:4040/connect/session/?id=96260131-a22c-4342-92df-8dc7ace5d1de]|2024/10/30 11:04:41| |13 minutes 55 seconds|0|
So I'm wondering whether this is by design or a bug. If the Connect session is not released, the Connect server keeps holding that session's caches and never frees them, which could exhaust the Connect server/driver memory.
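Until the server-side behavior is settled, a script can at least make sure stop() gets called. The following is a minimal client-side sketch, not a fix for this issue and not a PySpark API: {{stop_when_done}} and {{stop_at_exit}} are hypothetical helper names, and the only thing assumed about the session object is the stop() method the repro already calls. {{stop_when_done}} calls stop() when a with-block ends, even if the block raised; {{stop_at_exit}} uses Python's standard atexit module to call stop() at normal interpreter shutdown.

{code:python}
import atexit
from contextlib import contextmanager
from typing import Iterator, Protocol, TypeVar


class Stoppable(Protocol):
    """Anything with a stop() method, e.g. a Spark Connect SparkSession."""

    def stop(self) -> None: ...


S = TypeVar("S", bound=Stoppable)


@contextmanager
def stop_when_done(session: S) -> Iterator[S]:
    """Hypothetical helper: yield the session, then always call
    session.stop() when the with-block exits, including on exceptions."""
    try:
        yield session
    finally:
        session.stop()


def stop_at_exit(session: S) -> S:
    """Hypothetical helper: register session.stop() to run at normal
    interpreter shutdown and return the session unchanged.

    Best effort only: atexit handlers do not run if the process is killed
    by a signal Python does not handle, or exits through os._exit().
    """
    atexit.register(session.stop)
    return session
{code}

In test.py, wrapping the {{getOrCreate()}} result with {{stop_at_exit(...)}}, or running the job inside {{with stop_when_done(...) as spark:}}, should release the session through the same {{spark.stop()}} call the second repro makes explicitly. Neither helps if the client process is killed outright, so server-side cleanup would still matter for the memory concern above.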
--
This message was sent by Atlassian Jira
(v8.20.10#820010)