[ 
https://issues.apache.org/jira/browse/SPARK-39866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shuo updated SPARK-39866:
-----------------------------
    Description: 
We use Spark Thrift Server as a distributed SQL query engine. The queries read 
datasource tables that contain a large number of files.

When multiple sessions are opened, the Driver can crash with an OOM error.

We can reproduce it with the following steps:
 # start Spark Thrift Server;
 # use beeline to open a new session;
 # create a new datasource table;
 # insert one row of data into this table;
 # open another 5 new sessions, then use a `select` command to scan this table;
 # close all 6 sessions;
 # run `jmap -histo:live pid > pid.log` to print the class histogram of the 
Driver.

*Expected result:*

The cached FileStatus entries should be cleaned up, so the number of 
HdfsLocatedFileStatus objects should be 0.

*Actual result:*

The number of HdfsLocatedFileStatus objects is 6.

*!image-2022-07-26-11-54-06-826.png!*
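
The check in the last step can be automated. The following is a minimal sketch, not part of the original report, that counts live instances of a class in the text produced by `jmap -histo:live`. It assumes the usual histogram row layout (`num: #instances #bytes class name`). The function name `count_instances` and the sample rows are hypothetical and only illustrate the format; the byte counts are made up.

```python
import re

# One histogram row looks like: "   2:      6      576  some.package.ClassName"
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def count_instances(histo_text, simple_name="HdfsLocatedFileStatus"):
    """Sum the #instances column for classes whose simple name is simple_name."""
    total = 0
    for line in histo_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, and "Total" rows are skipped
        class_name = match.group(3)
        if class_name == simple_name or class_name.endswith("." + simple_name):
            total += int(match.group(1))
    return total


# Hypothetical sample in the jmap histogram layout (values are illustrative).
SAMPLE = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:         52310        4184800  [C
   2:             6            576  org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus
   3:             3             72  java.lang.String
Total         52319        4185448
"""

print(count_instances(SAMPLE))
```

To apply it to the file written in the last step, pass the contents of `pid.log` to `count_instances`. A result greater than 0 after all sessions are closed would match the leak described here.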

  was:
We use Spark Thrift Server as a distributed SQL query engine. The queries read 
datasource tables that contain a large number of files.

When multiple sessions are opened, the Driver can crash with an OOM error.

We can reproduce it with the following steps:
 # start Spark Thrift Server;
 # use beeline to open a new session;
 # create a new datasource table;
 # insert one row of data into this table;
 # open another 5 new sessions, then use a `select` command to scan this table;
 # close all 6 sessions;
 # use the jmap command to print a dump of the Driver.


> Memory leak when closing a session of Spark Thrift Server
> ---------------------------------------------------------
>
>                 Key: SPARK-39866
>                 URL: https://issues.apache.org/jira/browse/SPARK-39866
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.3
>            Reporter: Liu Shuo
>            Priority: Critical
>         Attachments: image-2022-07-26-11-54-06-826.png
