[ https://issues.apache.org/jira/browse/SPARK-39866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liu Shuo updated SPARK-39866:
-----------------------------
Description:

We are using Spark Thrift Server as a distributed SQL query engine, and the queries read a datasource table backed by a large number of files. When we open multiple sessions, the Driver can crash with an OOM. We can reproduce it with the following steps:
# start Spark Thrift Server;
# use beeline to open a new session;
# create a new datasource table;
# insert one row of data into this table;
# open another 5 new sessions, then use a `select` command to scan this table;
# close all 6 sessions;
# use the jmap command `jmap -histo:live pid > pid.log` to print a heap histogram of the Driver.

*Expected result:*
The cached FileStatus objects should be cleaned up; the number of HdfsLocatedFileStatus objects should be 0.

*Actual result:*
The number of HdfsLocatedFileStatus objects is 6.

!image-2022-07-26-11-54-06-826.png!

was:

We are using Spark Thrift Server as a distributed SQL query engine, and the queries read a datasource table backed by a large number of files. When we open multiple sessions, the Driver can crash with an OOM. We can reproduce it with the following steps:
# start Spark Thrift Server;
# use beeline to open a new session;
# create a new datasource table;
# insert one row of data into this table;
# open another 5 new sessions, then use a `select` command to scan this table;
# close all 6 sessions;
# use the jmap command to print a heap histogram of the Driver.

> Memory leak when closing a session of Spark Thrift Server
> ----------------------------------------------------------
>
>                 Key: SPARK-39866
>                 URL: https://issues.apache.org/jira/browse/SPARK-39866
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.3
>            Reporter: Liu Shuo
>            Priority: Critical
>         Attachments: image-2022-07-26-11-54-06-826.png
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
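The reproduction steps above can be sketched as a shell session. This is an illustrative sketch only: the JDBC URL, table name, storage format, and the `jps` pattern used to find the driver PID are assumptions, not taken from the report.

```shell
#!/bin/sh
# Hypothetical repro sketch for SPARK-39866. URL, table name, and
# process-lookup pattern are placeholders.

# 1. Start the Thrift server (run from SPARK_HOME).
./sbin/start-thriftserver.sh

# 2-4. First beeline session: create a datasource table and insert one row.
beeline -u jdbc:hive2://localhost:10000 -e "
  CREATE TABLE leak_test (id INT) USING parquet;
  INSERT INTO leak_test VALUES (1);
"

# 5-6. Open 5 more sessions, each scanning the table; every beeline
# invocation exits (closing its session) before the next starts.
for i in 1 2 3 4 5; do
  beeline -u jdbc:hive2://localhost:10000 -e "SELECT * FROM leak_test;"
done

# 7. Dump a live-object histogram of the driver JVM. The thrift server
# driver typically shows up as SparkSubmit in jps output (assumption).
DRIVER_PID=$(jps | awk '/SparkSubmit/ {print $1}')
jmap -histo:live "$DRIVER_PID" > "$DRIVER_PID.log"

# A non-zero count here after all sessions are closed indicates the leak.
grep HdfsLocatedFileStatus "$DRIVER_PID.log"
```

Note that `-histo:live` forces a full GC first, so any surviving `HdfsLocatedFileStatus` instances in the histogram are strongly reachable, not merely awaiting collection.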