[ https://issues.apache.org/jira/browse/SPARK-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu updated SPARK-10155:
---------------------------------

Description:

I saw a lot of `ThreadLocal` objects in the following app:

{code}
import org.apache.spark._
import org.apache.spark.sql._

object SparkApp {

  def foo(sqlContext: SQLContext): Unit = {
    import sqlContext.implicits._
    sqlContext.sparkContext.parallelize(Seq("aaa", "bbb", "ccc"))
      .toDF().filter("length(_1) > 0").count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sql-memory-leak")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    while (true) {
      foo(sqlContext)
    }
  }
}
{code}

Running the above code for long enough eventually ends in an OOM. These `ThreadLocal`s come from "scala.util.parsing.combinator.Parsers.lastNoSuccessVar", which stores `Failure("end of input", ...)`.

There is a Scala issue for this: https://issues.scala-lang.org/browse/SI-9010 and some related discussion here: https://issues.scala-lang.org/browse/SI-4929

I tried to fix it using reflection to clear "lastNoSuccessVar" but failed because of the complicated bytecode generated by Scala trait mixins. It looks like the best solution is to reuse the Parser.
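The proposed fix of reusing the Parser can be illustrated with a minimal, self-contained sketch. `LeakyParser` and `ReusedParser` below are hypothetical stand-ins, not Spark or Scala library classes: each instance carries its own `ThreadLocal` (analogous to `Parsers.lastNoSuccessVar`), so building a new parser per query leaves a stale per-thread entry behind on every call, while sharing one instance bounds this at one entry per thread.

```scala
// Hedged sketch: LeakyParser mimics how scala.util.parsing.combinator.Parsers
// keeps per-instance ThreadLocal state (lastNoSuccessVar). Names are
// illustrative only.
class LeakyParser {
  // Analogous to Parsers.lastNoSuccessVar: written on every parse and never
  // cleared, so the value is retained in each thread's ThreadLocal map for
  // as long as this parser instance is reachable.
  private val lastNoSuccess = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }

  def parse(input: String): Boolean = {
    lastNoSuccess.set(Some("end of input")) // stale value retained per thread
    input.nonEmpty
  }
}

// The suggested direction: reuse one parser instance instead of constructing
// a fresh one per SQL statement, so each thread accumulates at most one
// ThreadLocal entry instead of one per query.
object ReusedParser {
  private val parser = new LeakyParser

  // Combinator parsers are not thread-safe, so a shared instance would need
  // some guard; plain synchronization is used here for the sketch.
  def parse(input: String): Boolean = synchronized(parser.parse(input))
}
```

Calling `ReusedParser.parse` in a loop then touches the same single `ThreadLocal` each time, instead of creating a new one per iteration as the app above does.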
> Memory leak in SQL parsers
> --------------------------
>
>                 Key: SPARK-10155
>                 URL: https://issues.apache.org/jira/browse/SPARK-10155
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Shixiong Zhu
>            Priority: Critical

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)