[ 
https://issues.apache.org/jira/browse/SPARK-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-10155:
---------------------------------
    Description: 
I saw a lot of `ThreadLocal` objects in the following app:
{code}
import org.apache.spark._
import org.apache.spark.sql._

object SparkApp {

  def foo(sqlContext: SQLContext): Unit = {
    import sqlContext.implicits._
    sqlContext.sparkContext.parallelize(Seq("aaa", "bbb", "ccc"))
      .toDF().filter("length(_1) > 0").count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sql-memory-leak")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    while (true) {
      foo(sqlContext)
    }
  }
}
{code}
Running the above code for a long time eventually leads to an OOM.

These `ThreadLocal`s are from `scala.util.parsing.combinator.Parsers.lastNoSuccessVar`, which stores `Failure("end of input", ...)`.

This is a known Scala issue: https://issues.scala-lang.org/browse/SI-9010, with related discussion at https://issues.scala-lang.org/browse/SI-4929.
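
The leak mechanism can be sketched without Spark at all. Below is a minimal, hypothetical stand-in (`LeakyParser` is not a real Spark or Scala class) for how every `Parsers` instance owns a `DynamicVariable`, which is backed by a thread-local in the Scala standard library; when a fresh parser is created per query and its variable is set but never reset, the calling thread accumulates one retained `Failure` value per parser:
{code}
import scala.util.DynamicVariable

// Hypothetical stand-in for scala.util.parsing.combinator.Parsers:
// each instance owns its own DynamicVariable, which is backed by an
// InheritableThreadLocal under the hood.
class LeakyParser {
  val lastNoSuccessVar = new DynamicVariable[Option[String]](None)

  def parse(input: String): Unit = {
    // The real parser stores Failure("end of input", ...) here and
    // never resets it, so the value stays reachable from the thread.
    lastNoSuccessVar.value = Some("Failure(\"end of input\", " + input + ")")
  }
}

object LeakDemo {
  def main(args: Array[String]): Unit = {
    // One fresh parser per call, as the SQL path effectively did per
    // query: every call leaves another entry behind in this thread's
    // ThreadLocal map, and those entries are not cleaned up eagerly.
    val parsers = (1 to 100000).map { i =>
      val p = new LeakyParser
      p.parse("query " + i)
      p
    }
    println(s"parsers holding thread-local values: ${parsers.size}")
  }
}
{code}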

  was:
I saw a lot of `ThreadLocal` objects in the following app:
{code}
import org.apache.spark._
import org.apache.spark.sql._

object SparkApp {

  def foo(sqlContext: SQLContext): Unit = {
    import sqlContext.implicits._
    sqlContext.sparkContext.parallelize(Seq("aaa", "bbb", "ccc"))
      .toDF().filter("length(_1) > 0").count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sql-memory-leak")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    while (true) {
      foo(sqlContext)
    }
  }
}
{code}
Running the above code for a long time eventually leads to an OOM.

These `ThreadLocal`s are from `scala.util.parsing.combinator.Parsers.lastNoSuccessVar`, which stores `Failure("end of input", ...)`.

This is a known Scala issue: https://issues.scala-lang.org/browse/SI-9010, with related discussion at https://issues.scala-lang.org/browse/SI-4929.

I tried to fix it using reflection to clear `lastNoSuccessVar`, but failed because of the complicated bytecode that Scala generates for trait mixins.

It looks like the best solution is to reuse the parser?
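
Under that assumption, reuse could look like the sketch below (`ReusableParser` is a hypothetical illustration, not actual Spark code): one shared long-lived parser owns a single `DynamicVariable`, and resetting it after each parse drops the reference, so each thread retains at most one stale value no matter how many queries it parses:
{code}
import scala.util.DynamicVariable

// Hypothetical sketch of the "reuse the parser" idea: one shared
// instance means one DynamicVariable per thread, and resetting it
// after every parse drops the reference to the last Failure.
class ReusableParser {
  private val lastNoSuccessVar = new DynamicVariable[Option[String]](None)

  def parse(input: String): Unit = {
    // Simulate the parser recording a failure for this thread...
    lastNoSuccessVar.value = Some("Failure(\"end of input\", " + input + ")")
    // ...then clear it so nothing stays pinned once the call returns.
    lastNoSuccessVar.value = None
  }

  def retainedValue: Option[String] = lastNoSuccessVar.value
}

object ReuseDemo {
  def main(args: Array[String]): Unit = {
    val parser = new ReusableParser // one instance reused for every query
    (1 to 100000).foreach(i => parser.parse("query " + i))
    println(s"value retained after 100000 parses: ${parser.retainedValue}")
  }
}
{code}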


> Memory leak in SQL parsers
> --------------------------
>
>                 Key: SPARK-10155
>                 URL: https://issues.apache.org/jira/browse/SPARK-10155
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Shixiong Zhu
>            Priority: Critical
>         Attachments: Screen Shot 2015-08-21 at 5.45.24 PM.png
>
>
> I saw a lot of `ThreadLocal` objects in the following app:
> {code}
> import org.apache.spark._
> import org.apache.spark.sql._
> object SparkApp {
>   def foo(sqlContext: SQLContext): Unit = {
>     import sqlContext.implicits._
>     sqlContext.sparkContext.parallelize(Seq("aaa", "bbb", "ccc"))
>       .toDF().filter("length(_1) > 0").count()
>   }
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("sql-memory-leak")
>     val sc = new SparkContext(conf)
>     val sqlContext = new SQLContext(sc)
>     while (true) {
>       foo(sqlContext)
>     }
>   }
> }
> {code}
> Running the above code for a long time eventually leads to an OOM.
> These "ThreadLocal"s are from 
> "scala.util.parsing.combinator.Parsers.lastNoSuccessVar", which stores 
> `Failure("end of input", ...)`.
> This is a known Scala issue: https://issues.scala-lang.org/browse/SI-9010, with related discussion at https://issues.scala-lang.org/browse/SI-4929.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
