[ 
https://issues.apache.org/jira/browse/SPARK-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454495#comment-15454495
 ] 

Sean Zhong edited comment on SPARK-17356 at 9/1/16 6:38 AM:
------------------------------------------------------------

*Root cause:*

1. MLlib relies heavily on Metadata to store attribute information. In the 
case here, the metadata may contain tens of thousands of Attribute entries, 
and that metadata may be stored in an Alias expression like this:
{code}
case class Alias(child: Expression, name: String)(
    val exprId: ExprId = NamedExpression.newExprId,
    val qualifier: Option[String] = None,
    val explicitMetadata: Option[Metadata] = None,
    override val isGenerated: java.lang.Boolean = false)
{code} 

If we serialize the metadata to JSON, it takes a huge amount of memory.
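
To give a rough sense of the size involved, here is a minimal sketch that builds a Metadata object with many attribute entries and measures the length of its JSON form. It assumes spark-sql is on the classpath. The object name MetadataJsonSize, the helper buildMetadata, and the "attrs"/"idx"/"name" layout are hypothetical and only approximate what the ML attribute classes actually store, so treat the numbers as illustrative rather than a reproduction of the reported plan.

```scala
import org.apache.spark.sql.types.{Metadata, MetadataBuilder}

object MetadataJsonSize {
  // Build a Metadata object holding `n` per-feature entries.
  // The key names used here are made up for illustration; the real
  // ML attribute layout may differ.
  def buildMetadata(n: Int): Metadata = {
    val attrs: Array[Metadata] = Array.tabulate(n) { i =>
      new MetadataBuilder()
        .putLong("idx", i.toLong)
        .putString("name", s"feature_$i")
        .build()
    }
    new MetadataBuilder().putMetadataArray("attrs", attrs).build()
  }

  def main(args: Array[String]): Unit = {
    // Metadata.json renders the whole structure as a single JSON string.
    val small = buildMetadata(10).json.length
    val large = buildMetadata(50000).json.length
    println(s"10 attributes:    $small chars of JSON")
    println(s"50000 attributes: $large chars of JSON")
  }
}
```

The JSON for a single Metadata object grows with the number of attribute entries. If several Alias expressions in the plan each carry such metadata (an assumption here, not something measured in this sketch), each one is serialized in full when the plan is converted to JSON.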




> Out of memory when calling TreeNode.toJSON
> ------------------------------------------
>
>                 Key: SPARK-17356
>                 URL: https://issues.apache.org/jira/browse/SPARK-17356
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Sean Zhong
>         Attachments: jmap.txt, jstack.txt, queryplan.txt
>
>
> When using MLlib, calling toJSON on a plan with many levels of 
> sub-queries may cause an out-of-memory error with a stack trace like this:
> {code}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>       at scala.collection.mutable.AbstractSeq.<init>(Seq.scala:47)
>       at scala.collection.mutable.AbstractBuffer.<init>(Buffer.scala:48)
>       at scala.collection.mutable.ListBuffer.<init>(ListBuffer.scala:46)
>       at scala.collection.immutable.List$.newBuilder(List.scala:396)
>       at scala.collection.generic.GenericTraversableTemplate$class.newBuilder(GenericTraversableTemplate.scala:64)
>       at scala.collection.AbstractTraversable.newBuilder(Traversable.scala:105)
>       at scala.collection.TraversableLike$class.filter(TraversableLike.scala:262)
>       at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
>       at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:274)
>       at scala.collection.AbstractTraversable.filterNot(Traversable.scala:105)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
>       at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:7)
>       at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
>       at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
>       at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
>       at org.json4s.jackson.JsonMethods$class.compact(JsonMethods.scala:34)
>       at org.json4s.jackson.JsonMethods$.compact(JsonMethods.scala:50)
>       at org.apache.spark.sql.catalyst.trees.TreeNode.toJSON(TreeNode.scala:566)
> {code}
> The query plan, stack trace, and jmap distribution are attached.


