[ https://issues.apache.org/jira/browse/SPARK-17930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15577506#comment-15577506 ]

Guoqiang Li commented on SPARK-17930:
-------------------------------------

If a stage contains a large number of tasks, e.g. one million, this code creates 
one million SerializerInstance instances, which seriously degrades scheduler 
performance. At a minimum, the SerializerInstance could be reused once per 
stage.
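The reuse idea can be sketched as caching one deserializer per thread instead of calling newInstance() for every task result. This is only an illustrative stand-in, not Spark's actual code: the Serializer/SerializerInstance shapes below are simplified stand-ins for org.apache.spark.serializer, and the ThreadLocal is an assumption motivated by SerializerInstance not being thread-safe, so a single shared instance could not simply be used across result-getter threads.

```scala
// Stand-in for Spark's SerializerInstance; the real one wraps Java/Kryo
// serialization and is expensive to construct.
class SerializerInstance {
  def deserialize(bytes: Array[Byte]): String = new String(bytes, "UTF-8")
}

// Count how many instances get created, to show the effect of caching.
var instancesCreated = 0
def newInstance(): SerializerInstance = {
  instancesCreated += 1
  new SerializerInstance
}

// One cached instance per thread: each thread pays the construction cost
// once, instead of once per deserialized task result.
val cachedInstance = new ThreadLocal[SerializerInstance] {
  override def initialValue(): SerializerInstance = newInstance()
}

def deserializeResult(bytes: Array[Byte]): String =
  cachedInstance.get().deserialize(bytes)

// Simulate many task results arriving on one thread: despite a million
// calls, only a single SerializerInstance is ever constructed.
(1 to 1000000).foreach(_ => deserializeResult("ok".getBytes("UTF-8")))
println(instancesCreated)  // prints 1
```

With the per-call newInstance() in the quoted value() method, the same loop would construct one million instances; the cache reduces that to one per thread.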

> The SerializerInstance instance used when deserializing a TaskResult is not 
> reused 
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-17930
>                 URL: https://issues.apache.org/jira/browse/SPARK-17930
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.1, 2.0.1
>            Reporter: Guoqiang Li
>
> The following code is called when the DirectTaskResult instance is 
> deserialized
> {noformat}
>   def value(): T = {
>     if (valueObjectDeserialized) {
>       valueObject
>     } else {
>       // Each deserialization creates a new SerializerInstance,
>       // which is very time-consuming
>       val resultSer = SparkEnv.get.serializer.newInstance()
>       valueObject = resultSer.deserialize(valueBytes)
>       valueObjectDeserialized = true
>       valueObject
>     }
>   }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
