[ https://issues.apache.org/jira/browse/SPARK-17930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15577506#comment-15577506 ]
Guoqiang Li commented on SPARK-17930:
-------------------------------------

If a stage contains a lot of tasks, e.g. one million tasks, the code here needs to create one million SerializerInstance instances, which seriously affects performance on the driver. At the least, we can reuse one SerializerInstance instance per stage.

> The SerializerInstance instance used when deserializing a TaskResult is not
> reused
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-17930
>                 URL: https://issues.apache.org/jira/browse/SPARK-17930
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.1, 2.0.1
>            Reporter: Guoqiang Li
>
> The following code is called when the DirectTaskResult instance is
> deserialized:
> {noformat}
> def value(): T = {
>   if (valueObjectDeserialized) {
>     valueObject
>   } else {
>     // Each deserialization creates a new SerializerInstance,
>     // which is very time-consuming
>     val resultSer = SparkEnv.get.serializer.newInstance()
>     valueObject = resultSer.deserialize(valueBytes)
>     valueObjectDeserialized = true
>     valueObject
>   }
> }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
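The per-stage reuse suggested in the comment above can be sketched as a per-thread serializer cache, since Spark's SerializerInstance is not thread-safe. This is a minimal, self-contained illustration, not Spark's actual fix: `SerializerInstance` and `SerializerDemo` here are simplified stand-ins, with a counter added to show that only one instance is created regardless of how many results are deserialized.

```scala
import java.util.concurrent.atomic.AtomicInteger

object SerializerDemo {
  // Counts how many serializer instances have been constructed.
  val created = new AtomicInteger(0)

  // Simplified stand-in for Spark's SerializerInstance; in Spark the
  // real class is not thread-safe, which motivates the per-thread cache.
  class SerializerInstance {
    created.incrementAndGet()
    def deserialize(bytes: Array[Byte]): String = new String(bytes, "UTF-8")
  }

  // One cached instance per thread, instead of one per deserialized
  // TaskResult as in the quoted value() method.
  private val cachedSerializer = new ThreadLocal[SerializerInstance] {
    override def initialValue(): SerializerInstance = new SerializerInstance
  }

  // Analogue of DirectTaskResult.value(): deserializes using the cached
  // instance rather than calling newInstance() on every invocation.
  def value(bytes: Array[Byte]): String =
    cachedSerializer.get().deserialize(bytes)

  def main(args: Array[String]): Unit = {
    val results = (1 to 1000).map(i => value(s"task-$i".getBytes("UTF-8")))
    assert(results.head == "task-1")
    // 1000 deserializations on this thread, but only one instance built.
    assert(created.get() == 1)
    println("ok")
  }
}
```

With the original code, the counter would reach 1000; with the thread-local cache it stays at 1 per thread.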