[
https://issues.apache.org/jira/browse/SPARK-17931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892016#comment-15892016
]
Giambattista commented on SPARK-17931:
--------------------------------------
Thanks, I just opened SPARK-19796 and added required details.
> taskScheduler has some unneeded serialization
> ---------------------------------------------
>
> Key: SPARK-17931
> URL: https://issues.apache.org/jira/browse/SPARK-17931
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Reporter: Guoqiang Li
> Assignee: Kay Ousterhout
> Fix For: 2.2.0
>
>
> In the existing code, there are three layers of serialization
> involved in sending a task from the scheduler to an executor:
> - A Task object is serialized
> - The Task object is copied to a byte buffer that also
> contains serialized information about any additional JARs,
> files, and Properties needed for the task to execute. This
> byte buffer is stored as the member variable serializedTask
> in the TaskDescription class.
> - The TaskDescription is serialized (in addition to the serialized
> task + JARs, the TaskDescription class contains the task ID and
> other metadata) and sent in a LaunchTask message.
> While it is necessary to have two layers of serialization, so that
> the JAR, file, and Property info can be deserialized prior to
> deserializing the Task object, the third layer of deserialization is
> unnecessary (this is as a result of SPARK-2521). We should
> eliminate a layer of serialization by moving the JARs, files, and Properties
> into the TaskDescription class.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]