[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564919#comment-14564919 ]
Akshat Aranya commented on SPARK-7708:
--------------------------------------

Thanks, Josh. I'll look into it. I can't spend all my time on this either, but I'll continue with my PR when I get the time.

> Incorrect task serialization with Kryo closure serializer
> ---------------------------------------------------------
>
>                 Key: SPARK-7708
>                 URL: https://issues.apache.org/jira/browse/SPARK-7708
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.2
>            Reporter: Akshat Aranya
>
> I've been investigating the use of Kryo for closure serialization with
> Spark 1.2, and it seems I've hit upon a bug.
>
> When a task is serialized before scheduling, the following log message is
> generated:
>
> [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342,
> <host>, PROCESS_LOCAL, 302 bytes)
>
> This message comes from TaskSetManager, which serializes the task using
> the closure serializer. Before the message is sent out, the
> TaskDescription (which includes the original task as a byte array) is
> serialized again into a byte array with the closure serializer. I added a
> log message for this in CoarseGrainedSchedulerBackend, which produces the
> following output:
>
> [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132
>
> The serialized size of the TaskDescription (132 bytes) turns out to be
> _smaller_ than the serialized task it contains (302 bytes). This implies
> that TaskDescription.buffer is not getting serialized correctly. On the
> executor side, deserialization indeed produces a null value for
> TaskDescription.buffer.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
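The size comparison in the report is the key diagnostic: a serializer that handles every field correctly can never produce a container smaller than the byte array it embeds, so 132 bytes of TaskDescription versus a 302-byte contained task proves the buffer field was dropped. A minimal sketch of that sanity check, using plain JDK serialization as a stand-in for Spark's closure serializer and a hypothetical TaskWrapper class in place of TaskDescription (names and sizes are illustrative only, not Spark's actual API):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for Spark's TaskDescription: a wrapper that
// carries an already-serialized task as a byte array.
class TaskWrapper implements Serializable {
    final long taskId;
    final byte[] buffer;

    TaskWrapper(long taskId, byte[] buffer) {
        this.taskId = taskId;
        this.buffer = buffer;
    }
}

public class SerializationSizeCheck {
    // Serialize an object (here with JDK serialization, standing in for
    // the closure serializer) and return the resulting byte count.
    static int serializedSize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] serializedTask = new byte[302]; // 302 bytes, as in the TaskSetManager log
        TaskWrapper wrapper = new TaskWrapper(342L, serializedTask);
        int wrapperSize = serializedSize(wrapper);
        System.out.println("wrapper=" + wrapperSize
                + " payload=" + serializedTask.length);
        // A correct serializer must emit at least as many bytes as the
        // payload it embeds; a smaller wrapper means the field was dropped.
        if (wrapperSize < serializedTask.length) {
            throw new AssertionError("buffer field was not serialized");
        }
    }
}
```

The same invariant check could be added as a temporary assertion next to the logging in CoarseGrainedSchedulerBackend to catch the corruption at send time rather than as a null on the executor.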