[ https://issues.apache.org/jira/browse/SPARK-17931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890226#comment-15890226 ]
Giambattista commented on SPARK-17931:
--------------------------------------

I just wanted to report that, after this change, Spark fails to execute long SQL statements (in my case, long INSERT INTO table statements). The problem I was facing is described very well in this article: https://www.drillio.com/en/2009/java-encoded-string-too-long-64kb-limit/

Eventually, I was able to get them working again with the change below.

--- a/core/src/main/scala/org/apache/spark/scheduler/TaskDescription.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskDescription.scala
@@ -86,7 +86,7 @@ private[spark] object TaskDescription {
     dataOut.writeInt(taskDescription.properties.size())
     taskDescription.properties.asScala.foreach { case (key, value) =>
       dataOut.writeUTF(key)
-      dataOut.writeUTF(value)
+      dataOut.writeUTF(value.substring(0, math.min(value.size, 65*1024/4)))
     }

> taskScheduler has some unneeded serialization
> ---------------------------------------------
>
>                 Key: SPARK-17931
>                 URL: https://issues.apache.org/jira/browse/SPARK-17931
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>            Reporter: Guoqiang Li
>            Assignee: Kay Ousterhout
>             Fix For: 2.2.0
>
>
> In the existing code, there are three layers of serialization
> involved in sending a task from the scheduler to an executor:
> - A Task object is serialized.
> - The Task object is copied to a byte buffer that also
>   contains serialized information about any additional JARs,
>   files, and Properties needed for the task to execute. This
>   byte buffer is stored as the member variable serializedTask
>   in the TaskDescription class.
> - The TaskDescription is serialized (in addition to the serialized
>   task + JARs, the TaskDescription class contains the task ID and
>   other metadata) and sent in a LaunchTask message.
>
> While it is necessary to have two layers of serialization, so that
> the JAR, file, and Property info can be deserialized prior to
> deserializing the Task object, the third layer of serialization is
> unnecessary (this is a result of SPARK-2521). We should
> eliminate a layer of serialization by moving the JARs, files, and Properties
> into the TaskDescription class.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
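The following is a minimal Scala sketch of the failure reported in the comment and of one way around it. It assumes a simplified model in which task properties are a `Map[String, String]` rather than `java.util.Properties`. `DataOutputStream.writeUTF` writes a 2-byte length prefix, so it throws `UTFDataFormatException` once the modified-UTF-8 encoding of the string exceeds 65535 bytes. The `PropertyCodec` object and its methods are hypothetical illustrations, not Spark's API. They keep the count-then-pairs layout of the patched loop but write each value as a 4-byte length followed by its UTF-8 bytes, so nothing is truncated.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, UTFDataFormatException}
import java.nio.charset.StandardCharsets

// Hypothetical helper (not part of Spark): encodes property maps without
// hitting the 65535-byte ceiling of DataOutputStream.writeUTF.
object PropertyCodec {

  // Writes a 4-byte length followed by the string's UTF-8 bytes.
  def writeLongString(out: DataOutputStream, s: String): Unit = {
    val bytes = s.getBytes(StandardCharsets.UTF_8)
    out.writeInt(bytes.length)
    out.write(bytes)
  }

  // Reads back a string written by writeLongString.
  def readLongString(in: DataInputStream): String = {
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }

  // Same layout as the patched loop (count, then key/value pairs), but
  // values are length-prefixed instead of passed to writeUTF.
  def encode(props: Map[String, String]): Array[Byte] = {
    val bytesOut = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytesOut)
    out.writeInt(props.size)
    props.foreach { case (key, value) =>
      out.writeUTF(key) // assumption: keys stay well under 64 KB
      writeLongString(out, value)
    }
    out.flush()
    bytesOut.toByteArray
  }

  def decode(bytes: Array[Byte]): Map[String, String] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val n = in.readInt()
    (0 until n).map { _ =>
      val key = in.readUTF()
      key -> readLongString(in)
    }.toMap
  }

  // True if writeUTF rejects the string (encoded length > 65535 bytes).
  def writeUtfFails(s: String): Boolean = {
    val out = new DataOutputStream(new ByteArrayOutputStream())
    try { out.writeUTF(s); false }
    catch { case _: UTFDataFormatException => true }
  }
}
```

For comparison, the patch in the comment caps each value at 65*1024/4 = 16640 characters. Modified UTF-8 uses at most 3 bytes per Java `char`, so a capped value encodes to at most 49920 bytes and `writeUTF` no longer throws. The cost is that any longer SQL text is silently cut off. The length-prefixed form avoids that for 2 extra bytes per value (a 4-byte length instead of a 2-byte one), but the reading side must use the matching decoding.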