Repository: spark
Updated Branches:
  refs/heads/master 01dd00830 -> 6e6298154


[SPARK-17350][SQL] Disable default use of KryoSerializer in Thrift Server

In SPARK-4761 / #3621 (December 2014) we enabled Kryo serialization by default 
in the Spark Thrift Server. However, I don't think the original rationale for 
doing this still holds now that most Spark SQL serialization is performed via 
encoders and our UnsafeRow format.
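
As a rough illustration (a minimal spark-shell sketch, not part of this patch; 
Record is a made-up class): data flowing through the Dataset API is encoded 
into the internal UnsafeRow binary format by its Encoder, so the serializer 
chosen via spark.serializer generally never touches those rows.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.master("local").getOrCreate()
    import spark.implicits._

    case class Record(id: Long, name: String)

    // These rows are serialized by the implicitly resolved Encoder[Record]
    // into UnsafeRow's binary format; the Java-vs-Kryo choice made through
    // spark.serializer generally does not apply to them.
    val ds = spark.createDataset(Seq(Record(1L, "a"), Record(2L, "b")))
    ds.filter(_.id > 1).show()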

In addition, the use of Kryo as the default serializer can introduce 
performance problems: creating new KryoSerializer instances is expensive, and 
several code paths (including DirectTaskResult deserialization) have not yet 
been optimized to reuse serializer instances.
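
As a back-of-the-envelope illustration (a hypothetical sketch against the 
public serializer API, not code from this patch), the cost difference comes 
down to where newInstance() is called:

    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoSerializer

    val serializer = new KryoSerializer(new SparkConf())

    // Costly pattern: build a fresh SerializerInstance (and its underlying
    // Kryo object) for every value, as the un-optimized code paths do.
    def roundTripSlow(v: String): String =
      serializer.newInstance().deserialize[String](
        serializer.newInstance().serialize(v))

    // Cheaper pattern: create one SerializerInstance and reuse it
    // (SerializerInstance is not thread-safe, so reuse it per thread).
    val instance = serializer.newInstance()
    def roundTripFast(v: String): String =
      instance.deserialize[String](instance.serialize(v))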

Given all of this, I propose to revert to using JavaSerializer as the default 
serializer in the Thrift Server.
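
Users who depend on the old behavior can still opt in explicitly; this sketch 
simply mirrors the defaults that the patch removes:

    import org.apache.spark.SparkConf

    // Restore the pre-patch Thrift Server defaults by hand:
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.referenceTracking", "false")

The same settings can also be supplied via spark-defaults.conf or --conf 
flags on spark-submit.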

/cc liancheng

Author: Josh Rosen <joshro...@databricks.com>

Closes #14906 from JoshRosen/disable-kryo-in-thriftserver.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6e629815
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6e629815
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6e629815

Branch: refs/heads/master
Commit: 6e6298154aba63831a292117797798131a646869
Parents: 01dd008
Author: Josh Rosen <joshro...@databricks.com>
Authored: Tue Nov 1 16:23:47 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Tue Nov 1 16:23:47 2016 -0700

----------------------------------------------------------------------
 docs/configuration.md                                     |  5 ++---
 .../apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala  | 10 ----------
 2 files changed, 2 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/6e629815/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 780fc94..0017219 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -767,7 +767,7 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.kryo.referenceTracking</code></td>
-  <td>true (false when using Spark SQL Thrift Server)</td>
+  <td>true</td>
   <td>
    Whether to track references to the same object when serializing data with Kryo, which is
    necessary if your object graphs have loops and useful for efficiency if they contain multiple
@@ -838,8 +838,7 @@ Apart from these, the following properties are also available, and may be useful
 <tr>
   <td><code>spark.serializer</code></td>
   <td>
-    org.apache.spark.serializer.<br />JavaSerializer (org.apache.spark.serializer.<br />
-    KryoSerializer when using Spark SQL Thrift Server)
+    org.apache.spark.serializer.<br />JavaSerializer
   </td>
   <td>
    Class to use for serializing objects that will be sent over the network or need to be cached

http://git-wip-us.apache.org/repos/asf/spark/blob/6e629815/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
----------------------------------------------------------------------
diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
index 6389115..78a3094 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
@@ -19,8 +19,6 @@ package org.apache.spark.sql.hive.thriftserver
 
 import java.io.PrintStream
 
-import scala.collection.JavaConverters._
-
 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.{SparkSession, SQLContext}
@@ -37,8 +35,6 @@ private[hive] object SparkSQLEnv extends Logging {
   def init() {
     if (sqlContext == null) {
       val sparkConf = new SparkConf(loadDefaults = true)
-      val maybeSerializer = sparkConf.getOption("spark.serializer")
-      val maybeKryoReferenceTracking = sparkConf.getOption("spark.kryo.referenceTracking")
      // If user doesn't specify the appName, we want to get [SparkSQL::localHostName] instead of
       // the default appName [SparkSQLCLIDriver] in cli or beeline.
       val maybeAppName = sparkConf
@@ -47,12 +43,6 @@ private[hive] object SparkSQLEnv extends Logging {
 
      sparkConf
        .setAppName(maybeAppName.getOrElse(s"SparkSQL::${Utils.localHostName()}"))
-        .set(
-          "spark.serializer",
-          maybeSerializer.getOrElse("org.apache.spark.serializer.KryoSerializer"))
-        .set(
-          "spark.kryo.referenceTracking",
-          maybeKryoReferenceTracking.getOrElse("false"))
 
      val sparkSession = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()
       sparkContext = sparkSession.sparkContext

