Repository: spark
Updated Branches:
  refs/heads/master 95db8a44f -> 2bfc4f152
[SPARK-15649][SQL] Avoid serializing MetastoreRelation in HiveTableScanExec

## What changes were proposed in this pull request?

In HiveTableScanExec, `schema` is a lazy val derived from `relation.attributeMap`, so referencing it inside the RDD closure captures the enclosing operator and forces the MetastoreRelation to be serialized into the task binary bytes. Copying the schema into a local variable before the closure avoids serializing MetastoreRelation.

## How was this patch tested?

Author: Lianhui Wang <lianhuiwan...@gmail.com>

Closes #13397 from lianhuiwang/avoid-serialize.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2bfc4f15
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2bfc4f15
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2bfc4f15

Branch: refs/heads/master
Commit: 2bfc4f15214a870b3e067f06f37eb506b0070a1f
Parents: 95db8a4
Author: Lianhui Wang <lianhuiwan...@gmail.com>
Authored: Tue May 31 09:21:51 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Tue May 31 09:21:51 2016 -0700

----------------------------------------------------------------------
 .../org/apache/spark/sql/hive/execution/HiveTableScanExec.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/2bfc4f15/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
index e29864f..cc3e74b 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
@@ -152,8 +152,10 @@ case class HiveTableScanExec(
       }
     }
     val numOutputRows = longMetric("numOutputRows")
+    // Avoid to serialize MetastoreRelation because schema is lazy. (see SPARK-15649)
+    val outputSchema = schema
     rdd.mapPartitionsInternal { iter =>
-      val proj = UnsafeProjection.create(schema)
+      val proj = UnsafeProjection.create(outputSchema)
       iter.map { r =>
         numOutputRows += 1
         proj(r)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
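
The patch follows the standard Spark idiom of copying a class member into a local val before the closure, so that the closure captures only the local value rather than `this`. A minimal, Spark-free sketch of the pitfall (the `ExampleScan` and `HeavyRelation` names are hypothetical stand-ins for HiveTableScanExec and MetastoreRelation, and plain JVM serialization stands in for Spark's task-binary serialization):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for MetastoreRelation: not serializable.
class HeavyRelation

// Stand-in for HiveTableScanExec. Hypothetical class, for illustration only.
class ExampleScan extends Serializable {
  val relation = new HeavyRelation
  lazy val schema: String = "col1,col2" // in Spark, derived from relation.attributeMap

  // BAD: `schema` in the closure body is really `this.schema`, so the whole
  // ExampleScan (including the non-serializable relation) gets captured.
  def badClosure: String => Int = r => schema.length + r.length

  // GOOD: copy the value into a local val first; the closure then captures
  // only the String, not the enclosing object.
  def goodClosure: String => Int = {
    val outputSchema = schema
    r => outputSchema.length + r.length
  }
}

object SerCheck {
  // True if the object survives JVM serialization, which is the same check
  // Spark performs when it serializes the task binary.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }
}
```

On Scala 2.12+, where function literals compile to serializable lambdas, `badClosure` fails the `isSerializable` check because serializing the captured `ExampleScan` hits the non-serializable `relation` field, while `goodClosure` passes because it captures only a String. That asymmetry is exactly why the patch hoists `schema` into `outputSchema` before `mapPartitionsInternal`.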