Repository: spark
Updated Branches:
  refs/heads/master 95db8a44f -> 2bfc4f152
[SPARK-15649][SQL] Avoid serializing MetastoreRelation in HiveTableScanExec

## What changes were proposed in this pull request?

In HiveTableScanExec, `schema` is a lazy val derived from `relation.attributeMap`, so referencing it inside the RDD closure captures the enclosing operator and forces the MetastoreRelation to be serialized into the task binary bytes. Copying the schema into a local variable before the closure avoids serializing MetastoreRelation.

## How was this patch tested?

Author: Lianhui Wang <lianhuiwan...@gmail.com>

Closes #13397 from lianhuiwang/avoid-serialize.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2bfc4f15
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2bfc4f15
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2bfc4f15

Branch: refs/heads/master
Commit: 2bfc4f15214a870b3e067f06f37eb506b0070a1f
Parents: 95db8a4
Author: Lianhui Wang <lianhuiwan...@gmail.com>
Authored: Tue May 31 09:21:51 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Tue May 31 09:21:51 2016 -0700

----------------------------------------------------------------------
 .../org/apache/spark/sql/hive/execution/HiveTableScanExec.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/2bfc4f15/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
index e29864f..cc3e74b 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
@@ -152,8 +152,10 @@ case class HiveTableScanExec(
       }
     }
     val numOutputRows = longMetric("numOutputRows")
+    // Avoid to serialize MetastoreRelation because schema is lazy. (see SPARK-15649)
+    val outputSchema = schema
     rdd.mapPartitionsInternal { iter =>
-      val proj = UnsafeProjection.create(schema)
+      val proj = UnsafeProjection.create(outputSchema)
       iter.map { r =>
         numOutputRows += 1
         proj(r)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
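
The patch follows the standard Spark idiom of copying a class member into a local val before the closure, so that the closure captures only the local value rather than `this`. A minimal, Spark-free sketch of the pitfall (the `ExampleScan` and `HeavyRelation` names are hypothetical stand-ins for HiveTableScanExec and MetastoreRelation, and plain JVM serialization stands in for Spark's task-binary serialization):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for MetastoreRelation: not serializable.
class HeavyRelation

// Stand-in for HiveTableScanExec. Hypothetical class, for illustration only.
class ExampleScan extends Serializable {
  val relation = new HeavyRelation
  lazy val schema: String = "col1,col2" // in Spark, derived from relation.attributeMap

  // BAD: `schema` in the closure body is really `this.schema`, so the whole
  // ExampleScan (including the non-serializable relation) gets captured.
  def badClosure: String => Int = r => schema.length + r.length

  // GOOD: copy the value into a local val first; the closure then captures
  // only the String, not the enclosing object.
  def goodClosure: String => Int = {
    val outputSchema = schema
    r => outputSchema.length + r.length
  }
}

object SerCheck {
  // True if the object survives JVM serialization, which is the same check
  // Spark performs when it serializes the task binary.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }
}
```

On Scala 2.12+, where function literals compile to serializable lambdas, `badClosure` fails the `isSerializable` check because serializing the captured `ExampleScan` hits the non-serializable `relation` field, while `goodClosure` passes because it captures only a String. That asymmetry is exactly why the patch hoists `schema` into `outputSchema` before `mapPartitionsInternal`.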