Repository: spark Updated Branches: refs/heads/branch-1.4 4df0f1b1b -> 3f6e6e0e2
[SPARK-8903] Fix bug in cherry-pick of SPARK-8803 This fixes a bug introduced in the cherry-pick of #7201 which led to a NullPointerException when cross-tabulating a data set that contains null values. Author: Josh Rosen <joshro...@databricks.com> Closes #7295 from JoshRosen/SPARK-8903 and squashes the following commits: 5489948 [Josh Rosen] [SPARK-8903] Fix bug in cherry-pick of SPARK-8803 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3f6e6e0e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3f6e6e0e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3f6e6e0e Branch: refs/heads/branch-1.4 Commit: 3f6e6e0e2668832af1a54f5cb95e5a4537c7bc5a Parents: 4df0f1b Author: Josh Rosen <joshro...@databricks.com> Authored: Wed Jul 8 15:33:14 2015 -0700 Committer: Josh Rosen <joshro...@databricks.com> Committed: Wed Jul 8 15:33:14 2015 -0700 ---------------------------------------------------------------------- .../org/apache/spark/sql/execution/stat/StatFunctions.scala | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/spark/blob/3f6e6e0e/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala ---------------------------------------------------------------------- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala index 5a0c9a6..3c68028 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala @@ -113,7 +113,7 @@ private[sql] object StatFunctions extends Logging { if (element == null) "null" else element.toString } // get the distinct values of column 2, so that we can make them the column names - val distinctCol2: Map[Any, Int] = + val distinctCol2: Map[String, Int] = counts.map(e => cleanElement(e.get(1))).distinct.zipWithIndex.toMap val columnSize = distinctCol2.size require(columnSize < 1e4, s"The number of distinct values for $col2, can't " + @@ -128,7 +128,7 @@ private[sql] object StatFunctions extends Logging { countsRow.setLong(columnIndex + 1, row.getLong(2)) } // the value of col1 is the first value, the rest are the counts - countsRow.setString(0, cleanElement(col1Item.toString)) + countsRow.setString(0, cleanElement(col1Item)) countsRow }.toSeq // Back ticks can't exist in DataFrame column names, therefore drop them. To be able to accept @@ -139,7 +139,7 @@ private[sql] object StatFunctions extends Logging { // In the map, the column names (._1) are not ordered by the index (._2). This was the bug in // SPARK-8681. We need to explicitly sort by the column index and assign the column names. val headerNames = distinctCol2.toSeq.sortBy(_._2).map { r => - StructField(cleanColumnName(r._1.toString), LongType) + StructField(cleanColumnName(r._1), LongType) } val schema = StructType(StructField(tableName, StringType) +: headerNames)