spark git commit: [SPARK-16425][R] `describe()` should not fail with non-numeric columns
Repository: spark Updated Branches: refs/heads/branch-2.0 5828da41c -> 73c764a04 [SPARK-16425][R] `describe()` should not fail with non-numeric columns ## What changes were proposed in this pull request? This PR prevents ERRORs when `summary(df)` is called for `SparkDataFrame` with not-numeric columns. This failure happens only in `SparkR`. **Before** ```r > df <- createDataFrame(faithful) > df <- withColumn(df, "boolean", df$waiting==79) > summary(df) 16/07/07 14:15:16 ERROR RBackendHandler: describe on 34 failed Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : org.apache.spark.sql.AnalysisException: cannot resolve 'avg(`boolean`)' due to data type mismatch: function average requires numeric types, not BooleanType; ``` **After** ```r > df <- createDataFrame(faithful) > df <- withColumn(df, "boolean", df$waiting==79) > summary(df) SparkDataFrame[summary:string, eruptions:string, waiting:string] ``` ## How was this patch tested? Pass the Jenkins with a updated testcase. Author: Dongjoon HyunCloses #14096 from dongjoon-hyun/SPARK-16425. (cherry picked from commit 6aa7d09f4e126f42e41085dec169c813379ed354) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/73c764a0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/73c764a0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/73c764a0 Branch: refs/heads/branch-2.0 Commit: 73c764a047f795c85909c7a7ea4324f286d2aafa Parents: 5828da4 Author: Dongjoon Hyun Authored: Thu Jul 7 17:47:29 2016 -0700 Committer: Shivaram Venkataraman Committed: Thu Jul 7 17:47:38 2016 -0700 -- R/pkg/R/DataFrame.R | 3 +-- R/pkg/inst/tests/testthat/test_sparkSQL.R | 8 ++-- 2 files changed, 7 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/73c764a0/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 17474d4..ec09aab 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -2617,8 +2617,7 @@ setMethod("describe", setMethod("describe", signature(x = "SparkDataFrame"), function(x) { -colList <- as.list(c(columns(x))) -sdf <- callJMethod(x@sdf, "describe", colList) +sdf <- callJMethod(x@sdf, "describe", list()) dataFrame(sdf) }) http://git-wip-us.apache.org/repos/asf/spark/blob/73c764a0/R/pkg/inst/tests/testthat/test_sparkSQL.R -- diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R index 003fcce..755aded 100644 --- a/R/pkg/inst/tests/testthat/test_sparkSQL.R +++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R @@ -1816,13 +1816,17 @@ test_that("describe() and summarize() on a DataFrame", { expect_equal(collect(stats)[2, "age"], "24.5") expect_equal(collect(stats)[3, "age"], "7.7781745930520225") stats <- describe(df) - expect_equal(collect(stats)[4, "name"], "Andy") + expect_equal(collect(stats)[4, "name"], NULL) expect_equal(collect(stats)[5, "age"], "30") stats2 <- summary(df) - expect_equal(collect(stats2)[4, "name"], "Andy") + expect_equal(collect(stats2)[4, "name"], NULL) expect_equal(collect(stats2)[5, "age"], "30") + # SPARK-16425: SparkR summary() fails on column of type logical + df <- withColumn(df, "boolean", df$age == 30) + summary(df) + # Test base::summary is working expect_equal(length(summary(attenu, digits = 4)), 35) }) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16425][R] `describe()` should not fail with non-numeric columns
Repository: spark Updated Branches: refs/heads/master f4767bcc7 -> 6aa7d09f4 [SPARK-16425][R] `describe()` should not fail with non-numeric columns ## What changes were proposed in this pull request? This PR prevents ERRORs when `summary(df)` is called for `SparkDataFrame` with not-numeric columns. This failure happens only in `SparkR`. **Before** ```r > df <- createDataFrame(faithful) > df <- withColumn(df, "boolean", df$waiting==79) > summary(df) 16/07/07 14:15:16 ERROR RBackendHandler: describe on 34 failed Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : org.apache.spark.sql.AnalysisException: cannot resolve 'avg(`boolean`)' due to data type mismatch: function average requires numeric types, not BooleanType; ``` **After** ```r > df <- createDataFrame(faithful) > df <- withColumn(df, "boolean", df$waiting==79) > summary(df) SparkDataFrame[summary:string, eruptions:string, waiting:string] ``` ## How was this patch tested? Pass the Jenkins with a updated testcase. Author: Dongjoon HyunCloses #14096 from dongjoon-hyun/SPARK-16425. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6aa7d09f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6aa7d09f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6aa7d09f Branch: refs/heads/master Commit: 6aa7d09f4e126f42e41085dec169c813379ed354 Parents: f4767bc Author: Dongjoon Hyun Authored: Thu Jul 7 17:47:29 2016 -0700 Committer: Shivaram Venkataraman Committed: Thu Jul 7 17:47:29 2016 -0700 -- R/pkg/R/DataFrame.R | 3 +-- R/pkg/inst/tests/testthat/test_sparkSQL.R | 8 ++-- 2 files changed, 7 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6aa7d09f/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 5944bbc..a18eee3 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -2622,8 +2622,7 @@ setMethod("describe", setMethod("describe", signature(x = "SparkDataFrame"), function(x) { -colList <- as.list(c(columns(x))) -sdf <- callJMethod(x@sdf, "describe", colList) +sdf <- callJMethod(x@sdf, "describe", list()) dataFrame(sdf) }) http://git-wip-us.apache.org/repos/asf/spark/blob/6aa7d09f/R/pkg/inst/tests/testthat/test_sparkSQL.R -- diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R index a0ab719..e2a1da0 100644 --- a/R/pkg/inst/tests/testthat/test_sparkSQL.R +++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R @@ -1824,13 +1824,17 @@ test_that("describe() and summarize() on a DataFrame", { expect_equal(collect(stats)[2, "age"], "24.5") expect_equal(collect(stats)[3, "age"], "7.7781745930520225") stats <- describe(df) - expect_equal(collect(stats)[4, "name"], "Andy") + expect_equal(collect(stats)[4, "name"], NULL) expect_equal(collect(stats)[5, "age"], "30") stats2 <- summary(df) - expect_equal(collect(stats2)[4, "name"], "Andy") + expect_equal(collect(stats2)[4, "name"], NULL) expect_equal(collect(stats2)[5, "age"], "30") + # SPARK-16425: SparkR summary() fails on column of type logical + df <- withColumn(df, "boolean", df$age == 30) + summary(df) + # Test base::summary is working expect_equal(length(summary(attenu, digits = 4)), 35) }) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org