spark git commit: [SPARK-16425][R] `describe()` should not fail with non-numeric columns

2016-07-07 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 5828da41c -> 73c764a04


[SPARK-16425][R] `describe()` should not fail with non-numeric columns

## What changes were proposed in this pull request?

This PR prevents errors when `summary(df)` is called on a `SparkDataFrame` with
non-numeric columns. This failure happens only in `SparkR`.

**Before**
```r
> df <- createDataFrame(faithful)
> df <- withColumn(df, "boolean", df$waiting==79)
> summary(df)
16/07/07 14:15:16 ERROR RBackendHandler: describe on 34 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: cannot resolve 'avg(`boolean`)' due to data type mismatch: function average requires numeric types, not BooleanType;
```

**After**
```r
> df <- createDataFrame(faithful)
> df <- withColumn(df, "boolean", df$waiting==79)
> summary(df)
SparkDataFrame[summary:string, eruptions:string, waiting:string]
```
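
The "After" output lists only `eruptions` and `waiting`, which suggests that, once SparkR passes an empty column list, the JVM side picks the columns itself and leaves out the logical one. As a rough illustration of that kind of selection, here is a minimal sketch in base R on a plain `data.frame` (no SparkR session needed). The name `numeric_cols` is hypothetical, and this is not the Spark implementation.

```r
# Base R sketch, not SparkR: choose only the numeric columns of a data.frame.
df <- faithful                      # built-in dataset with eruptions and waiting
df$boolean <- df$waiting == 79      # add a logical (non-numeric) column

# is.numeric() is FALSE for logical vectors, so "boolean" is filtered out.
numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
print(numeric_cols)
```

Summarizing only `df[numeric_cols]` avoids applying an average to a logical column, which is the operation that raised the `AnalysisException` in the "Before" output.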

## How was this patch tested?

Passes Jenkins with an updated test case.

Author: Dongjoon Hyun 

Closes #14096 from dongjoon-hyun/SPARK-16425.

(cherry picked from commit 6aa7d09f4e126f42e41085dec169c813379ed354)
Signed-off-by: Shivaram Venkataraman 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/73c764a0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/73c764a0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/73c764a0

Branch: refs/heads/branch-2.0
Commit: 73c764a047f795c85909c7a7ea4324f286d2aafa
Parents: 5828da4
Author: Dongjoon Hyun 
Authored: Thu Jul 7 17:47:29 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Thu Jul 7 17:47:38 2016 -0700

--
 R/pkg/R/DataFrame.R   | 3 +--
 R/pkg/inst/tests/testthat/test_sparkSQL.R | 8 ++--
 2 files changed, 7 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/73c764a0/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 17474d4..ec09aab 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2617,8 +2617,7 @@ setMethod("describe",
 setMethod("describe",
   signature(x = "SparkDataFrame"),
   function(x) {
-colList <- as.list(c(columns(x)))
-sdf <- callJMethod(x@sdf, "describe", colList)
+sdf <- callJMethod(x@sdf, "describe", list())
 dataFrame(sdf)
   })
 

http://git-wip-us.apache.org/repos/asf/spark/blob/73c764a0/R/pkg/inst/tests/testthat/test_sparkSQL.R
--
diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R
index 003fcce..755aded 100644
--- a/R/pkg/inst/tests/testthat/test_sparkSQL.R
+++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -1816,13 +1816,17 @@ test_that("describe() and summarize() on a DataFrame", {
   expect_equal(collect(stats)[2, "age"], "24.5")
   expect_equal(collect(stats)[3, "age"], "7.7781745930520225")
   stats <- describe(df)
-  expect_equal(collect(stats)[4, "name"], "Andy")
+  expect_equal(collect(stats)[4, "name"], NULL)
   expect_equal(collect(stats)[5, "age"], "30")
 
   stats2 <- summary(df)
-  expect_equal(collect(stats2)[4, "name"], "Andy")
+  expect_equal(collect(stats2)[4, "name"], NULL)
   expect_equal(collect(stats2)[5, "age"], "30")
 
+  # SPARK-16425: SparkR summary() fails on column of type logical
+  df <- withColumn(df, "boolean", df$age == 30)
+  summary(df)
+
   # Test base::summary is working
   expect_equal(length(summary(attenu, digits = 4)), 35)
 })


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-16425][R] `describe()` should not fail with non-numeric columns

2016-07-07 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master f4767bcc7 -> 6aa7d09f4


[SPARK-16425][R] `describe()` should not fail with non-numeric columns

## What changes were proposed in this pull request?

This PR prevents errors when `summary(df)` is called on a `SparkDataFrame` with
non-numeric columns. This failure happens only in `SparkR`.

**Before**
```r
> df <- createDataFrame(faithful)
> df <- withColumn(df, "boolean", df$waiting==79)
> summary(df)
16/07/07 14:15:16 ERROR RBackendHandler: describe on 34 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: cannot resolve 'avg(`boolean`)' due to data type mismatch: function average requires numeric types, not BooleanType;
```

**After**
```r
> df <- createDataFrame(faithful)
> df <- withColumn(df, "boolean", df$waiting==79)
> summary(df)
SparkDataFrame[summary:string, eruptions:string, waiting:string]
```

## How was this patch tested?

Passes Jenkins with an updated test case.

Author: Dongjoon Hyun 

Closes #14096 from dongjoon-hyun/SPARK-16425.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6aa7d09f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6aa7d09f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6aa7d09f

Branch: refs/heads/master
Commit: 6aa7d09f4e126f42e41085dec169c813379ed354
Parents: f4767bc
Author: Dongjoon Hyun 
Authored: Thu Jul 7 17:47:29 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Thu Jul 7 17:47:29 2016 -0700

--
 R/pkg/R/DataFrame.R   | 3 +--
 R/pkg/inst/tests/testthat/test_sparkSQL.R | 8 ++--
 2 files changed, 7 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6aa7d09f/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 5944bbc..a18eee3 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2622,8 +2622,7 @@ setMethod("describe",
 setMethod("describe",
   signature(x = "SparkDataFrame"),
   function(x) {
-colList <- as.list(c(columns(x)))
-sdf <- callJMethod(x@sdf, "describe", colList)
+sdf <- callJMethod(x@sdf, "describe", list())
 dataFrame(sdf)
   })
 

http://git-wip-us.apache.org/repos/asf/spark/blob/6aa7d09f/R/pkg/inst/tests/testthat/test_sparkSQL.R
--
diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R
index a0ab719..e2a1da0 100644
--- a/R/pkg/inst/tests/testthat/test_sparkSQL.R
+++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -1824,13 +1824,17 @@ test_that("describe() and summarize() on a DataFrame", {
   expect_equal(collect(stats)[2, "age"], "24.5")
   expect_equal(collect(stats)[3, "age"], "7.7781745930520225")
   stats <- describe(df)
-  expect_equal(collect(stats)[4, "name"], "Andy")
+  expect_equal(collect(stats)[4, "name"], NULL)
   expect_equal(collect(stats)[5, "age"], "30")
 
   stats2 <- summary(df)
-  expect_equal(collect(stats2)[4, "name"], "Andy")
+  expect_equal(collect(stats2)[4, "name"], NULL)
   expect_equal(collect(stats2)[5, "age"], "30")
 
+  # SPARK-16425: SparkR summary() fails on column of type logical
+  df <- withColumn(df, "boolean", df$age == 30)
+  summary(df)
+
   # Test base::summary is working
   expect_equal(length(summary(attenu, digits = 4)), 35)
 })

