[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-02-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16689


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-02-01 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98918704
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
     vec <- do.call(c, col)
     stopifnot(class(vec) != "list")
+    class(vec) <-
+      if (colType == "timestamp")
+        c("POSIXct", "POSIXt")
--- End diff --

It looks better if it won't affect other methods. I will try it. Thanks for 
the advice.





[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-31 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98615769
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
     vec <- do.call(c, col)
     stopifnot(class(vec) != "list")
+    class(vec) <-
+      if (colType == "timestamp")
+        c("POSIXct", "POSIXt")
--- End diff --

Should `PRIMITIVE_TYPES[["timestamp"]]` be changed, then?
https://github.com/apache/spark/blob/master/R/pkg/R/types.R#L32





[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-30 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98612545
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
     vec <- do.call(c, col)
     stopifnot(class(vec) != "list")
+    class(vec) <-
+      if (colType == "timestamp")
+        c("POSIXct", "POSIXt")
+      else
+        PRIMITIVE_TYPES[[colType]]
--- End diff --

All tests currently pass, except for the two modified tests with NA types
discussed earlier. Below are all the type conversions from SparkDataFrame to
R data.frame; the existing tests in test_sparkSQL.R already cover them.
```
PRIMITIVE_TYPES <- as.environment(list(
  "tinyint" = "integer",
  "smallint" = "integer",
  "int" = "integer",
  "bigint" = "numeric",
  "float" = "numeric",
  "double" = "numeric",
  "decimal" = "numeric",
  "string" = "character",
  "binary" = "raw",
  "boolean" = "logical",
  "timestamp" = "POSIXct",
  "date" = "Date",
  # following types are not SQL types returned by dtypes(). They are listed here for usage
  # by checkType() in schema.R.
  # TODO: refactor checkType() in schema.R.
  "byte" = "integer",
  "integer" = "integer"
  ))
```
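
As a rough sketch of what setting the class from this mapping does to a collected column: the helper `castColumn` and the trimmed table `primitiveTypes` below are hypothetical names made up for illustration, not SparkR's actual code.

```r
# Trimmed, illustrative copy of the mapping above (not the full table).
primitiveTypes <- list(
  "int" = "integer",
  "double" = "numeric",
  "string" = "character",
  "boolean" = "logical"
)

# Hypothetical helper, not SparkR's actual code: flatten a collected column
# and set its class from the declared SQL type instead of inferring it.
castColumn <- function(col, colType) {
  vec <- do.call(c, col)
  class(vec) <- primitiveTypes[[colType]]
  vec
}

allNull <- list(NA, NA)                        # a column holding only NULLs
strCol <- castColumn(allNull, "string")
dblCol <- castColumn(allNull, "double")
intCol <- castColumn(list(1L, NA, 3L), "int")

print(class(strCol))   # "character"
print(class(dblCol))   # "numeric"
print(class(intCol))   # "integer"
```

Assigning a basic type name through `class<-` coerces the vector, so an all-NA column ends up with the declared type rather than logical.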





[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-30 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98611766
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
     vec <- do.call(c, col)
     stopifnot(class(vec) != "list")
+    class(vec) <-
+      if (colType == "timestamp")
+        c("POSIXct", "POSIXt")
--- End diff --

`PRIMITIVE_TYPES[["timestamp"]]` is POSIXct, and POSIXct normally comes
together with POSIXt. POSIXt is a virtual class that allows operations such as
subtraction to mix the two classes POSIXct and POSIXlt.
The previous conversion also converted timestamps to c("POSIXct", "POSIXt").

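For reference, a small base-R illustration of the class pair described above. The timestamps are arbitrary example values, not taken from the PR.

```r
t1 <- as.POSIXct("2017-01-30 12:00:00", tz = "UTC")
t2 <- as.POSIXlt("2017-01-30 13:00:00", tz = "UTC")

print(class(t1))              # "POSIXct" "POSIXt"
print(class(t2))              # "POSIXlt" "POSIXt"
print(inherits(t1, "POSIXt")) # TRUE

# Subtraction works across the two concrete classes via the shared POSIXt class.
gap <- t2 - t1
print(as.numeric(gap, units = "hours"))  # 1

# Setting both classes on a plain number of seconds yields a usable timestamp.
secs <- 0
class(secs) <- c("POSIXct", "POSIXt")
print(format(secs, tz = "UTC"))
```
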




[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98605236
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
     vec <- do.call(c, col)
     stopifnot(class(vec) != "list")
+    class(vec) <-
+      if (colType == "timestamp")
+        c("POSIXct", "POSIXt")
+      else
+        PRIMITIVE_TYPES[[colType]]
--- End diff --

By setting these explicitly instead of having them inferred, does this break
any existing behavior? Does any type differ because of this change?





[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98605157
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
     vec <- do.call(c, col)
     stopifnot(class(vec) != "list")
+    class(vec) <-
+      if (colType == "timestamp")
+        c("POSIXct", "POSIXt")
--- End diff --

why should the class be `c("POSIXct", "POSIXt")` in this case?





[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97714703
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,9 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
     vec <- do.call(c, col)
     stopifnot(class(vec) != "list")
+    # If vec is an vector with only NAs, the type is logical
--- End diff --

In local R, if we try
```
df <- data.frame(x = c(0,1,2), y = c(NA, NA, 1))
class(head(df, 1)$y)
```
The output is still numeric rather than logical, but the existing test expects
a logical NA instead of a numeric NA.

So is it necessary to correct the existing tests? For example, at
@test_sparkSQL.R#1280, change
`expect_equal(collect(select(df, first(df$age)))[[1]], NA)` to
`expect_equal(collect(select(df, first(df$age)))[[1]], NA_real_)`
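
A minimal base-R check of the distinction, using the same local data.frame as above and `identical()` in place of testthat (the variable name `firstY` is only for illustration):

```r
df <- data.frame(x = c(0, 1, 2), y = c(NA, NA, 1))
firstY <- head(df, 1)$y

print(class(firstY))                # "numeric"
print(class(NA))                    # "logical"
print(identical(firstY, NA_real_))  # TRUE
print(identical(firstY, NA))        # FALSE: a bare NA is logical
```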





[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97712469
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,9 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
     vec <- do.call(c, col)
     stopifnot(class(vec) != "list")
+    # If vec is an vector with only NAs, the type is logical
--- End diff --

Yes. My first commit tried to cast each column explicitly to its corresponding
R data type, even when it is a vector of all NAs. However, some existing tests
failed because they expect a logical NA. For example:
```
3. Failure: column functions (@test_sparkSQL.R#1280) 
---
collect(select(df, first(df$age)))[[1]] not equal to NA.
Types not compatible: double vs logical
4. Failure: column functions (@test_sparkSQL.R#1282) 
---
collect(select(df, first("age")))[[1]] not equal to NA.
Types not compatible: double vs logical
``` 






[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97707703
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,9 @@ setMethod("collect",
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
     vec <- do.call(c, col)
     stopifnot(class(vec) != "list")
+    # If vec is an vector with only NAs, the type is logical
--- End diff --

If the DataFrame column is of type string, shouldn't it convert to character
in R (which can be all NA), even when the column contains only NULLs (which
map to NA in R)?

With this change, it seems it would become logical in R instead of character.
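
To make the concern concrete, here is a small base-R sketch. The list `col` is a stand-in for a collected string column that holds only NULLs, not actual SparkR output.

```r
col <- list(NA, NA, NA)     # stand-in for a collected string column of NULLs
vec <- do.call(c, col)
print(class(vec))           # "logical"
print(is.character(vec))    # FALSE

# An explicit conversion keeps the NAs but gives them the declared type.
typed <- as.character(vec)
print(class(typed))         # "character"
print(all(is.na(typed)))    # TRUE
```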





[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97508643
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1136,9 +1136,17 @@ setMethod("collect",
 
   # Note that "binary" columns behave like complex types.
   if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
-    vec <- do.call(c, col)
+    valueIndex <- which(!is.na(col))
+    if (length(valueIndex) > 0 && valueIndex[1] > 1) {
+      colTail <- col[-(1 : (valueIndex[1] - 1))]
+      vec <- do.call(c, colTail)
+      classVal <- class(vec)
+      vec <- c(rep(NA, valueIndex[1] - 1), vec)
+      class(vec) <- classVal
--- End diff --

Hmm, what happened here?
If you want to drop the NAs and use the rest to infer the class, you can do
`col[!is.na(col)]`
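
A minimal sketch of that suggestion in base R, assuming a timestamp column whose first value is NA. The sample timestamp and the variable names are illustrative only, not the PR's code.

```r
ts <- as.POSIXct("2017-01-24 08:00:00", tz = "UTC")
col <- list(NA, ts)                    # collected column with a leading NULL

# c() dispatches on the leading logical NA, so the POSIXct class is dropped.
print(class(do.call(c, col)))          # "numeric"

# Infer the class from the non-NA entries only, then reapply it.
nonNA <- col[!is.na(col)]
classVal <- class(do.call(c, nonNA))   # "POSIXct" "POSIXt"
vec <- do.call(c, col)
class(vec) <- classVal

print(class(vec))
print(is.na(vec))                      # TRUE FALSE
```

This avoids the index arithmetic in the diff, although it still has nothing to infer from when every entry is NA.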


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org