[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-02-13 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
Thanks for the reminder. I may have forgotten to mention that I am the 
reporter of this JIRA bug. My JIRA ID is also titicaca. Thank you! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-02-12 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
Yes. The JIRA ID is SPARK-19342. Thank you for the help and advice :)



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-02-09 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
Yes, collect on a timestamp column was returning `c("POSIXct", "POSIXt")`. But
when an NA is at the top of the timestamp column, it was returning `numeric`,
as I described in the PR description.



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-02-04 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
Thanks. I updated the `coltypes` method to account for the timestamp change,
and all the tests pass now.



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-02-01 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
I tried to modify PRIMITIVE_TYPES for timestamp, but it had a side effect on
the coltypes method.

In test_sparkSQL.R#2262,
`expect_equal(coltypes(DF), c("integer", "logical", "POSIXct"))`, coltypes
returns a list instead of a vector because of the conversion of timestamp to
`c("POSIXct", "POSIXt")`.
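
The list-versus-vector effect can be reproduced in plain R without Spark. This
is only a minimal sketch: `types_old` and `types_new` are hypothetical
stand-ins for PRIMITIVE_TYPES, and the `sapply` call is an assumption about how
such a lookup might be written, not the actual coltypes code. `sapply` can
simplify to a character vector only when every lookup has length one.

```r
# Hypothetical stand-ins for the type table, before and after the change.
types_old <- list(int = "integer", boolean = "logical", timestamp = "POSIXct")
types_new <- list(int = "integer", boolean = "logical",
                  timestamp = c("POSIXct", "POSIXt"))

sql_types <- c("int", "boolean", "timestamp")

# Every lookup has length 1, so sapply() simplifies to a character vector.
old_result <- sapply(sql_types, function(ty) types_old[[ty]], USE.NAMES = FALSE)

# One lookup has length 2, so sapply() cannot simplify and returns a list.
new_result <- sapply(sql_types, function(ty) types_new[[ty]], USE.NAMES = FALSE)

print(is.character(old_result))
print(is.list(new_result))
```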



[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-02-01 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98918704
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
           if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
             vec <- do.call(c, col)
             stopifnot(class(vec) != "list")
+            class(vec) <-
+              if (colType == "timestamp")
+                c("POSIXct", "POSIXt")
--- End diff --

That looks better, provided it does not affect other methods. I will try it.
Thanks for the advice.



[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-30 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98612545
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
           if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
             vec <- do.call(c, col)
             stopifnot(class(vec) != "list")
+            class(vec) <-
+              if (colType == "timestamp")
+                c("POSIXct", "POSIXt")
+              else
+                PRIMITIVE_TYPES[[colType]]
--- End diff --

Currently all tests pass, except for the two modified tests with NA types
discussed earlier. The following are all the type conversions from
SparkDataFrame to R data.frame, which are covered by the existing tests in
test_sparkSQL.R.
```
PRIMITIVE_TYPES <- as.environment(list(
  "tinyint" = "integer",
  "smallint" = "integer",
  "int" = "integer",
  "bigint" = "numeric",
  "float" = "numeric",
  "double" = "numeric",
  "decimal" = "numeric",
  "string" = "character",
  "binary" = "raw",
  "boolean" = "logical",
  "timestamp" = "POSIXct",
  "date" = "Date",
  # following types are not SQL types returned by dtypes(). They are listed here for usage
  # by checkType() in schema.R.
  # TODO: refactor checkType() in schema.R.
  "byte" = "integer",
  "integer" = "integer"
  ))
```
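
For illustration, the guard from the quoted diff can be tried against such a
table in plain R. This is a rough sketch only: `type_map` is a hypothetical
three-entry subset of the table above, and `is_flattened` is a made-up helper
name, not a SparkR function. Looking up a missing key in an environment with
`[[` returns NULL, which is what the `is.null` check relies on.

```r
# Hypothetical subset of the mapping, for illustration only.
type_map <- as.environment(list(
  "int" = "integer",
  "binary" = "raw",
  "timestamp" = "POSIXct"
))

# Mirrors the guard in the quoted diff: the branch is taken only when the
# SQL type is present in the table and is not binary.
is_flattened <- function(col_type) {
  !is.null(type_map[[col_type]]) && col_type != "binary"
}

print(is_flattened("int"))
print(is_flattened("binary"))
print(is_flattened("array<int>"))
```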



[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-30 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r98611766
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,11 @@ setMethod("collect",
           if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
             vec <- do.call(c, col)
             stopifnot(class(vec) != "list")
+            class(vec) <-
+              if (colType == "timestamp")
+                c("POSIXct", "POSIXt")
--- End diff --

`PRIMITIVE_TYPES[["timestamp"]]` is POSIXct, and POSIXct usually comes together
with POSIXt. POSIXt is a virtual class that allows operations such as
subtraction to mix the two classes POSIXct and POSIXlt.
The previous conversion also converted timestamp to `c("POSIXct", "POSIXt")`.

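The relationship between the two classes can be checked in plain R. This is a
small sketch with an arbitrary example timestamp; the `tz = "UTC"` argument is
an assumption added only to keep the result independent of the local time zone.

```r
ct <- as.POSIXct("2017-01-01 12:00:01", tz = "UTC")

# A POSIXct value carries both classes, with POSIXt as the shared parent.
print(class(ct))

# The POSIXlt representation shares the same POSIXt parent class.
lt <- as.POSIXlt(ct)
print(inherits(lt, "POSIXt"))

# Subtraction across the two representations works through the POSIXt method
# and gives a difftime.
print(ct - lt)
```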


[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-25 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
I have modified the code and tests, including the existing tests
@test_sparkSQL.R#1280 and @test_sparkSQL.R#1282.

As in local R, an NA column of the SparkDataFrame is now collected as its
corresponding type instead of logical NA.





[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97714703
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,9 @@ setMethod("collect",
           if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
             vec <- do.call(c, col)
             stopifnot(class(vec) != "list")
+            # If vec is an vector with only NAs, the type is logical
--- End diff --

In local R, if we try
```
df <- data.frame(x = c(0,1,2), y = c(NA, NA, 1))
class(head(df, 1)$y)
```
The output is still numeric instead of logical. But the existing test expects
a logical NA instead of a numeric NA.

So is it necessary to correct the existing tests, for example
@test_sparkSQL.R#1280, from
`expect_equal(collect(select(df, first(df$age)))[[1]], NA)` to
`expect_equal(collect(select(df, first(df$age)))[[1]], NA_real_)`?
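
The difference between the two NA values can be seen in plain R, without Spark
or testthat. This is just a sketch that reuses the local data frame from the
comment above; `first_y` is a name introduced here for illustration.

```r
# A bare NA is logical, while NA_real_ is the numeric (double) missing value.
print(class(NA))
print(class(NA_real_))
print(identical(NA, NA_real_))

# The first row of a numeric column that starts with NA is a numeric NA.
df <- data.frame(x = c(0, 1, 2), y = c(NA, NA, 1))
first_y <- head(df, 1)$y
print(identical(first_y, NA_real_))
```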



[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...

2017-01-24 Thread titicaca
Github user titicaca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16689#discussion_r97712469
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1138,6 +1138,9 @@ setMethod("collect",
           if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
             vec <- do.call(c, col)
             stopifnot(class(vec) != "list")
+            # If vec is an vector with only NAs, the type is logical
--- End diff --

Yes. My first commit tried to cast the column to its corresponding R data type
explicitly, even if it is a vector with all NAs. However, some existing tests
failed because they expect a logical NA. For example:
```
3. Failure: column functions (@test_sparkSQL.R#1280) 
---
collect(select(df, first(df$age)))[[1]] not equal to NA.
Types not compatible: double vs logical
4. Failure: column functions (@test_sparkSQL.R#1282) 
---
collect(select(df, first("age")))[[1]] not equal to NA.
Types not compatible: double vs logical
``` 




[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
Sorry for the late reply. I figured out that the tests failed because a vector
that contains only NAs has type logical, so we cannot cast the type in that
case. I have updated the code and added some tests for that. Thank you for the
advice.



[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-24 Thread titicaca
Github user titicaca commented on the issue:

https://github.com/apache/spark/pull/16689
  
Sure. Shall I add the tests in pkg/inst/tests/testthat/test_sparkSQL.R?



[GitHub] spark pull request #16689: SPARK-19342 bug fixed in collect method for colle...

2017-01-23 Thread titicaca
GitHub user titicaca opened a pull request:

https://github.com/apache/spark/pull/16689

SPARK-19342 bug fixed in collect method for collecting timestamp column

## What changes were proposed in this pull request?

Fix a bug in the collect method when collecting a timestamp column. The bug can
be reproduced with the following code and its printed output:

```
library(SparkR)
sparkR.session(master = "local")
df <- data.frame(col1 = c(0, 1, 2),
                 col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA,
                          as.POSIXct("2017-01-01 12:00:01")))

sdf1 <- createDataFrame(df)
print(dtypes(sdf1))
df1 <- collect(sdf1)
print(lapply(df1, class))

sdf2 <- filter(sdf1, "col1 > 0")
print(dtypes(sdf2))
df2 <- collect(sdf2)
print(lapply(df2, class))
```

As we can see from the printed output, the column type of col2 in df2 is
unexpectedly converted to numeric when an NA is at the top of the column.

This is caused by `do.call(c, list)`. If we convert a list such as
`do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))`, the class of the
result is numeric instead of POSIXct.

Therefore, we need to cast the data type of the vector explicitly.
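
The behavior of `c` can be checked in plain R without a Spark session. This is
a minimal sketch, not the patch itself: the variable names are made up for
illustration, and `tz = "UTC"` is an assumption added only to make the example
independent of the local time zone. `c` dispatches on its first argument, so
the position of the NA decides which method runs.

```r
later <- as.POSIXct("2017-01-01 12:00:01", tz = "UTC")

# With NA (logical) first, the default c() method runs and the POSIXct
# class is dropped, leaving a plain numeric vector.
na_first <- do.call(c, list(NA, later))
print(class(na_first))

# With the POSIXct value first, c.POSIXct runs and the class is kept.
ts_first <- do.call(c, list(later, NA))
print(class(ts_first))

# Setting the class explicitly restores the timestamp type; the leading NA
# stays missing and the second value is unchanged.
class(na_first) <- c("POSIXct", "POSIXt")
print(is.na(na_first[1]))
print(as.numeric(na_first[2]) == as.numeric(later))
```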



## How was this patch tested?

The patch can be tested manually with the same code above.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/titicaca/spark sparkr-dev

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16689.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16689

----
commit a51c2eb54ca672ad63495d0709bd3ae7b254bd14
Author: titicaca <fangzhou.y...@hotmail.com>
Date:   2017-01-24T06:24:47Z

SPARK-19342 bug fixed in collect method for collecting timestamp column



