[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-08-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r210489980
  
--- Diff: R/pkg/R/functions.R ---
@@ -3320,7 +3321,7 @@ setMethod("explode",
 #' @aliases sequence sequence,Column-method
 #' @note sequence since 2.4.0
 setMethod("sequence",
- signature(x = "Column", y = "Column"),
+ signature(),
--- End diff --

sorry, I didn't see the reply.
yes, we should try to make sequence callable. 

we shouldn't have to manually call it though and it is better to rely on R 
internal type/call routing. it's a bit hard to explain but check out

`attach` `setGeneric("attach")`
or
`str` `setGeneric("str")`

if you see what I mean.

also we should avoid `signature()` empty as well.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-08-15 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r210326849
  
--- Diff: R/pkg/R/functions.R ---
@@ -3320,7 +3321,7 @@ setMethod("explode",
 #' @aliases sequence sequence,Column-method
 #' @note sequence since 2.4.0
 setMethod("sequence",
- signature(x = "Column", y = "Column"),
+ signature(),
--- End diff --

@felixcheung 
gentle ping


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-26 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r205609369
  
--- Diff: R/pkg/R/functions.R ---
@@ -3320,7 +3321,7 @@ setMethod("explode",
 #' @aliases sequence sequence,Column-method
 #' @note sequence since 2.4.0
 setMethod("sequence",
- signature(x = "Column", y = "Column"),
+ signature(),
--- End diff --

I will need to make the ```sequence``` method  still callable without 
base:: prefix, right?
Is the following implementation OK?
```
setMethod("sequence",
  signature(),
  function(x, y = NULL, step = NULL) {
if (is.null(y) && is.null(step))
  base::sequence(x)
else {
  jc <- if (is.null(step)) {
callJStatic("org.apache.spark.sql.functions", "sequence", 
x@jc, y@jc)
  } else {
stopifnot(class(step) == "Column")
callJStatic("org.apache.spark.sql.functions", "sequence", 
x@jc, y@jc, step@jc)
  }
  column(jc)
}
  })
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-26 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r205538890
  
--- Diff: R/pkg/R/functions.R ---
@@ -3320,7 +3321,7 @@ setMethod("explode",
 #' @aliases sequence sequence,Column-method
 #' @note sequence since 2.4.0
 setMethod("sequence",
- signature(x = "Column", y = "Column"),
+ signature(),
--- End diff --

@felixcheung 
It seems that if I only have ... in the generic function, 
```
setGeneric("sequence", function(...) { standardGeneric("sequence") })
```
I can't have anything is the signature. Otherwise I will have Error in 
matchSignature.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-26 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r205355152
  
--- Diff: R/pkg/R/functions.R ---
@@ -3320,7 +3321,7 @@ setMethod("explode",
 #' @aliases sequence sequence,Column-method
 #' @note sequence since 2.4.0
 setMethod("sequence",
- signature(x = "Column", y = "Column"),
+ signature(),
--- End diff --

this doesn't seem quite right-  we should be able to keep `signature(x = 
"Column", y = "Column"),` here


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r20541
  
--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -21,10 +21,11 @@ test_that("Check masked functions", {
   # Check that we are not masking any new function from base, stats, 
testthat unexpectedly
   # NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as 
masked functions make it
   # hard for users to use base R functions. Please check when in doubt.
-  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not")
+  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not", 
"sequence")
--- End diff --

thanks, this is a bit tricky because of the way tests work, could you do 
one check - install the package locally (`R CMD INSTALL --library=lib pkg/`) 
and then load the package from lib, in R, `library(SparkR, 
lib.loc='pathto/lib')`
then try calling base::sequence and SparkR::sequence

also, please add a test for base::sequence too - we have that for other 
functions


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-24 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r204861059
  
--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -21,10 +21,11 @@ test_that("Check masked functions", {
   # Check that we are not masking any new function from base, stats, 
testthat unexpectedly
   # NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as 
masked functions make it
   # hard for users to use base R functions. Please check when in doubt.
-  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not")
+  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not", 
"sequence")
--- End diff --

Thanks @HyukjinKwon @felixcheung for your review. I will change this. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r204633191
  
--- Diff: R/pkg/R/functions.R ---
@@ -1986,15 +1998,20 @@ setMethod("levenshtein", signature(y = "Column"),
 #' are on the same day of month, or both are the last day of month, time 
of day will be ignored.
 #' Otherwise, the difference is calculated based on 31 days per month, and 
rounded to 8 digits.
 #'
+#' @param roundOff an optional parameter to specify if the result is 
rounded off to 8 digits
 #' @rdname column_datetime_diff_functions
 #' @aliases months_between months_between,Column-method
 #' @note months_between since 1.5.0
 setMethod("months_between", signature(y = "Column"),
-  function(y, x) {
+  function(y, x, roundOff = NULL) {
--- End diff --

we are not doing that consistently, so I'm ok with if a check or 
`callJStatic("org.apache.spark.sql.functions", "months_between", y@jc, x, 
as.logical(roundOff))`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r204633464
  
--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -21,10 +21,11 @@ test_that("Check masked functions", {
   # Check that we are not masking any new function from base, stats, 
testthat unexpectedly
   # NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as 
masked functions make it
   # hard for users to use base R functions. Please check when in doubt.
-  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not")
+  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not", 
"sequence")
--- End diff --

yes, we should definitely try to avoid adding here (as in the comment 
above) unless we absolutely have to.

in this case, base::sequence can be commonly used - we really need a way to 
avoid breaking it.
for example, could you try setting the generic as such:

`setGeneric("sequence", function(...) { standardGeneric("sequence") })`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r204617441
  
--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -21,10 +21,11 @@ test_that("Check masked functions", {
   # Check that we are not masking any new function from base, stats, 
testthat unexpectedly
   # NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as 
masked functions make it
   # hard for users to use base R functions. Please check when in doubt.
-  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not")
+  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not", 
"sequence")
--- End diff --

@felixcheung, I remember we discourage to exclude this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r204616561
  
--- Diff: R/pkg/R/functions.R ---
@@ -1986,15 +1998,20 @@ setMethod("levenshtein", signature(y = "Column"),
 #' are on the same day of month, or both are the last day of month, time 
of day will be ignored.
 #' Otherwise, the difference is calculated based on 31 days per month, and 
rounded to 8 digits.
 #'
+#' @param roundOff an optional parameter to specify if the result is 
rounded off to 8 digits
 #' @rdname column_datetime_diff_functions
 #' @aliases months_between months_between,Column-method
 #' @note months_between since 1.5.0
 setMethod("months_between", signature(y = "Column"),
-  function(y, x) {
+  function(y, x, roundOff = NULL) {
--- End diff --

Can we add a simple check if `roundOff` is a logical or not?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-21 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/21835

[SPARK-24779]Add sequence / map_concat / map_from_entries / an option in 
months_between UDF to disable rounding-off



## What changes were proposed in this pull request?

Add the R version of sequence / map_concat / map_from_entries / an option 
in months_between UDF to disable rounding-off

## How was this patch tested?

Add test in test_sparkSQL.R


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-24779

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21835.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21835


commit faff20b70e3577d58976283ac7157717d8b34c8d
Author: Huaxin Gao 
Date:   2018-07-21T02:20:50Z

[SPARK-24779]Add sequence / map_concat / map_from_entries / an option in 
months_between UDF to disable rounding-off




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org