[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r210489980 --- Diff: R/pkg/R/functions.R --- @@ -3320,7 +3321,7 @@ setMethod("explode", #' @aliases sequence sequence,Column-method #' @note sequence since 2.4.0 setMethod("sequence", - signature(x = "Column", y = "Column"), + signature(), --- End diff -- sorry, I didn't see the reply. yes, we should try to make sequence callable. we shouldn't have to manually call it though and it is better to rely on R internal type/call routing. it's a bit hard to explain but check out `attach` `setGeneric("attach")` or `str` `setGeneric("str")` if you see what I mean. also we should avoid `signature()` empty as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r210326849 --- Diff: R/pkg/R/functions.R --- @@ -3320,7 +3321,7 @@ setMethod("explode", #' @aliases sequence sequence,Column-method #' @note sequence since 2.4.0 setMethod("sequence", - signature(x = "Column", y = "Column"), + signature(), --- End diff -- @felixcheung gentle ping --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r205609369 --- Diff: R/pkg/R/functions.R --- @@ -3320,7 +3321,7 @@ setMethod("explode", #' @aliases sequence sequence,Column-method #' @note sequence since 2.4.0 setMethod("sequence", - signature(x = "Column", y = "Column"), + signature(), --- End diff -- I will need to make the ```sequence``` method still callable without base:: prefix, right? Is the following implementation OK? ``` setMethod("sequence", signature(), function(x, y = NULL, step = NULL) { if (is.null(y) && is.null(step)) base::sequence(x) else { jc <- if (is.null(step)) { callJStatic("org.apache.spark.sql.functions", "sequence", x@jc, y@jc) } else { stopifnot(class(step) == "Column") callJStatic("org.apache.spark.sql.functions", "sequence", x@jc, y@jc, step@jc) } column(jc) } }) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r205538890 --- Diff: R/pkg/R/functions.R --- @@ -3320,7 +3321,7 @@ setMethod("explode", #' @aliases sequence sequence,Column-method #' @note sequence since 2.4.0 setMethod("sequence", - signature(x = "Column", y = "Column"), + signature(), --- End diff -- @felixcheung It seems that if I only have ... in the generic function, ``` setGeneric("sequence", function(...) { standardGeneric("sequence") }) ``` I can't have anything is the signature. Otherwise I will have Error in matchSignature. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r205355152 --- Diff: R/pkg/R/functions.R --- @@ -3320,7 +3321,7 @@ setMethod("explode", #' @aliases sequence sequence,Column-method #' @note sequence since 2.4.0 setMethod("sequence", - signature(x = "Column", y = "Column"), + signature(), --- End diff -- this doesn't seem quite right- we should be able to keep `signature(x = "Column", y = "Column"),` here --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r20541 --- Diff: R/pkg/tests/fulltests/test_context.R --- @@ -21,10 +21,11 @@ test_that("Check masked functions", { # Check that we are not masking any new function from base, stats, testthat unexpectedly # NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as masked functions make it # hard for users to use base R functions. Please check when in doubt. - namesOfMaskedCompletely <- c("cov", "filter", "sample", "not") + namesOfMaskedCompletely <- c("cov", "filter", "sample", "not", "sequence") --- End diff -- thanks, this is a bit tricky because of the way tests work, could you do one check - install the package locally (`R CMD INSTALL --library=lib pkg/`) and then load the package from lib, in R, `library(SparkR, lib.loc='pathto/lib')` then try calling base::sequence and SparkR::sequence also, please add a test for base::sequence too - we have that for other functions --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r204861059 --- Diff: R/pkg/tests/fulltests/test_context.R --- @@ -21,10 +21,11 @@ test_that("Check masked functions", { # Check that we are not masking any new function from base, stats, testthat unexpectedly # NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as masked functions make it # hard for users to use base R functions. Please check when in doubt. - namesOfMaskedCompletely <- c("cov", "filter", "sample", "not") + namesOfMaskedCompletely <- c("cov", "filter", "sample", "not", "sequence") --- End diff -- Thanks @HyukjinKwon @felixcheung for your review. I will change this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r204633191 --- Diff: R/pkg/R/functions.R --- @@ -1986,15 +1998,20 @@ setMethod("levenshtein", signature(y = "Column"), #' are on the same day of month, or both are the last day of month, time of day will be ignored. #' Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits. #' +#' @param roundOff an optional parameter to specify if the result is rounded off to 8 digits #' @rdname column_datetime_diff_functions #' @aliases months_between months_between,Column-method #' @note months_between since 1.5.0 setMethod("months_between", signature(y = "Column"), - function(y, x) { + function(y, x, roundOff = NULL) { --- End diff -- we are not doing that consistently, so I'm ok with if a check or `callJStatic("org.apache.spark.sql.functions", "months_between", y@jc, x, as.logical(roundOff))` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r204633464 --- Diff: R/pkg/tests/fulltests/test_context.R --- @@ -21,10 +21,11 @@ test_that("Check masked functions", { # Check that we are not masking any new function from base, stats, testthat unexpectedly # NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as masked functions make it # hard for users to use base R functions. Please check when in doubt. - namesOfMaskedCompletely <- c("cov", "filter", "sample", "not") + namesOfMaskedCompletely <- c("cov", "filter", "sample", "not", "sequence") --- End diff -- yes, we should definitely try to avoid adding here (as in the comment above) unless we absolutely have to. in this case, base::sequence can be commonly used - we really need a way to avoid breaking it. for example, could you try setting the generic as such: `setGeneric("sequence", function(...) { standardGeneric("sequence") })` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r204617441 --- Diff: R/pkg/tests/fulltests/test_context.R --- @@ -21,10 +21,11 @@ test_that("Check masked functions", { # Check that we are not masking any new function from base, stats, testthat unexpectedly # NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as masked functions make it # hard for users to use base R functions. Please check when in doubt. - namesOfMaskedCompletely <- c("cov", "filter", "sample", "not") + namesOfMaskedCompletely <- c("cov", "filter", "sample", "not", "sequence") --- End diff -- @felixcheung, I remember we discourage to exclude this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r204616561 --- Diff: R/pkg/R/functions.R --- @@ -1986,15 +1998,20 @@ setMethod("levenshtein", signature(y = "Column"), #' are on the same day of month, or both are the last day of month, time of day will be ignored. #' Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits. #' +#' @param roundOff an optional parameter to specify if the result is rounded off to 8 digits #' @rdname column_datetime_diff_functions #' @aliases months_between months_between,Column-method #' @note months_between since 1.5.0 setMethod("months_between", signature(y = "Column"), - function(y, x) { + function(y, x, roundOff = NULL) { --- End diff -- Can we add a simple check if `roundOff` is a logical or not? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21835 [SPARK-24779]Add sequence / map_concat / map_from_entries / an option in months_between UDF to disable rounding-off ## What changes were proposed in this pull request? Add the R version of sequence / map_concat / map_from_entries / an option in months_between UDF to disable rounding-off ## How was this patch tested? Add test in test_sparkSQL.R You can merge this pull request into a Git repository by running: $ git pull https://github.com/huaxingao/spark spark-24779 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21835.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21835 commit faff20b70e3577d58976283ac7157717d8b34c8d Author: Huaxin Gao Date: 2018-07-21T02:20:50Z [SPARK-24779]Add sequence / map_concat / map_from_entries / an option in months_between UDF to disable rounding-off --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org