[GitHub] [spark] MichaelChirico commented on a change in pull request #28386: [SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


MichaelChirico commented on a change in pull request #28386:
URL: https://github.com/apache/spark/pull/28386#discussion_r416376995



##
File path: R/pkg/R/DataFrame.R
##
@@ -2287,16 +2287,19 @@ setMethod("mutate",
 
 # For named arguments, use the names for arguments as the column names
 # For unnamed arguments, use the argument symbols as the column names
-args <- sapply(substitute(list(...))[-1], deparse)
 ns <- names(cols)
-if (!is.null(ns)) {
-  lapply(seq_along(args), function(i) {
-    if (ns[[i]] != "") {
-      args[[i]] <<- ns[[i]]
-    }
+if (is.null(ns)) ns <- rep('', length(cols))
+named_idx <- nzchar(ns)
+args <- character(length(ns))
+if (any(named_idx)) args[named_idx] <- ns[named_idx]
+if (!all(named_idx)) {
+  # SPARK-31517: deparse uses width.cutoff on wide input and the
+  #   output is length>1, so need to collapse it to scalar
+  colsub <- substitute(list(...))[-1L]
+  args[!named_idx] <- sapply(which(!named_idx), function(ii) {
+    paste(trimws(deparse(colsub[[ii]])), collapse = ' ')

Review comment:
   I have added a backport of `trimws`.
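   For reference, the naming logic in this hunk can be tried outside SparkR. The sketch below is a minimal, hypothetical standalone version: `dots_column_names` is an illustrative name, not a SparkR function; it assumes R >= 3.2.0 (or a `trimws` backport), and it swaps the diff's `sapply` for `vapply` only to pin the result type to character.

   ```r
   # Hypothetical standalone version of the column-naming logic in the diff:
   # named arguments in ... keep their names; unnamed ones are deparsed, and
   # because deparse() can split wide input over several lines, each line is
   # trimmed and the pieces are collapsed to a single string.
   dots_column_names <- function(...) {
     cols <- list(...)
     ns <- names(cols)
     if (is.null(ns)) ns <- rep("", length(cols))
     named_idx <- nzchar(ns)
     args <- character(length(ns))
     if (any(named_idx)) args[named_idx] <- ns[named_idx]
     if (!all(named_idx)) {
       # The expressions passed via ..., without the leading `list` symbol
       colsub <- substitute(list(...))[-1L]
       args[!named_idx] <- vapply(which(!named_idx), function(ii) {
         paste(trimws(deparse(colsub[[ii]])), collapse = " ")
       }, character(1L))
     }
     args
   }

   x <- 10
   dots_column_names(total = x * 2, x + 1)  # returns c("total", "x + 1")
   ```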





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MichaelChirico commented on a change in pull request #28386: [SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


MichaelChirico commented on a change in pull request #28386:
URL: https://github.com/apache/spark/pull/28386#discussion_r416376851



##
File path: R/pkg/R/DataFrame.R
##
@@ -3445,7 +3448,7 @@ setMethod("as.data.frame",
 #' @note attach since 1.6.0
 setMethod("attach",
   signature(what = "SparkDataFrame"),
-  function(what, pos = 2L, name = deparse(substitute(what), backtick = FALSE),
+  function(what, pos = 2L, name = deparse1(substitute(what), backtick = FALSE),

Review comment:
   This is now the signature of `base::attach` in R 4.0.0.
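   To illustrate why the default `name` changed: for a wide argument, `deparse()` at its default `width.cutoff` of 60 returns several strings, whereas `deparse1()` always returns one. In the sketch below, `old_default` and `new_default` are hypothetical stand-ins for the two versions of the `attach` default, and the `deparse1` fallback for R < 4.0.0 is assumed to match base R's definition.

   ```r
   if (!exists("deparse1", mode = "function")) {
     # Fallback for R < 4.0.0, same definition as base R 4.0.0
     deparse1 <- function(expr, collapse = " ", width.cutoff = 500L, ...) {
       paste(deparse(expr, width.cutoff, ...), collapse = collapse)
     }
   }

   # Hypothetical stand-ins for the old and new default of attach()'s `name`
   old_default <- function(what, name = deparse(substitute(what), backtick = FALSE)) name
   new_default <- function(what, name = deparse1(substitute(what), backtick = FALSE)) name

   # `what` is never evaluated, so these data frame names need not exist
   old_name <- old_default(rbind(first_data_frame_name, second_data_frame_name, third_data_frame_name, fourth_data_frame_name))
   new_name <- new_default(rbind(first_data_frame_name, second_data_frame_name, third_data_frame_name, fourth_data_frame_name))

   length(old_name)  # more than 1: the deparsed call is wider than 60 bytes
   length(new_name)  # always 1
   ```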








[GitHub] [spark] MichaelChirico commented on a change in pull request #28386: [SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


MichaelChirico commented on a change in pull request #28386:
URL: https://github.com/apache/spark/pull/28386#discussion_r416375375



##
File path: R/pkg/R/DataFrame.R
##
@@ -2287,16 +2287,19 @@ setMethod("mutate",
 
 # For named arguments, use the names for arguments as the column names
 # For unnamed arguments, use the argument symbols as the column names
-args <- sapply(substitute(list(...))[-1], deparse)
 ns <- names(cols)
-if (!is.null(ns)) {
-  lapply(seq_along(args), function(i) {
-    if (ns[[i]] != "") {
-      args[[i]] <<- ns[[i]]
-    }
+if (is.null(ns)) ns <- rep('', length(cols))
+named_idx <- nzchar(ns)
+args <- character(length(ns))
+if (any(named_idx)) args[named_idx] <- ns[named_idx]
+if (!all(named_idx)) {
+  # SPARK-31517: deparse uses width.cutoff on wide input and the
+  #   output is length>1, so need to collapse it to scalar
+  colsub <- substitute(list(...))[-1L]
+  args[!named_idx] <- sapply(which(!named_idx), function(ii) {
+    paste(trimws(deparse(colsub[[ii]])), collapse = ' ')

Review comment:
   Just remembered that `trimws` was added in R 3.2.0, while the stated dependency of `SparkR` is R 3.1.0.
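   One way to keep the R 3.1.0 dependency is a small regex-based backport. A minimal sketch, relying only on `sub()` with `perl = TRUE`; `trimws_compat` is a hypothetical name, and it is intended to behave like `base::trimws` with its default whitespace set (space, tab, CR, LF).

   ```r
   # Hypothetical backport of trimws() for R < 3.2.0, using only sub()
   trimws_compat <- function(x, which = c("both", "left", "right")) {
     which <- match.arg(which)
     strip <- function(re, x) sub(re, "", x, perl = TRUE)
     switch(which,
            left  = strip("^[\t\r\n ]+", x),
            right = strip("[\t\r\n ]+$", x),
            both  = strip("[\t\r\n ]+$", strip("^[\t\r\n ]+", x)))
   }

   # Only fall back to the backport when the running R lacks trimws()
   if (!exists("trimws", mode = "function")) {
     trimws <- trimws_compat
   }
   ```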








[GitHub] [spark] MichaelChirico commented on a change in pull request #28386: [SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


MichaelChirico commented on a change in pull request #28386:
URL: https://github.com/apache/spark/pull/28386#discussion_r416372474



##
File path: R/pkg/R/DataFrame.R
##
@@ -2287,16 +2287,19 @@ setMethod("mutate",
 
 # For named arguments, use the names for arguments as the column names
 # For unnamed arguments, use the argument symbols as the column names
-args <- sapply(substitute(list(...))[-1], deparse)

Review comment:
   R 4.0.0 adds `deparse1`, which would have been more appropriate here:
   
   > `deparse1()` is a simple utility added in R 4.0.0 to ensure a string result (character vector of length one), typically used in name construction, as `deparse1(substitute(.))`.
   
   That function is just a wrapper so easy to backport:
   
   ```
   deparse1 = function (expr, collapse = " ", width.cutoff = 500L, ...)
     paste(deparse(expr, width.cutoff, ...), collapse = collapse)
   ```
   
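   (though personally I would still stick with `trimws`: once an expression is wider than `width.cutoff`, `deparse()` indents the continuation lines, so collapsing them without trimming leaves runs of spaces in the name)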
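   A sketch of the combination suggested here, using base R only: `deparse_name` is a hypothetical helper that keeps `deparse1()`'s 500-byte cutoff but still trims each line before collapsing. The trimming only matters for expressions wider than the cutoff, so the example builds one artificially.

   ```r
   # Hypothetical helper: deparse1()-style wide cutoff, plus per-line trimming
   # so indented continuation lines do not leave runs of spaces
   deparse_name <- function(expr, width.cutoff = 500L) {
     paste(trimws(deparse(expr, width.cutoff = width.cutoff)), collapse = " ")
   }

   # Build a call far wider than 500 bytes: sum(v, v, ..., v) with 30 arguments
   v <- as.name("a_rather_long_variable_name")
   wide_call <- as.call(c(list(as.name("sum")), rep(list(v), 30L)))

   raw_lines <- deparse(wide_call, width.cutoff = 500L)
   length(raw_lines)        # more than 1: deparse() still splits at the cutoff
   deparse_name(wide_call)  # a single, single-spaced string
   ```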




