[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13684 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67412795 --- Diff: R/pkg/R/DataFrame.R --- @@ -1949,14 +1950,24 @@ setMethod("where", #' path <- "path/to/file.json" #' df <- read.json(path) #' dropDuplicates(df) +#' dropDuplicates(df, "col1", "col2") #' dropDuplicates(df, c("col1", "col2")) #' } setMethod("dropDuplicates", signature(x = "SparkDataFrame"), - function(x, colNames = columns(x)) { -stopifnot(class(colNames) == "character") - -sdf <- callJMethod(x@sdf, "dropDuplicates", as.list(colNames)) + function(x, ...) { +cols <- list(...) +if (length(cols) == 0) { + sdf <- callJMethod(x@sdf, "dropDuplicates", as.list(columns(x))) +} else { + if (!all(sapply(cols, function(c) { is.character(c) }))) { stop } --- End diff -- It's done. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67412381 --- Diff: R/pkg/R/DataFrame.R --- @@ -1949,14 +1950,24 @@ setMethod("where", #' path <- "path/to/file.json" #' df <- read.json(path) #' dropDuplicates(df) +#' dropDuplicates(df, "col1", "col2") #' dropDuplicates(df, c("col1", "col2")) #' } setMethod("dropDuplicates", signature(x = "SparkDataFrame"), - function(x, colNames = columns(x)) { -stopifnot(class(colNames) == "character") - -sdf <- callJMethod(x@sdf, "dropDuplicates", as.list(colNames)) + function(x, ...) { +cols <- list(...) +if (length(cols) == 0) { + sdf <- callJMethod(x@sdf, "dropDuplicates", as.list(columns(x))) +} else { + if (!all(sapply(cols, function(c) { is.character(c) }))) { stop } --- End diff -- Oh, sure! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67412216 --- Diff: R/pkg/R/DataFrame.R --- @@ -1949,14 +1950,24 @@ setMethod("where", #' path <- "path/to/file.json" #' df <- read.json(path) #' dropDuplicates(df) +#' dropDuplicates(df, "col1", "col2") #' dropDuplicates(df, c("col1", "col2")) #' } setMethod("dropDuplicates", signature(x = "SparkDataFrame"), - function(x, colNames = columns(x)) { -stopifnot(class(colNames) == "character") - -sdf <- callJMethod(x@sdf, "dropDuplicates", as.list(colNames)) + function(x, ...) { +cols <- list(...) +if (length(cols) == 0) { + sdf <- callJMethod(x@sdf, "dropDuplicates", as.list(columns(x))) +} else { + if (!all(sapply(cols, function(c) { is.character(c) }))) { stop } --- End diff -- Can we print a error message in `stop` here ? Something like `stop("all columns names should be characters")` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67287926 --- Diff: R/pkg/R/DataFrame.R --- @@ -1856,10 +1856,11 @@ setMethod("where", #' the subset of columns. #' #' @param x A SparkDataFrame. -#' @param colnames A character vector of column names. +#' @param ... A character vector of column names or string column names. +#' If the first argument contains a character vector then the following column names are ignored. --- End diff -- align this line to "A character ..." of the above line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67280096 --- Diff: R/pkg/R/DataFrame.R --- @@ -1869,14 +1871,22 @@ setMethod("where", #' path <- "path/to/file.json" #' df <- read.json(path) #' dropDuplicates(df) +#' dropDuplicates(df, "col1", "col2") #' dropDuplicates(df, c("col1", "col2")) #' } setMethod("dropDuplicates", signature(x = "SparkDataFrame"), - function(x, colNames = columns(x)) { -stopifnot(class(colNames) == "character") - -sdf <- callJMethod(x@sdf, "dropDuplicates", as.list(colNames)) + function(x, col, ...) { --- End diff -- Thank you for review, @sun-rui ! I'll improve like that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67234728 --- Diff: R/pkg/R/generics.R --- @@ -462,12 +462,9 @@ setGeneric("describe", function(x, col, ...) { standardGeneric("describe") }) #' @export setGeneric("drop", function(x, ...) { standardGeneric("drop") }) -#' @rdname dropduplicates +#' @rdname dropDuplicates #' @export -setGeneric("dropDuplicates", - function(x, colNames = columns(x)) { - standardGeneric("dropDuplicates") - }) +setGeneric("dropDuplicates", function(x, col, ...) { standardGeneric("dropDuplicates") }) --- End diff -- hmm - putting `col` in the generic means that it will always be required ? I'm not sure how the code path with no arguments will work then. I think we can leave the generic as `dropDuplicates(x, ...)` and then have a more specific implementation. (See `na.omit` for example) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67233674 --- Diff: R/pkg/R/DataFrame.R --- @@ -1869,14 +1871,23 @@ setMethod("where", #' path <- "path/to/file.json" #' df <- read.json(path) #' dropDuplicates(df) +#' dropDuplicates(df, "col1", "col2") #' dropDuplicates(df, c("col1", "col2")) #' } setMethod("dropDuplicates", signature(x = "SparkDataFrame"), - function(x, colNames = columns(x)) { -stopifnot(class(colNames) == "character") - -sdf <- callJMethod(x@sdf, "dropDuplicates", as.list(colNames)) + function(x, col, ...) { +if (nargs() == 1) { + sdf <- callJMethod(x@sdf, "dropDuplicates", as.list(columns(x))) +} +else { --- End diff -- style nit: the else goes on the previous line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67213090 --- Diff: R/pkg/R/DataFrame.R --- @@ -1869,6 +1869,7 @@ setMethod("where", #' path <- "path/to/file.json" #' df <- read.json(path) #' dropDuplicates(df) +#' dropDuplicates(df, "col1", "col2") #' dropDuplicates(df, c("col1", "col2")) #' } setMethod("dropDuplicates", --- End diff -- I'll try to merge them into one. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67210755 --- Diff: R/pkg/R/DataFrame.R --- @@ -1869,6 +1869,7 @@ setMethod("where", #' path <- "path/to/file.json" #' df <- read.json(path) #' dropDuplicates(df) +#' dropDuplicates(df, "col1", "col2") #' dropDuplicates(df, c("col1", "col2")) #' } setMethod("dropDuplicates", --- End diff -- Actually, I kept the existing `dropDuplicates` since it handles `dropDuplicates(df)` for all columns, too. Don't we still need two functions if we move the case "c(...)"? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67209107 --- Diff: R/pkg/R/DataFrame.R --- @@ -1859,7 +1859,7 @@ setMethod("where", #' @param colnames A character vector of column names. --- End diff -- Yep. Right. I will add that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67208482 --- Diff: R/pkg/R/DataFrame.R --- @@ -1859,7 +1859,7 @@ setMethod("where", #' @param colnames A character vector of column names. --- End diff -- That sounds good. Would also be good to include `if col contains a character vector then additional column names are ignored` (since thats what the implementation is doing from what I see ?) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67208357 --- Diff: R/pkg/R/DataFrame.R --- @@ -1869,6 +1869,7 @@ setMethod("where", #' path <- "path/to/file.json" #' df <- read.json(path) #' dropDuplicates(df) +#' dropDuplicates(df, "col1", "col2") #' dropDuplicates(df, c("col1", "col2")) #' } setMethod("dropDuplicates", --- End diff -- Since we have unified the generics, can we also fold this method into the other one ? i.e. move the code here to the `length(col) > 1` case in the other function --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67206540 --- Diff: R/pkg/R/DataFrame.R --- @@ -1859,7 +1859,7 @@ setMethod("where", #' @param colnames A character vector of column names. --- End diff -- Oh, thank you for review, @shivaram . Sure. I'll update the doc. Maybe, something like the following? ``` - #' @param colnames A character vector of column names. + #' @param col A character vector of column names or a string of column name + #' @param ... Additional column names ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67205381 --- Diff: R/pkg/R/DataFrame.R --- @@ -1859,7 +1859,7 @@ setMethod("where", #' @param colnames A character vector of column names. --- End diff -- can we update this to indicate that we accept the varargs style ? I think we can document `...` as a `@param` for this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13684 [SPARK-15908][R] Add varargs-type dropDuplicates() function in SparkR ## What changes were proposed in this pull request? This PR adds varargs-type `dropDuplicates` function to SparkR for API parity. Refer to https://issues.apache.org/jira/browse/SPARK-15807, too. ## How was this patch tested? Pass the Jenkins tests with new testcases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-15908 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13684.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13684 commit f1d6355af9dc8e782680a1fc3fac07f8ca31b82b Author: Dongjoon HyunDate: 2016-06-15T10:08:28Z [SPARK-15908][R] Add varargs-type dropDuplicates() function in SparkR --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org