[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17674

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112326680

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
             jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
             column(jc)
           })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

My thoughts exactly.
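The argument-handling pattern in the quoted wrapper (validate that every input is a `Column`, then collect the underlying Java column references and hand them to a single JVM call) can be sketched in plain Python. The `Column` stand-in and `create_array` below are illustrative only, not SparkR's actual implementation:

```python
class Column:
    """Stand-in for SparkR's Column S4 class; the `jc` attribute
    plays the role of the wrapped Java column reference (x@jc)."""
    def __init__(self, jc):
        self.jc = jc

def create_array(*cols):
    """Mirror of the R wrapper's logic: every argument must be a
    Column; collect the underlying references and pass them on."""
    jcols = []
    for c in cols:
        if not isinstance(c, Column):       # stopifnot(class(x) == "Column")
            raise TypeError("all arguments must be Column")
        jcols.append(c.jc)
    # In the real wrapper this list is passed to
    # callJStatic("org.apache.spark.sql.functions", "array", jcols).
    return jcols

# The loop rejects mixed inputs up front instead of failing in the JVM.
assert create_array(Column("a"), Column("b")) == ["a", "b"]
```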
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112322369

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

ok, let's open a JIRA on that separately.
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112288821

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

No, it doesn't.
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112272220

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

actually, re: `but does it work if you add it to an existing dataframe instead of going via createDataFrame? there's some additional type inference going on in the 2nd route.` I mean like

```
a <- as.DataFrame(cars)
a$foo <- lit(NaN)
```
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112254287

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

It doesn't work with `createDataFrame` either. For `lit` it should be a quick fix because we can call Java `lit` with `Float.NaN`. `createDataFrame` won't be that simple.
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112251853

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

I wouldn't be surprised if we have some issues with `NaN`... but does it work if you add it to an existing dataframe instead of going via `createDataFrame`? there's some additional type inference going on in the 2nd route.
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112251255

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' @family collection_funcs
--- End diff --

perhaps that's what it maps to in R, I haven't checked closely. though I'd think it'd be better to be consistent with Scala so they could be more easily discoverable. also I think we should change the `@family` name into full text instead of the short form some_funcs - that shows up in the generated doc. I didn't get around to making all those changes but it might make sense for the 2.3 release.
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112250707

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' @param ... other columns
--- End diff --

I'd say, yes please.
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112250519

--- Diff: R/pkg/R/generics.R ---
@@ -942,6 +942,14 @@ setGeneric("countDistinct", function(x, ...) { standardGeneric("countDistinct")
 #' @export
 setGeneric("crc32", function(x) { standardGeneric("crc32") })
+
+#' @rdname create_array
--- End diff --

actually you are right - I saw `## Column Methods ##` and thought that was the place, but we already have them in both places. I'm fine with what you have.
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112172948

--- Diff: R/pkg/R/generics.R ---
@@ -942,6 +942,14 @@ setGeneric("countDistinct", function(x, ...) { standardGeneric("countDistinct")
[...]
+#' @rdname create_array
--- End diff --

It covers all `o.a.s.sql.functions` right now. I am not sure these two are different enough to be an exception (and what about `struct`, which belongs to the same category).
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112171802

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

I think it is clear from the context that we mean SQL `NULL`, and both `lit(NA)` and `lit(NULL)` create a SQL `NULL` literal. But this reminds me of something else:

```R
> lit(NaN)
Column NULL
> select(createDataFrame(data.frame(x = c(1))), lit(NaN))
SparkDataFrame[NULL:null]
```

doesn't look right. PySpark handles this correctly:

```python
>>> lit(float("Nan"))
Column
```

with `DoubleType`.
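The distinction at issue here can be shown without Spark. A minimal plain-Python sketch (the `classify` helper is illustrative, not part of any Spark API) of why `NaN` is a concrete double value and must not be folded into a null literal:

```python
import math

def classify(value):
    """Distinguish a missing value (None, playing the role of SQL NULL)
    from the floating-point NaN, which is a real double-typed value."""
    if value is None:
        return "null"            # no value at all
    if isinstance(value, float) and math.isnan(value):
        return "double NaN"      # a concrete IEEE 754 double
    return "value"

# NaN compares unequal even to itself, yet it is still a double,
# so lit(NaN) should yield a DoubleType literal, not NULL.
assert float("nan") != float("nan")
assert classify(None) == "null"
assert classify(float("nan")) == "double NaN"
```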
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112169318

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' @param ... other columns
--- End diff --

Should we adjust this for `concat(_ws)`, `least`, `greatest` and `countDistinct`?
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112168948

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' @family collection_funcs
--- End diff --

Do you mean `normal_funcs`?
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112106253

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' create_map
[...]
+#' @family collection_funcs
--- End diff --

ditto `Non-aggregate functions`
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112106499

--- Diff: R/pkg/R/generics.R ---
@@ -942,6 +942,14 @@ setGeneric("countDistinct", function(x, ...) { standardGeneric("countDistinct")
[...]
+#' @rdname create_array
--- End diff --

this is also under `## Expression Function Methods ##`, which might not be the right place
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112105680

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' @param ... other columns
--- End diff --

`@param ... additional Column(s).` is what we have in other places
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112106243

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' @family collection_funcs
--- End diff --

this should be `Non-aggregate functions` as per the Scala doc
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112106399

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

`null` in the JVM is mapped to NA in R - we haven't documented that consistently, but it would be good to start thinking about a better way to do that
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17674#discussion_r112106422

--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
[...]
+#' create_map
[...]
+#' @param ... other columns
--- End diff --

`@param ... additional Column(s).`
[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map
GitHub user zero323 opened a pull request:

    https://github.com/apache/spark/pull/17674

    [SPARK-20375][R] R wrappers for array and map

## What changes were proposed in this pull request?

Adds wrappers for `o.a.s.sql.functions.array` and `o.a.s.sql.functions.map`.

## How was this patch tested?

Unit tests, `check-cran.sh`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-20375

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17674

----
commit 453a39d7d8fb53b5b7e8169308a67497dddfff75
Author: zero323
Date:   2017-04-18T18:30:02Z

    Add wrappers for array and map functions
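The key-value contract that the `create_map` wrapper documents (arguments alternate key1, value1, key2, value2, ..., and keys can't be null) can be sketched in plain Python. `create_map_pairs` below is a hypothetical helper for illustration, not part of the PR:

```python
def create_map_pairs(*cols):
    """Sketch of create_map's documented contract: arguments alternate
    key1, value1, key2, value2, ...; keys must not be null/None."""
    if len(cols) % 2 != 0:
        raise ValueError("create_map expects an even number of columns")
    pairs = {}
    # cols[0::2] are the keys, cols[1::2] the matching values.
    for key, value in zip(cols[0::2], cols[1::2]):
        if key is None:
            raise ValueError("map keys can't be null")
        pairs[key] = value
    return pairs

assert create_map_pairs("k1", 1, "k2", 2) == {"k1": 1, "k2": 2}
```

Checking the pairing eagerly, as the R wrapper's argument validation does for column types, surfaces misuse at call time rather than as a JVM-side analysis error.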