[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17674


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112326680
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

My thoughts exactly.





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112322369
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

ok, let's open a JIRA on that separately.





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112288821
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

No, it doesn't.






[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112272220
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

actually, re `but does it work if you add it to an existing dataframe instead of going via createDataFrame? there's some additional type inference going on in the 2nd route.`
I meant something like:
```
a <- as.DataFrame(cars)
a$foo <- lit(NaN)
```





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112254287
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

It doesn't work with `createDataFrame` either.

For `lit` it should be a quick fix, because we can call the Java `lit` with `Float.NaN`. `createDataFrame` won't be that simple.
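A rough sketch of what that quick fix could look like (hypothetical code, not the committed fix; the real SparkR `lit` lives in `R/pkg/R/functions.R`, and the name `lit_nan_aware` and the exact dispatch are assumptions for illustration):

```
# Sketch only, assuming an active SparkR session. The idea is to
# special-case a scalar NaN so it reaches the JVM as a double NaN
# (yielding a DoubleType literal, as in PySpark) instead of being
# serialized as a SQL NULL.
lit_nan_aware <- function(x) {
  if (is.numeric(x) && length(x) == 1 && is.nan(x)) {
    # Pass NaN through explicitly as a double literal
    jc <- callJStatic("org.apache.spark.sql.functions", "lit", NaN)
  } else {
    # Fall back to the usual path: unwrap Columns, pass scalars as-is
    jc <- callJStatic("org.apache.spark.sql.functions", "lit",
                      if (class(x) == "Column") x@jc else x)
  }
  column(jc)
}
```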





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112251853
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

I wouldn't be surprised if we have some issues with `NaN`...
but does it work if you add it to an existing dataframe instead of going via `createDataFrame`? there's some additional type inference going on in the 2nd route.





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112251255
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
--- End diff --

perhaps that's what it maps to in R, I haven't checked closely.
though I'd think it'd be better to be consistent with Scala so they could be more easily discoverable.

also I think we should change the `@family` name into full text instead of the short form some_funcs - that shows up in the generated doc. I didn't get around to making all those changes, but it might make sense in the 2.3 release.





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112250707
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
--- End diff --

I'd say, yes please.





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112250519
  
--- Diff: R/pkg/R/generics.R ---
@@ -942,6 +942,14 @@ setGeneric("countDistinct", function(x, ...) { standardGeneric("countDistinct") })
 #' @export
 setGeneric("crc32", function(x) { standardGeneric("crc32") })
 
+#' @rdname create_array
--- End diff --

actually you are right - I saw `## Column Methods ##` and thought that was the place, but we already have them in both places.

I'm fine with what you have





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112172948
  
--- Diff: R/pkg/R/generics.R ---
@@ -942,6 +942,14 @@ setGeneric("countDistinct", function(x, ...) { standardGeneric("countDistinct") })
 #' @export
 setGeneric("crc32", function(x) { standardGeneric("crc32") })
 
+#' @rdname create_array
--- End diff --

It covers all `o.a.s.sql.functions` right now. I am not sure these two are 
different enough to be an exception (and what about `struct` which belongs to 
the same category).





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112171802
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

I think it is clear from the context that we mean SQL `NULL`, and both `lit(NA)` and `lit(NULL)` create a SQL `NULL` literal. But this reminds me of something else:

```R
> lit(NaN)
Column NULL 

> select(createDataFrame(data.frame(x=c(1))), lit(NaN))
SparkDataFrame[NULL:null]

```

doesn't look right. PySpark handles this correctly 

```python
>>> lit(float("Nan"))
Column
```

with `DoubleType`.





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112169318
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
--- End diff --

Should we adjust this for `concat(_ws)`, `least`, `greatest` and 
`countDistinct`?






[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-19 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112168948
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
--- End diff --

Do you mean `normal_funcs`?





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112106253
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
+#' The value columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
--- End diff --

ditto `Non-aggregate functions`





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112106499
  
--- Diff: R/pkg/R/generics.R ---
@@ -942,6 +942,14 @@ setGeneric("countDistinct", function(x, ...) { standardGeneric("countDistinct") })
 #' @export
 setGeneric("crc32", function(x) { standardGeneric("crc32") })
 
+#' @rdname create_array
--- End diff --

this is also under `## Expression Function Methods ##` - might not be the right place





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112105680
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
--- End diff --

`@param ... additional Column(s).` is what we have in other places





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112106243
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
--- End diff --

this should be `Non-aggregate functions`, as per the Scala doc





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112106399
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
--- End diff --

`null` in the JVM is mapped to `NA` in R - we haven't documented that consistently, but it would be good to start thinking about a better way to do that.
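For illustration, that mapping can be seen from a SparkR session (sketch only; assumes `SparkR` is attached and `sparkR.session()` has been called, and the column name is made up):

```
# An R NA arrives on the JVM side as a SQL NULL
df <- createDataFrame(data.frame(x = c(1, NA)))
printSchema(df)                    # x: double (nullable = true)
collect(select(df, isNull(df$x)))  # FALSE for 1, TRUE for the NA row
```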





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17674#discussion_r112106422
  
--- Diff: R/pkg/R/functions.R ---
@@ -3652,3 +3652,56 @@ setMethod("posexplode",
 jc <- callJStatic("org.apache.spark.sql.functions", "posexplode", x@jc)
 column(jc)
   })
+
+#' create_array
+#'
+#' Creates a new array column. The input columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
+#'
+#' @family collection_funcs
+#' @rdname create_array
+#' @name create_array
+#' @aliases create_array,Column-method
+#' @export
+#' @examples \dontrun{create_array(df$x, df$y, df$z)}
+#' @note create_array since 2.3.0
+setMethod("create_array",
+          signature(x = "Column"),
+          function(x, ...) {
+            jcols <- lapply(list(x, ...), function (x) {
+              stopifnot(class(x) == "Column")
+              x@jc
+            })
+            jc <- callJStatic("org.apache.spark.sql.functions", "array", jcols)
+            column(jc)
+          })
+
+#' create_map
+#'
+#' Creates a new map column. The input columns must be grouped as key-value pairs,
+#' e.g. (key1, value1, key2, value2, ...).
+#' The key columns must all have the same data type, and can't be null.
+#' The value columns must all have the same data type.
+#'
+#' @param x Column to compute on
+#' @param ... other columns
--- End diff --

`@param ... additional Column(s).`





[GitHub] spark pull request #17674: [SPARK-20375][R] R wrappers for array and map

2017-04-18 Thread zero323
GitHub user zero323 opened a pull request:

https://github.com/apache/spark/pull/17674

[SPARK-20375][R] R wrappers for array and map

## What changes were proposed in this pull request?

Adds wrappers for `o.a.s.sql.functions.array` and `o.a.s.sql.functions.map`

## How was this patch tested?

Unit tests, `check-cran.sh`
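For reference, the two new wrappers would be used like this (sketch, assuming an active SparkR session; `df` and its columns are made up for illustration):

```
df <- createDataFrame(data.frame(a = 1:3, b = 4:6))
# All inputs to create_array must share a data type
head(select(df, create_array(df$a, df$b)))
# create_map takes alternating key and value columns:
# (key1, value1, key2, value2, ...); keys must be non-null
head(select(df, create_map(lit("a"), df$a, lit("b"), df$b)))
```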

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zero323/spark SPARK-20375

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17674


commit 453a39d7d8fb53b5b7e8169308a67497dddfff75
Author: zero323 
Date:   2017-04-18T18:30:02Z

Add wrappers for array and map functions



