[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22939


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r237727671
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

Yea, I agree. It will throw an analysis exception in that case. I also 
sympathize with the concerns here, and we're somewhat unclear about this - so I 
just wanted to keep it restricted for now.

I'm going to open another PR related to this as a follow-up (for 
https://github.com/apache/spark/pull/22939#issuecomment-435672008). While I am 
there, I will test what happens when the user passes in a column that is not a 
literal string.
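
For reference, a minimal sketch of the case in question (illustrative only; per the comments above, a non-literal column is expected to fail analysis rather than infer a schema per row):

```r
df <- sql("SELECT named_struct('name', 'Bob') as people")
df <- mutate(df, people_json = to_json(df$people))
# Passing the column itself (not a literal string) is expected to raise an
# analysis error, as discussed above.
tryCatch(
  head(select(df, schema_of_json(df$people_json))),
  error = function(e) message("analysis error: ", conditionMessage(e))
)
```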


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r236770223
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

Maybe think about the design of the API in R, Scala, and elsewhere - what does 
it look like when the user passes in a column that is not a literal string? 
Probably worthwhile to follow up separately.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r232540931
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

Hmm .. do you mind if we go ahead with this one and discuss it later within 
3.0? I think we're going to deal with this problem within 3.0 if I am not 
mistaken. I need to make one follow-up after this anyway.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r232500194
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

Just that I thought the shortcut syntax in Scala is nicer looking than 
`lit("string")` in R.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r232208151
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

BTW, `lit` usage already works in many APIs; although it looks a bit odd, it 
should be okay.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r232207412
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

It's just left as a Column for now, in case we allow more cases later. Yea, 
it's a bit odd that we have to write `schema_of_csv(lit("Amsterdam,2018"))`. 
Maybe later columns or other expressions might have to be supported - in that 
case we can just fix the documentation on the R side.
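
For illustration, a minimal sketch of the two forms the R method accepts (assuming an active SparkR session; the character form is wrapped with `lit()` internally, as in the diff above):

```r
df <- sql("SELECT * FROM range(1)")
# Plain R string: wrapped into a literal Column under the hood.
head(select(df, schema_of_csv("Amsterdam,2018")))
# Explicit literal Column: equivalent, just more verbose.
head(select(df, schema_of_csv(lit("Amsterdam,2018"))))
```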


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r232166370
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

Hm.. why not just support a string then? `schema_of_csv(lit("Amsterdam,2018"))` 
is a very odd usage in R.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r231476884
  
--- Diff: R/pkg/R/functions.R ---
@@ -205,11 +205,18 @@ NULL
 #'  also supported for the schema.
 #'  \item \code{from_csv}: a DDL-formatted string
 #'  }
-#' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and \code{from_json},
-#'this contains additional named properties to control how it is converted, accepts
-#'the same options as the JSON/CSV data source. Additionally \code{to_json} supports
-#'the "pretty" option which enables pretty JSON generation. In \code{arrays_zip},
-#'this contains additional Columns of arrays to be merged.
+#' @param ... additional argument(s).
+#'  \itemize{
+#'  \item \code{to_json}, \code{from_json} and \code{schema_of_json}: this contains
+#'  additional named properties to control how it is converted and accepts the
+#'  same options as the JSON data source.
+#'  \item \code{to_json}: it supports the "pretty" option which enables pretty
--- End diff --

OK. I added an example.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r231404180
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

Yup.. only a literal works; columns don't work.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r231403096
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

You are saying that `select(df, schema_of_csv(df$schemaCol))` is not allowed?



---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r231029629
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

Yea .. that was discussed at https://github.com/apache/spark/pull/22775. 
The use case of `schema_of_csv` or `schema_of_json` will usually be something 
like copying and pasting one record from the actual data manually. That's 
disallowed for now, conservatively.
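
A sketch of that use case, adapted from the PR description (assuming an active SparkR session): pull one record out of the data as a plain string, infer its schema, and reuse it with `from_json`.

```r
df <- sql("SELECT named_struct('name', 'Bob') as people")
df <- mutate(df, people_json = to_json(df$people))
# Take one record out of the data as an ordinary R string ...
sample_record <- head(df)$people_json
# ... and use it to infer the schema for parsing the whole column.
head(select(df, from_json(df$people_json, schema_of_json(sample_record))))
```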


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-05 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r231025592
  
--- Diff: R/pkg/R/functions.R ---
@@ -205,11 +205,18 @@ NULL
 #'  also supported for the schema.
 #'  \item \code{from_csv}: a DDL-formatted string
 #'  }
-#' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and \code{from_json},
-#'this contains additional named properties to control how it is converted, accepts
-#'the same options as the JSON/CSV data source. Additionally \code{to_json} supports
-#'the "pretty" option which enables pretty JSON generation. In \code{arrays_zip},
-#'this contains additional Columns of arrays to be merged.
+#' @param ... additional argument(s).
+#'  \itemize{
+#'  \item \code{to_json}, \code{from_json} and \code{schema_of_json}: this contains
+#'  additional named properties to control how it is converted and accepts the
+#'  same options as the JSON data source.
+#'  \item \code{to_json}: it supports the "pretty" option which enables pretty
--- End diff --

I know it was there before, but I'd like to suggest giving an example - a doc 
or code example below. It's a bit different from Python/Scala, I think.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-05 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r231025282
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

OK, but one use could be

`select(df, schema_of_csv(df$schemaCol))`

like an actual column, not a literal string?


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r230688803
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

That's actually related to the Scala API. There are too many overloaded 
versions of functions in `functions.scala`, so we're trying to reduce them. 
Column is preferred over other specific types because Column can cover other 
expression cases. In Python and R, other types can be supported easily, so both 
other types and Column are supported. To cut it short, this is for consistency 
with the Scala API.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r230686967
  
--- Diff: R/pkg/R/functions.R ---
@@ -205,11 +205,18 @@ NULL
 #'  also supported for the schema.
 #'  \item \code{from_csv}: a DDL-formatted string
 #'  }
-#' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and \code{from_json},
-#'this contains additional named properties to control how it is converted, accepts
-#'the same options as the JSON/CSV data source. Additionally \code{to_json} supports
-#'the "pretty" option which enables pretty JSON generation. In \code{arrays_zip},
-#'this contains additional Columns of arrays to be merged.
+#' @param ... additional argument(s).
+#'  \itemize{
+#'  \item \code{to_json}, \code{from_json} and \code{schema_of_json}: this contains
+#'  additional named properties to control how it is converted and accepts the
+#'  same options as the JSON data source.
+#'  \item \code{to_json}: it supports the "pretty" option which enables pretty
--- End diff --

Yup.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r230649513
  
--- Diff: R/pkg/R/functions.R ---
@@ -205,11 +205,18 @@ NULL
 #'  also supported for the schema.
 #'  \item \code{from_csv}: a DDL-formatted string
 #'  }
-#' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and \code{from_json},
-#'this contains additional named properties to control how it is converted, accepts
-#'the same options as the JSON/CSV data source. Additionally \code{to_json} supports
-#'the "pretty" option which enables pretty JSON generation. In \code{arrays_zip},
-#'this contains additional Columns of arrays to be merged.
+#' @param ... additional argument(s).
+#'  \itemize{
+#'  \item \code{to_json}, \code{from_json} and \code{schema_of_json}: this contains
+#'  additional named properties to control how it is converted and accepts the
+#'  same options as the JSON data source.
+#'  \item \code{to_json}: it supports the "pretty" option which enables pretty
--- End diff --

Actually, how does `pretty` work? Is it `pretty = TRUE`?
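
(For reference, a minimal sketch of the usual form, assuming named arguments in `...` are forwarded as JSON data source options:)

```r
df <- sql("SELECT named_struct('name', 'Bob') as people")
# Assumption: "pretty" is passed as a named argument and switches the output
# from the compact single-line JSON to indented, multi-line JSON.
head(select(df, to_json(df$people, pretty = TRUE)))
```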


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r230650176
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+} else {
+  col <- x@jc
--- End diff --

What's the use when x is a Column?
`schema_of_csv(lit("Amsterdam,2018"))` seems a bit odd to me...


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r230649120
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
--- End diff --

I think we should avoid mixing `'` and `"` in the doc.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r230649693
  
--- Diff: R/pkg/R/functions.R ---
@@ -2260,6 +2293,32 @@ setMethod("from_csv", signature(x = "Column", schema = "characterOrColumn"),
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_csv}: Parses a CSV string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_csv schema_of_csv,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' csv <- "'Amsterdam,2018'"
--- End diff --

I"m a bit confused `"'Amsterdam,2018'"` vs `"Amsterdam,2018"`
does the latter work?


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r230589804
  
--- Diff: R/pkg/R/functions.R ---
@@ -202,14 +202,18 @@ NULL
 #'  \itemize{
 #'  \item \code{from_json}: a structType object to use as the schema to use
 #'  when parsing the JSON string. Since Spark 2.3, the DDL-formatted string is
-#'  also supported for the schema.
-#'  \item \code{from_csv}: a DDL-formatted string
+#'  also supported for the schema. Since Spark 3.0, \code{schema_of_json} or
+#'  a DDL-formatted string literal can also be accepted.
--- End diff --

Hmm, is `the DDL-formatted string` different from `a DDL-formatted string 
literal`?


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r230586828
  
--- Diff: R/pkg/R/functions.R ---
@@ -202,14 +202,18 @@ NULL
 #'  \itemize{
 #'  \item \code{from_json}: a structType object to use as the schema to use
 #'  when parsing the JSON string. Since Spark 2.3, the DDL-formatted string is
-#'  also supported for the schema.
-#'  \item \code{from_csv}: a DDL-formatted string
+#'  also supported for the schema. Since Spark 3.0, \code{schema_of_json} or
+#'  a DDL-formatted string literal can also be accepted.
+#'  \item \code{from_csv}: a structType object, DDL-formatted string, \code{schema_of_csv}
+#'  or DDL-formatted string literal
 #'  }
-#' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and \code{from_json},
-#'this contains additional named properties to control how it is converted, accepts
-#'the same options as the JSON/CSV data source. Additionally \code{to_json} supports
-#'the "pretty" option which enables pretty JSON generation. In \code{arrays_zip},
-#'this contains additional Columns of arrays to be merged.
+#' @param ... additional argument(s). In \code{to_json}, \code{from_json} and
--- End diff --

Yea I think so. Let me try.


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r230584990
  
--- Diff: R/pkg/R/functions.R ---
@@ -202,14 +202,18 @@ NULL
 #'  \itemize{
 #'  \item \code{from_json}: a structType object to use as the schema to use
 #'  when parsing the JSON string. Since Spark 2.3, the DDL-formatted string is
-#'  also supported for the schema.
-#'  \item \code{from_csv}: a DDL-formatted string
+#'  also supported for the schema. Since Spark 3.0, \code{schema_of_json} or
+#'  a DDL-formatted string literal can also be accepted.
+#'  \item \code{from_csv}: a structType object, DDL-formatted string, \code{schema_of_csv}
+#'  or DDL-formatted string literal
 #'  }
-#' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and \code{from_json},
-#'this contains additional named properties to control how it is converted, accepts
-#'the same options as the JSON/CSV data source. Additionally \code{to_json} supports
-#'the "pretty" option which enables pretty JSON generation. In \code{arrays_zip},
-#'this contains additional Columns of arrays to be merged.
+#' @param ... additional argument(s). In \code{to_json}, \code{from_json} and
--- End diff --

Would it be better to make a list of items here like for the `schema` 
parameter?


---




[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-04 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22939

[SPARK-25446][R] Add schema_of_json() and schema_of_csv() to R

## What changes were proposed in this pull request?

This PR proposes to expose `schema_of_json` and `schema_of_csv` on the R side.

**`schema_of_json`**:

```r
> json <- '{"name":"Bob"}'
> df <- sql("SELECT * FROM range(1)")
> head(select(df, schema_of_json(json)))
  schema_of_json({"name":"Bob"})
1            struct<name:string>
```

**`schema_of_csv`**:

```r
> csv <- "Amsterdam,2018"
> df <- sql("SELECT * FROM range(1)")
> head(select(df, schema_of_csv(csv)))
  schema_of_csv(Amsterdam,2018)
1   struct<_c0:string,_c1:int>
```

This is useful when it's used with [to|from]_[csv|json]:

```r
> df <- sql("SELECT named_struct('name', 'Bob') as people")
> df <- mutate(df, people_json = to_json(df$people))
> head(select(df, from_json(df$people_json, schema_of_json(head(df)$people_json))))
  from_json(people_json)
1                    Bob
```

```r
> df <- sql("SELECT named_struct('name', 'Bob') as people")
> df <- mutate(df, people_json = to_csv(df$people))
> head(select(df, from_csv(df$people_json, schema_of_csv(head(df)$people_json))))
  from_csv(people_json)
1                  Bob
```

## How was this patch tested?

Manually tested, unit tests added, documentation manually built and 
verified.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-25446

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22939.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22939


commit c4a78fc0b14876a857bdd9b2f8f094744dd76c04
Author: hyukjinkwon 
Date:   2018-11-04T08:46:20Z

Add schema_of_json() and schema_of_csv() to R




---
