[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-15 Thread MaxGekk
Github user MaxGekk closed the pull request at:

https://github.com/apache/spark/pull/21472


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r194992611
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

@maropu @HyukjinKwon Please have a look at the PR: 
https://github.com/apache/spark/pull/21550


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-03 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192584018
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

If possible, I'd prefer @HyukjinKwon's approach. If I remember correctly, we 
keep the JSON schema format only for backward compatibility. In future major 
releases, I think we could possibly drop the support.


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-03 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192583925
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/json-functions.sql ---
@@ -25,6 +25,10 @@ select from_json('{"a":1}', 'a InvalidType');
 select from_json('{"a":1}', 'a INT', named_struct('mode', 'PERMISSIVE'));
 select from_json('{"a":1}', 'a INT', map('mode', 1));
 select from_json();
+-- from_json - schema in json format
+select from_json('{"a":1}', '{"type":"struct","fields":[{"name":"a","type":"integer", "nullable":true}]}');
+select from_json('{"a":1}', '{"type":"map", "keyType":"string", "valueType":"integer","valueContainsNull":false}');
+
--- End diff --

To keep the changes to the output file small, can you add the new tests at 
the end of the file?


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192380680
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

I mean, like what we do on the Python side:


https://github.com/apache/spark/blob/4f1e38649ebc7710850b7c40e6fb355775e7bb7f/python/pyspark/sql/types.py#L808-L814
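For reference, a rough Scala transcription of that fallback chain, extended 
with the JSON fallback this PR discusses — a sketch only; the helper name 
`parseSchemaString` is hypothetical:

```scala
import scala.util.control.NonFatal

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.DataType

// Try the DDL table-schema form first, then a bare data type,
// and only then the legacy JSON representation.
def parseSchemaString(s: String): DataType = {
  try {
    // "fieldname datatype, fieldname datatype"
    CatalystSqlParser.parseTableSchema(s)
  } catch {
    case NonFatal(_) =>
      try {
        // "integer", "struct<a:int>", "map<string,int>", ...
        CatalystSqlParser.parseDataType(s)
      } catch {
        case NonFatal(_) => DataType.fromJson(s)
      }
  }
}
```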





---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192379878
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

I mean, adding support for type strings parsed by 
`CatalystSqlParser.parseDataType` (like `array<...>` or `struct<...>`) to 
`from_json`. This will support `struct` too, if I am not mistaken.
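For illustration, the kind of bare type strings 
`CatalystSqlParser.parseDataType` accepts, including non-struct root types 
(the comments describe the parsed results):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

CatalystSqlParser.parseDataType("array<int>")       // ArrayType(IntegerType, containsNull = true)
CatalystSqlParser.parseDataType("map<string,int>")  // MapType(StringType, IntegerType, valueContainsNull = true)
CatalystSqlParser.parseDataType("struct<a:int>")    // StructType with one nullable int field "a"
```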


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192358871
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

> Shall we also add support for the type itself via 
`CatalystSqlParser.parseDataType`?

I will, but it won't fully solve the customer's problem.

> Also, are you able to use `catalogString`?

I just checked this:
```scala
val schema = MapType(StringType, IntegerType).catalogString
val ds = spark.sql(
  s"""
     |select from_json('{"a":1}', '$schema')
  """.stripMargin)
ds.show()
```
and got this one:

```
extraneous input '<' expecting {'SELECT', 'FROM', ...}(line 1, pos 3)

== SQL ==
map<string,int>
---^^^
; line 2 pos 7
```

The same happens with 
`val schema = new StructType().add("a", IntegerType).catalogString`:

```
== SQL ==
struct<a:int>
--^^^
; line 2 pos 7
org.apache.spark.sql.AnalysisException
```

Am I doing something wrong?
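For reference, a minimal reproduction of the failure, assuming the schema 
literal is still routed through `parseTableSchema` as in the unmodified code 
above:

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// parseTableSchema expects "name type [, ...]" pairs, so a bare
// catalogString such as "map<string,int>" trips over the '<' at
// position 3, which matches the error above.
CatalystSqlParser.parseTableSchema("a int, b string")  // ok: two-field StructType
CatalystSqlParser.parseDataType("map<string,int>")     // ok: MapType(StringType, IntegerType)
CatalystSqlParser.parseTableSchema("map<string,int>")  // throws ParseException
```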


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192342580
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

> A schema in DDL format supports only `StructType` as the root type. It is 
not possible to specify `MapType` as in the test.

Shall we also add support for the type itself via 
`CatalystSqlParser.parseDataType`? 
Also, are you able to use `catalogString`?


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192339647
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

> How do they get the metadata ...

The metadata is stored together with the data in a distributed file system 
and is loaded by the standard facilities of the language.

> and how do they insert it into SQL?

The SQL statements are formed programmatically as strings, and the loaded 
schemas are inserted at particular positions in the string (you can think of 
it as quasiquotes in Scala). The formed SQL statements are then sent to Spark 
via JDBC (sketched at the end of this message).

> Is that the only way to do it?

It is probably possible to convert schemas from JSON format to DDL format, but:
- it requires much more effort and time than just modifying the 5 lines 
proposed in the PR
- a schema in DDL format supports only `StructType` as the root type; it is 
not possible to specify `MapType` as in the test: 
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala#L330-L345
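A sketch of that workflow with hypothetical paths, JDBC URL, and table/column 
names; it assumes a running Spark Thrift server and the Hive JDBC driver on 
the classpath:

```scala
import java.nio.file.{Files, Paths}
import java.sql.DriverManager

// Read the JSON schema saved next to the data, splice it into the SQL
// statement, and send the statement to Spark over JDBC.
val schemaJson = new String(
  Files.readAllBytes(Paths.get("/mnt/meta/schema.json")), "UTF-8")
val sql = s"""select from_json(payload, '$schemaJson') from events"""

val conn = DriverManager.getConnection("jdbc:hive2://spark-thrift:10000/default")
try {
  val rs = conn.createStatement().executeQuery(sql)
  while (rs.next()) println(rs.getString(1))
} finally {
  conn.close()
}
```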
 


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192323643
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

Usually they should be consistent, but we don't necessarily have to extend 
obsolete functionality for the sake of consistency. I'm not sure how common 
it is to write a JSON literal as a schema via SQL. How do they get the 
metadata, and how do they insert it into SQL? Is that the only way to do it?


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-05-31 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192309953
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

I believe we should support the JSON format because:
- The functionality of SQL and of the Scala (and other language) DSLs should 
be equal; otherwise we push users toward the Scala DSL because SQL has fewer 
features.
- The feature allows saving and restoring a schema in JSON format. The 
customer's use case is to keep data in JSON format plus meta info, including 
the schema, in JSON format too. A schema in JSON format gives them more 
opportunities for programmatic processing.
- For now, the JSON format gives us more flexibility and allows `MapType` 
(and `ArrayType`) as the root type of the result of `from_json`; see the 
sketch below.
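A minimal sketch of that round trip with a non-struct root type, which the 
DDL table-schema form cannot express:

```scala
import org.apache.spark.sql.types._

val mapType = MapType(StringType, IntegerType, valueContainsNull = false)
mapType.json
// {"type":"map","keyType":"string","valueType":"integer","valueContainsNull":false}
assert(DataType.fromJson(mapType.json) == mapType)  // the schema survives the round trip
```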


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-05-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192293089
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-    case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+    case Literal(s, StringType) =>
+      try {
+        DataType.fromJson(s.toString)
--- End diff --

I don't think we should support the JSON format here; a DDL-formatted schema 
is preferred (see the example below). JSON in functions.scala is supported 
only for backward compatibility, because the SQL function wasn't added first.
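For comparison, the preferred DDL form as used in the existing tests — 
runnable in spark-shell, assuming the ambient `spark` session:

```scala
// DDL-formatted schema: "name type [, ...]".
spark.sql("""select from_json('{"a":1}', 'a INT')""").show()
```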


---




[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-05-31 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/21472

[SPARK-24445][SQL] Schema in json format for from_json in SQL

## What changes were proposed in this pull request?

In the PR, I propose to support schemas in JSON format for the `from_json()` 
function in SQL, as this has already been implemented in the Scala DSL, for 
example 
[here](https://github.com/apache/spark/blob/branch-2.3/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3225-L3229):

```scala
val dataType = try {
  DataType.fromJson(schema)
} catch {
  case NonFatal(_) => StructType.fromDDL(schema)
}
```

The changes will allow specifying `MapType` in SQL, which is impossible at 
the moment:

```sql
select from_json('{"a":1}', '{"type":"map", "keyType":"string", "valueType":"integer","valueContainsNull":false}')
```

## How was this patch tested?

Added a couple of test cases to `json-functions.sql`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 from_json-sql-schema

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21472


commit 778162074cce7046f5f48d02780c8bac251fe96b
Author: Maxim Gekk 
Date:   2018-05-31T18:49:23Z

SQL tests for schema in json format

commit 139ef7e98c27c5f24d7a25fec8789309657cb5d3
Author: Maxim Gekk 
Date:   2018-05-31T19:42:27Z

Support schema in json format




---
