[GitHub] spark pull request #15274: [SPARK-17699] Support for parsing JSON string col...

2016-09-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15274


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15274: [SPARK-17699] Support for parsing JSON string col...

2016-09-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/15274#discussion_r80837055
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -467,3 +469,26 @@ case class JsonTuple(children: Seq[Expression])
   }
 }
 
+/**
+ * Converts an json input string to a [[StructType]] with the specified schema.
+ */
+case class JsonToStruct(schema: StructType, options: Map[String, String], child: Expression)
--- End diff --

Ah, yes, it definitely should. Let me update.





[GitHub] spark pull request #15274: [SPARK-17699] Support for parsing JSON string col...

2016-09-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15274#discussion_r80836637
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -467,3 +469,26 @@ case class JsonTuple(children: Seq[Expression])
   }
 }
 
+/**
+ * Converts an json input string to a [[StructType]] with the specified schema.
+ */
+case class JsonToStruct(schema: StructType, options: Map[String, String], child: Expression)
--- End diff --

Should this override `ExpectsInputTypes`?





[GitHub] spark pull request #15274: [SPARK-17699] Support for parsing JSON string col...

2016-09-27 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/15274

[SPARK-17699] Support for parsing JSON string columns

Spark SQL has great support for reading text files that contain JSON data.
However, in many cases the JSON data is just one column amongst others. This
is particularly true when reading from sources such as Kafka. This PR adds a
new function, `from_json`, that converts a string column into a nested
`StructType` with a user-specified schema.

Example usage:
```scala
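import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StructType}
// Assumes a SparkSession named `spark` with `import spark.implicits._` in scope.

val df = Seq("""{"a": 1}""").toDS()
val schema = new StructType().add("a", IntegerType)

df.select(from_json($"value", schema) as 'json) // => [json: struct<a:int>]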
```
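
For the Kafka-style case mentioned above, where the JSON payload sits next to other columns, a rough sketch might look like the following. It is not taken from the PR: the object name `FromJsonSketch`, the helper `flatten`, the column names `key` and `value`, and the payload fields `a` and `b` are all illustrative assumptions, and an in-memory `Seq` stands in for a real source.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

object FromJsonSketch {
  // Schema of the JSON payload; the field names "a" and "b" are illustrative.
  val payloadSchema: StructType =
    new StructType().add("a", IntegerType).add("b", StringType)

  // Parses the string column `value` into a struct column, then pulls the
  // struct's fields up next to the untouched `key` column.
  def flatten(events: DataFrame): DataFrame =
    events
      .select(col("key"), from_json(col("value"), payloadSchema).as("json"))
      .select(col("key"), col("json.a"), col("json.b"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("from_json sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for a source that delivers a key plus a JSON string.
    val events = Seq(
      ("k1", """{"a": 1, "b": "x"}"""),
      ("k2", """{"a": 2, "b": "y"}""")
    ).toDF("key", "value")

    flatten(events).show()
    spark.stop()
  }
}
```

Selecting `json.a` and `json.b` after parsing turns the nested struct back into ordinary top-level columns, so the rest of a query does not need to know the data arrived as a JSON string.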

This PR adds support for Java, Scala, and Python. I leveraged our existing
JSON parsing support by moving it into Catalyst (so that we could define
expressions using it). I left SQL out for now, because I'm not sure how users
would specify a schema.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark jsonParser

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15274


commit 62f56a7e4529b35f58a229097b012bc984fd458f
Author: Michael Armbrust 
Date:   2016-09-28T02:49:22Z

[SPARK-17699] Support for parsing JSON string columns



