Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread Hyukjin Kwon
Hi Kant,

Ah, I thought you wanted to find a workaround to do it. Then wouldn't the workaround easily reach the same goal without such a new API?

Thanks.

On 6 Dec 2016 4:11 a.m., "kant kodali" wrote:
> Hi Kwon,
>
> Thanks for this but Isn't this

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread kant kodali
Hi Kwon,

Thanks for this, but isn't this what Michael suggested?

Thanks,
kant

On Mon, Dec 5, 2016 at 4:45 AM, Hyukjin Kwon wrote:
> Hi Kant,
>
> How about doing something like this?
>
> import org.apache.spark.sql.functions._
>
> // val df2 =

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread Hyukjin Kwon
Hi Kant,

How about doing something like this?

import org.apache.spark.sql.functions._

// val df2 = df.select(df("body").cast(StringType).as("body"))
val df2 = Seq("""{"a": 1}""").toDF("body")
val schema = spark.read.json(df2.as[String].rdd).schema
df2.select(from_json(col("body"),
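[The archive cuts the snippet off mid-line. A complete version of the same workaround might look like the sketch below; the `"body"` column name and the sample JSON literal are placeholders carried over from the message, and it assumes a Spark 2.1+ shell where `spark` and its implicits are in scope.]

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StringType

// Assume df has a JSON-encoded "body" column; cast it to string first.
// val df2 = df.select(df("body").cast(StringType).as("body"))
val df2 = Seq("""{"a": 1}""").toDF("body")

// Infer the schema once from the string column ...
val schema = spark.read.json(df2.as[String].rdd).schema

// ... then parse each row with from_json and flatten the struct into columns.
df2.select(from_json(col("body"), schema).as("data"))
   .select("data.*")
   .show()
```

The point of the two-step dance is that `from_json` needs a schema up front; inferring it once and reusing it avoids re-inferring on every query.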

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-12-05 Thread kant kodali
Hi Michael,

"Personally, I usually take a small sample of data and use schema inference on that. I then hardcode that schema into my program. This makes your Spark jobs much faster and removes the possibility of the schema changing underneath the covers."

This may or may not work for us. Not
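[As a sketch, the hardcoding approach Michael describes could look like this; the field names `a` and `b` are purely illustrative, and `df2` is assumed to be the string-cast DataFrame from earlier in the thread.]

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

// Schema obtained once from a small sample, e.g.
//   spark.read.json(sampleDf.as[String].rdd).schema
// and then written down explicitly, so it can never change under the covers:
val schema = StructType(Seq(
  StructField("a", LongType),
  StructField("b", StringType)
))

// Parse with the fixed schema; no inference pass over the data at all.
df2.select(from_json(col("body"), schema).as("data")).select("data.*")
```

Rows whose JSON does not match the hardcoded schema come back as null fields rather than failing the job, which is the trade-off kant is worried about.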

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-28 Thread Michael Armbrust
You could open up a JIRA to add a version of from_json that supports schema inference, but unfortunately that would not be super easy to implement. In particular, it would introduce a weird case where only this specific function would block for a long time while we infer the schema (instead of

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-23 Thread kant kodali
Hi Michael,

Looks like all the from_json functions will require me to pass a schema, and that can be a little tricky for us, but the code below doesn't require me to pass a schema at all.

import org.apache.spark.sql._

val rdd = df2.rdd.map { case Row(j: String) => j }
spark.read.json(rdd).show()

On Tue,

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-22 Thread Michael Armbrust
The first release candidate should be coming out this week. You can subscribe to the dev list if you want to follow the release schedule.

On Mon, Nov 21, 2016 at 9:34 PM, kant kodali wrote:
> Hi Michael,
>
> I only see spark 2.0.2 which is what I am using currently. Any idea

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-21 Thread kant kodali
Hi Michael,

I only see Spark 2.0.2, which is what I am using currently. Any idea on when 2.1 will be released?

Thanks,
kant

On Mon, Nov 21, 2016 at 5:12 PM, Michael Armbrust wrote:
> In Spark 2.1 we've added a from_json
>

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-21 Thread Michael Armbrust
In Spark 2.1 we've added a from_json function that I think will do what you want.

On Fri, Nov 18, 2016 at 2:29 AM, kant kodali wrote:
> This seem to work
>
>

Re: How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-18 Thread kant kodali
This seems to work:

import org.apache.spark.sql._

val rdd = df2.rdd.map { case Row(j: String) => j }
spark.read.json(rdd).show()

However, I wonder if there is any inefficiency here, since I have to apply this function to a billion rows.
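[The inefficiency concern is real: `spark.read.json(rdd)` makes an extra full pass over the data just to infer the schema before parsing it. Assuming the blobs share one schema, a sketch of a cheaper variant is to infer from a small sample and then hand the fixed schema to the reader; the sample size of 1000 is an arbitrary illustrative choice.]

```scala
import org.apache.spark.sql._
import org.apache.spark.sql.functions.col

val rdd = df2.rdd.map { case Row(j: String) => j }

// Infer the schema from a small sample instead of all billion rows ...
val sample = spark.sparkContext.parallelize(rdd.take(1000))
val schema = spark.read.json(sample).schema

// ... then parse the full dataset in a single pass with the known schema.
spark.read.schema(schema).json(rdd).show()
```

This only helps if the sampled rows are representative; fields that never appear in the sample will be missing from the inferred schema and silently dropped from the parsed output.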

How do I flatten JSON blobs into a Data Frame using Spark/Spark SQL

2016-11-17 Thread kant kodali
Hi All,

I would like to flatten JSON blobs into a DataFrame using Spark/Spark SQL inside the Spark shell.

val df = spark.sql("select body from test limit 3") // body is a JSON-encoded blob column
val df2 = df.select(df("body").cast(StringType).as("body"))

when I do df2.show // shows the 3