Hi Michael,

It looks like all the from_json variants require me to pass a schema, and that can be a little tricky for us, but the code further down doesn't require me to pass a schema at all.
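For comparison, here is roughly what the from_json route would look like (just a sketch; the input column name "value" and the struct fields are made up, since the real schema is exactly what's hard for us to pin down):

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

// Hypothetical schema: in our case the JSON shape isn't known up front,
// so having to declare it like this is the painful part.
val schema = new StructType()
  .add("id", StringType)
  .add("amount", DoubleType)

// Assumes df2 has a single string column named "value".
df2.select(from_json(col("value"), schema).as("data"))
  .select("data.*")
  .show()

Whereas the snippet below infers the schema from the data itself: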
import org.apache.spark.sql._
val rdd = df2.rdd.map { case Row(j: String) => j }
spark.read.json(rdd).show()

On Tue, Nov 22, 2016 at 2:42 PM, Michael Armbrust <mich...@databricks.com> wrote:

> The first release candidate should be coming out this week. You can
> subscribe to the dev list if you want to follow the release schedule.
>
> On Mon, Nov 21, 2016 at 9:34 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Hi Michael,
>>
>> I only see Spark 2.0.2, which is what I am using currently. Any idea on
>> when 2.1 will be released?
>>
>> Thanks,
>> kant
>>
>> On Mon, Nov 21, 2016 at 5:12 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> In Spark 2.1 we've added a from_json
>>> <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L2902>
>>> function that I think will do what you want.
>>>
>>> On Fri, Nov 18, 2016 at 2:29 AM, kant kodali <kanth...@gmail.com> wrote:
>>>
>>>> This seems to work:
>>>>
>>>> import org.apache.spark.sql._
>>>> val rdd = df2.rdd.map { case Row(j: String) => j }
>>>> spark.read.json(rdd).show()
>>>>
>>>> However, I wonder if there is any inefficiency here, since I have to
>>>> apply this function to a billion rows.