Re: RDD to DataFrame question with JsValue in the mix
On 7/1/2016 6:42 AM, Akhil Das wrote:
> case class Holder(str: String, js:JsValue)

Hello,

Thanks! I tried that before posting the question to the list, but I keep getting an error like the one below, even after the map() operation that converts (String,JsValue) -> Holder and the subsequent toDF(). I am simply invoking the following:

val rddDF:DataFrame = rdd.map(x => Holder(x._1,x._2)).toDF
rddDF.registerTempTable("rddf")
rddDF.schema.mkString(",")

And getting the following:

[2016-07-01 11:57:02,720] WARN .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/test] - Exception from job d4c9d145-92bf-4c64-8904-91c917bd61d3:
java.lang.UnsupportedOperationException: Schema for type play.api.libs.json.JsValue is not supported
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:718)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:693)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:691)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:691)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:630)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:414)
    at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:94)

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
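As the exception says, Spark's reflection-based schema derivation has no mapping for play-json's JsValue, so a case class with a JsValue field cannot become a DataFrame. A minimal sketch of one workaround, assuming Spark 2.x (SparkSession instead of the thread's sqlContext) and play-json on the classpath: keep the JSON as text via Json.stringify and pull fields out in SQL with get_json_object. HolderText, JsValueAsText and the sample records are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import play.api.libs.json.{JsValue, Json}

// Hypothetical holder: the JSON is stored as a String, because Spark
// cannot derive a schema for a JsValue field.
case class HolderText(str: String, jsonText: String)

object JsValueAsText {
  def demo(): Seq[(String, String)] = {
    val spark = SparkSession.builder().master("local[*]").appName("jsvalue-as-text").getOrCreate()
    import spark.implicits._
    try {
      // Stand-in for the RDD[(String, JsValue)] from the thread (made-up data).
      val rdd: RDD[(String, JsValue)] = spark.sparkContext
        .parallelize(Seq("a" -> """{"n": 1}""", "b" -> """{"n": 2}"""))
        .mapValues(s => Json.parse(s))

      // Json.stringify turns each JsValue into plain text Spark understands.
      val rddDF = rdd.map { case (s, js) => HolderText(s, Json.stringify(js)) }.toDF()
      rddDF.createOrReplaceTempView("rddf")

      // get_json_object extracts a field from the JSON text at query time.
      spark.sql("SELECT str, get_json_object(jsonText, '$.n') AS n FROM rddf ORDER BY str")
        .collect().toSeq
        .map(r => (r.getString(0), r.getString(1)))
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = demo().foreach(println)
}
```

The trade-off is that every query re-parses the JSON text; in exchange, records with differing JSON shapes can share one column.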
Re: RDD to DataFrame question with JsValue in the mix
Something like this?

import sqlContext.implicits._
case class Holder(str: String, js:JsValue)
yourRDD.map(x => Holder(x._1, x._2)).toDF()

On Fri, Jul 1, 2016 at 3:36 AM, Dood@ODDO wrote:
> Hello,
>
> I have an RDD[(String,JsValue)] that I want to convert into a DataFrame
> and then run SQL on. What is the easiest way to get the JSON (in form of
> JsValue) "understood" by the process?
>
> Thanks!

--
Cheers!
RDD to DataFrame question with JsValue in the mix
Hello,

I have an RDD[(String,JsValue)] that I want to convert into a DataFrame and then run SQL on. What is the easiest way to get the JSON (in the form of a JsValue) "understood" by the process?

Thanks!
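One possible answer, sketched under assumptions (Spark 2.2 or later for spark.read.json on a Dataset[String], play-json on the classpath): serialize each (String, JsValue) pair into a single JSON object and let Spark infer a schema, so the JsValue's fields become ordinary columns that SQL can address directly. The object name JsValueViaReadJson, the field names name and payload, and the sample data are made up for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import play.api.libs.json.{JsValue, Json}

object JsValueViaReadJson {
  def demo(): Seq[(String, Long)] = {
    val spark = SparkSession.builder().master("local[*]").appName("jsvalue-read-json").getOrCreate()
    import spark.implicits._
    try {
      // Stand-in for the RDD[(String, JsValue)] in the question (made-up data).
      val rdd: RDD[(String, JsValue)] = spark.sparkContext
        .parallelize(Seq("a" -> """{"n": 1}""", "b" -> """{"n": 2}"""))
        .mapValues(s => Json.parse(s))

      // Wrap each pair in one JSON object: the String becomes "name",
      // the JsValue becomes the nested "payload" object.
      val jsonLines = rdd.map { case (name, js) =>
        Json.stringify(Json.obj("name" -> name, "payload" -> js))
      }

      // Spark infers the schema: name: string, payload: struct<n: bigint>.
      val df = spark.read.json(jsonLines.toDS())
      df.createOrReplaceTempView("rddf")

      spark.sql("SELECT name, payload.n FROM rddf ORDER BY name")
        .collect().toSeq
        .map(r => (r.getString(0), r.getLong(1)))
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = demo().foreach(println)
}
```

Schema inference scans the data once, which costs an extra pass; passing an explicit schema to spark.read.schema(...) avoids that when the JSON shape is known in advance.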
DataFrame question
Hi All,

I am working with DataFrames and have been struggling with this; any pointers would be helpful. I have a JSON file with a schema like this:

 |-- links: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- desc: string (nullable = true)
 |    |    |-- id: string (nullable = true)

I want to fetch id and desc as an RDD like this: RDD[(String,String)]. I am using DataFrames:

df.select("links.desc", "links.id").rdd

The above DataFrame is returning an RDD like this: RDD[(List(String),List(String))]. So the JSON links:[{one,1},{two,2},{three,3}] should return an RDD[(one,1),(two,2),(three,3)]. Can anyone tell me how the DataFrame select should be modified?
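The select in the question already yields the desc and id arrays side by side in each row. A sketch of pairing them up directly on the RDD side, assuming Spark 2.2+ (SparkSession and spark.read.json on a Dataset[String]) and that both arrays in a row always have the same length, since zip silently truncates to the shorter one. ZipLinkArrays and the inline sample record are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ZipLinkArrays {
  def demo(): Seq[(String, String)] = {
    val spark = SparkSession.builder().master("local[*]").appName("zip-links").getOrCreate()
    import spark.implicits._
    try {
      // One record shaped like the question's JSON (made-up sample).
      val df = spark.read.json(Seq(
        """{"links":[{"desc":"one","id":"1"},{"desc":"two","id":"2"},{"desc":"three","id":"3"}]}"""
      ).toDS())

      // links.desc and links.id each come back as an array per row ...
      df.select(col("links.desc"), col("links.id")).rdd
        // ... so zip them element by element into (desc, id) pairs.
        .flatMap(r => r.getSeq[String](0).zip(r.getSeq[String](1)))
        .collect().toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = demo().foreach(println)
}
```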
Re: DataFrame question
You probably want to explode the array to produce one row per element:

import org.apache.spark.sql.functions.explode
df.select(explode(df("links")).alias("link"))

On Tue, Jul 7, 2015 at 10:29 AM, Naveen Madhire <vmadh...@umail.iu.edu> wrote:
> Hi All,
>
> I am working with dataframes and have been struggling with this thing, any
> pointers would be helpful. I've a Json file with the schema like this,
>
>  |-- links: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- desc: string (nullable = true)
>  |    |    |-- id: string (nullable = true)
>
> I want to fetch id and desc as an RDD like this RDD[(String,String)]
>
> i am using dataframes df.select(links.desc,links.id).rdd
>
> the above dataframe is returning an RDD like this
> RDD[(List(String),List(String)]
>
> So, links:[{one,1},{two,2},{three,3}] json should return and
> RDD[(one,1),(two,2),(three,3)]
>
> can anyone tell me how the dataframe select should be modified?
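Expanding on the explode suggestion: after exploding, each row holds a single link struct, whose fields can be selected and mapped to the RDD[(String,String)] the question asks for. A sketch assuming Spark 2.2+ (SparkSession and spark.read.json on a Dataset[String]); ExplodeLinks and the sample JSON mirror the question's data but are otherwise hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object ExplodeLinks {
  def demo(): Seq[(String, String)] = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-links").getOrCreate()
    import spark.implicits._
    try {
      // One record shaped like the question's JSON: links is an array of {desc, id} structs.
      val df = spark.read.json(Seq(
        """{"links":[{"desc":"one","id":"1"},{"desc":"two","id":"2"},{"desc":"three","id":"3"}]}"""
      ).toDS())

      df.select(explode(col("links")).alias("link"))   // one row per array element
        .select(col("link.desc"), col("link.id"))       // unpack the struct fields
        .rdd
        .map(r => (r.getString(0), r.getString(1)))      // RDD[(String, String)]
        .collect().toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = demo().foreach(println)
}
```

Note that explode drops input rows whose links array is null or empty; explode_outer (available from Spark 2.2) keeps them with a null link instead.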