Re: RDD to DataFrame question with JsValue in the mix

2016-07-01 Thread Dood

On 7/1/2016 6:42 AM, Akhil Das wrote:

case class Holder(str: String, js:JsValue)


Hello,

Thanks!

I tried that before posting the question to the list, but I keep getting 
the error below even after the map() operation that converts 
(String, JsValue) to Holder, followed by toDF().


I am simply invoking the following:

val rddDF:DataFrame = rdd.map(x => Holder(x._1,x._2)).toDF
rddDF.registerTempTable("rddf")

rddDF.schema.mkString(",")


And getting the following:

[2016-07-01 11:57:02,720] WARN  .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/test] - Exception from job d4c9d145-92bf-4c64-8904-91c917bd61d3:
java.lang.UnsupportedOperationException: Schema for type play.api.libs.json.JsValue is not supported
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:718)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:693)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:691)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:691)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:630)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:414)
    at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:94)
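
The exception above indicates that reflection-based schema inference has no
mapping for JsValue, so a case class with a JsValue field cannot be turned
into a DataFrame directly. One possible workaround, sketched here under
assumptions, is to serialize each JsValue back to a JSON string and let
Spark infer the schema from the text. The object name JsValueToDataFrame,
the helper toDataFrame, and the added "key" and "value" field names are
hypothetical; the sketch assumes Spark 1.4 or later (for
sqlContext.read.json on an RDD[String]) and Play JSON on the classpath.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}
import play.api.libs.json.{JsObject, JsString, JsValue, Json}

// Hypothetical helper, not part of Spark or Play.
object JsValueToDataFrame {

  // Turns each (key, json) pair into one JSON document per record and
  // lets Spark SQL infer the schema from the JSON text.
  def toDataFrame(sqlContext: SQLContext, rdd: RDD[(String, JsValue)]): DataFrame = {
    val jsonLines: RDD[String] = rdd.map { case (key, js) =>
      js match {
        // JSON objects: add the tuple's String as an extra "key" field
        // (assumed name; it would overwrite an existing "key" field).
        case obj: JsObject =>
          Json.stringify(obj + ("key" -> JsString(key)))
        // Anything else (array, number, string, ...): wrap it in an object
        // under an assumed "value" field.
        case other =>
          Json.stringify(JsObject(Seq("key" -> JsString(key), "value" -> other)))
      }
    }
    sqlContext.read.json(jsonLines)
  }
}
```

The resulting DataFrame can then be registered and queried as in the
snippet earlier in this message, for example
JsValueToDataFrame.toDataFrame(sqlContext, rdd).registerTempTable("rddf").
Schema inference makes an extra pass over the data, so for a large RDD
with a known structure an explicit schema may be preferable.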




-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: RDD to DataFrame question with JsValue in the mix

2016-07-01 Thread Akhil Das
Something like this?

import sqlContext.implicits._
case class Holder(str: String, js:JsValue)

yourRDD.map(x => Holder(x._1, x._2)).toDF()



On Fri, Jul 1, 2016 at 3:36 AM, Dood@ODDO  wrote:

> Hello,
>
> I have an RDD[(String,JsValue)] that I want to convert into a DataFrame
> and then run SQL on. What is the easiest way to get the JSON (in form of
> JsValue) "understood" by the process?
>
> Thanks!


-- 
Cheers!


RDD to DataFrame question with JsValue in the mix

2016-06-30 Thread Dood

Hello,

I have an RDD[(String,JsValue)] that I want to convert into a DataFrame 
and then run SQL on. What is the easiest way to get the JSON (in form of 
JsValue) "understood" by the process?


Thanks!




DataFrame question

2015-07-07 Thread Naveen Madhire
Hi All,

I am working with DataFrames and have been struggling with the following;
any pointers would be helpful.

I have a JSON file with a schema like this:

 |-- links: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- desc: string (nullable = true)
 |    |    |-- id: string (nullable = true)


I want to fetch id and desc as an RDD[(String, String)].

I am using df.select("links.desc", "links.id").rdd

but this returns an RDD like
RDD[(List(String), List(String))]


So for the JSON links:[{one,1},{two,2},{three,3}] I would expect an
RDD containing (one,1), (two,2), (three,3).

Can anyone tell me how the DataFrame select should be modified?


Re: DataFrame question

2015-07-07 Thread Michael Armbrust
You probably want to explode the array to produce one row per element:

df.select(explode(df("links")).alias("link"))
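
To get from the exploded rows to the RDD[(String, String)] asked for in the
original question, one further select and a map over the rows should be
enough. The following is a minimal sketch, assuming Spark 1.4 or later and
the schema shown in the question; the object name ExplodeLinks and the
helper linkPairs are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.explode

// Hypothetical helper, not part of Spark.
object ExplodeLinks {

  // Expects a DataFrame with a column "links" of type array<struct<desc, id>>.
  def linkPairs(df: DataFrame): RDD[(String, String)] =
    df.select(explode(df("links")).alias("link")) // one row per array element
      .select("link.desc", "link.id")             // pull the struct fields out
      .rdd                                        // RDD[Row]
      .map(row => (row.getString(0), row.getString(1)))
}
```

Note that getString returns null for a null field, so rows with a missing
desc or id would need filtering if nulls are not acceptable downstream.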
