Re: SQL query over (Long, JSON string) tuples
Eventually it would be nice for us to have a built-in function to do the conversion you are talking about on a single column, but for now I usually hack it as you suggested:

  val withId = origRDD.map { case (id, str) =>
    s"""{"id":$id, ${str.trim.drop(1)}"""
  }
  val table = sqlContext.jsonRDD(withId)
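For completeness, a rough sketch of how the resulting table might then be queried (Spark 1.2-era API; the table and column names are made up):

  table.registerTempTable("events")                               // "events" is a placeholder name
  val result = sqlContext.sql("SELECT id, someField FROM events") // "someField" is hypothetical

Note that the string surgery above assumes each JSON document is a top-level object, i.e. its trimmed text starts with "{"; dropping the first character of anything else would corrupt it.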
Re: SQL query over (Long, JSON string) tuples
Hi Ayoub,

thanks for your mail!

I'm afraid that's not so easy, because you can only join on a certain key, and the key is exactly what I have to drop in order to infer the schema.

Thanks
Tobias
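To spell out the dead end, a sketch using a hypothetical pairs: RDD[(Long, String)]:

  // jsonRDD only accepts the JSON strings, so the Long key is lost:
  val rows = sqlContext.jsonRDD(pairs.map(_._2))  // RDD[Row] without the timestamp
  // pairs.join(rows)  // will not compile: rows is not an RDD keyed by Long,
  // and the inferred rows contain no column to rebuild the pairing from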
Re: SQL query over (Long, JSON string) tuples
Hello,

SQLContext and HiveContext have a "jsonRDD" method which accepts an RDD[String], where each string is a JSON document, and returns a SchemaRDD; SchemaRDD extends RDD[Row], which is the type you want.

Afterwards you should be able to do a join to keep your tuple.

Best,
Ayoub.
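A rough sketch of that suggestion, assuming the Spark 1.2-era API ("data" and "events" are placeholder names):

  // Keep only the JSON strings; the schema is inferred from them alone.
  val schemaRDD = sqlContext.jsonRDD(data.map(_._2))  // SchemaRDD extends RDD[Row]
  schemaRDD.registerTempTable("events")
  sqlContext.sql("SELECT * FROM events")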
SQL query over (Long, JSON string) tuples
Hi,

I have data as RDD[(Long, String)], where the Long is a timestamp and the String is a JSON-encoded string. I want to infer the schema of the JSON and then run a SQL statement over the data (no aggregates, just column selection and UDF application), but still have the timestamp associated with each row of the result. I completely fail to see how that would be possible. Any suggestions?

I can't even see how I would get an RDD[(Long, Row)] so that I *might* be able to add the timestamp to the row after schema inference. Is there *any* way other than string-manipulating the JSON string and adding the timestamp to it?

Thanks
Tobias
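For concreteness, a minimal sketch of the data shape described above (the timestamps and field names are invented):

  import org.apache.spark.rdd.RDD

  // "sc" is an existing SparkContext.
  val data: RDD[(Long, String)] = sc.parallelize(Seq(
    (1422517200000L, """{"user":"alice","action":"click"}"""),
    (1422517201000L, """{"user":"bob","action":"view"}""")
  ))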