Re: SQL query over (Long, JSON string) tuples

2015-01-29 Thread Ayoub
string-manipulating the JSON string and adding the timestamp to it? Thanks, Tobias

Re: SQL query over (Long, JSON string) tuples

2015-01-29 Thread Tobias Pfeiffer
Hi Ayoub, thanks for your mail! On Thu, Jan 29, 2015 at 6:23 PM, Ayoub benali.ayoub.i...@gmail.com wrote: SQLContext and HiveContext have a jsonRDD method which accepts an RDD[String], where each string is a JSON string, and returns a SchemaRDD; it extends RDD[Row], which is the type you want. After
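
The jsonRDD pattern described above can be sketched as follows. This is a Spark 1.2-era sketch, not a runnable standalone program: it assumes a live SparkContext named `sc`, and the table name `events` and the sample JSON strings are made up for illustration.

```scala
import org.apache.spark.sql.SQLContext

// Assumes a running SparkContext `sc` (Spark 1.2-era API).
val sqlContext = new SQLContext(sc)

// jsonRDD infers a schema by sampling the JSON strings.
val jsonLines = sc.parallelize(Seq(
  """{"user": "a", "n": 1}""",
  """{"user": "b", "n": 2}"""))

val schemaRDD = sqlContext.jsonRDD(jsonLines) // SchemaRDD extends RDD[Row]
schemaRDD.registerTempTable("events")
sqlContext.sql("SELECT user, n FROM events")
```

Note that this only covers an RDD[String]; the question is how to keep the Long timestamp of each (Long, String) tuple associated with its row.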

SQL query over (Long, JSON string) tuples

2015-01-29 Thread Tobias Pfeiffer
Hi, I have data as RDD[(Long, String)], where the Long is a timestamp and the String is a JSON-encoded string. I want to infer the schema of the JSON and then do a SQL statement on the data (no aggregates, just column selection and UDF application), but still have the timestamp associated with

Re: SQL query over (Long, JSON string) tuples

2015-01-29 Thread Michael Armbrust
Eventually it would be nice for us to have some sort of function to do the conversion you are talking about on a single column, but for now I usually hack it as you suggested:

val withId = origRDD.map { case (id, str) => s"""{"id": $id, ${str.trim.drop(1)}""" }
val table = sqlContext.jsonRDD(withId)

On
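
The splice in that hack is pure string manipulation, so it can be checked without a cluster: drop the JSON object's leading `{` and prepend a new first field carrying the timestamp. A minimal sketch (the helper name `mkJsonWithId` and the sample values are made up for illustration):

```scala
// Prepend an "id" field to a JSON object string by dropping its
// leading '{' and splicing in a new first field.
def mkJsonWithId(id: Long, json: String): String =
  s"""{"id": $id, ${json.trim.drop(1)}"""

val merged = mkJsonWithId(1422518400L, """{"user": "ayoub", "n": 3}""")
// merged == """{"id": 1422518400, "user": "ayoub", "n": 3}"""
```

Mapping this helper over the (Long, String) RDD and feeding the result to jsonRDD makes the timestamp just another inferred column, queryable alongside the original JSON fields. The trim matters: a leading space before `{` would otherwise survive the drop(1) and corrupt the JSON.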