Hello,

SQLContext and HiveContext have a "jsonRDD" method which accepts an RDD[String], where each string is a JSON string, and returns a SchemaRDD. SchemaRDD extends RDD[Row], which is the type you want.
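For illustration, here is a minimal sketch of that first step, assuming the Spark 1.2 API. The function name `inferAndQuery`, the table name `events` and the `SELECT *` query are placeholders of mine, not something from your code, so swap in your own column selection and UDFs.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Sketch only, assuming Spark 1.2. `data` stands for the
// RDD[(Long, String)] of (timestamp, JSON string) pairs from the question.
def inferAndQuery(sc: SparkContext, data: RDD[(Long, String)]): SchemaRDD = {
  val sqlContext = new SQLContext(sc)

  // Drop the timestamp and let Spark infer the schema from the JSON strings.
  val events: SchemaRDD = sqlContext.jsonRDD(data.map(_._2))
  events.printSchema() // shows the inferred schema

  // "events" is a hypothetical table name used only for this sketch.
  events.registerTempTable("events")

  // Placeholder query: replace with the real column selection / UDF calls.
  sqlContext.sql("SELECT * FROM events")
}
```

Note that the rows returned here no longer carry the timestamp. Getting it back through a join needs a key that both sides share, for example a unique field inside each JSON document, which is an assumption about your data.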
Afterwards you should be able to do a join to keep your tuple.

Best,
Ayoub.

2015-01-29 10:12 GMT+01:00 Tobias Pfeiffer <t...@preferred.jp>:

> Hi,
>
> I have data as RDD[(Long, String)], where the Long is a timestamp and the
> String is a JSON-encoded string. I want to infer the schema of the JSON and
> then do a SQL statement on the data (no aggregates, just column selection
> and UDF application), but still have the timestamp associated with each row
> of the result. I completely fail to see how that would be possible. Any
> suggestions?
>
> I can't even see how I would get an RDD[(Long, Row)] so that I *might* be
> able to add the timestamp to the row after schema inference. Is there *any*
> way other than string-manipulating the JSON string and adding the timestamp
> to it?
>
> Thanks
> Tobias