Also see the LogAnalyzerStreamingSQL example in the Databricks reference app: https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala
From the Databricks reference app: https://github.com/databricks/reference-apps

From: Ewan Leith
Date: Tuesday, September 29, 2015 at 5:09 PM
To: Daniel Haviv, user
Subject: RE: Converting a DStream to schemaRDD

Something like:

    dstream.foreachRDD { rdd =>
      val df = sqlContext.read.json(rdd)
      df.select(…)
    }

might be the place to start — see https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations-on-dstreams. It will convert each batch of the DStream into an RDD and then let you work with it as if it were a standard RDD dataset.

Ewan

From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com]
Sent: 29 September 2015 15:03
To: user <user@spark.apache.org>
Subject: Converting a DStream to schemaRDD

Hi,
I have a DStream which is a stream of RDD[String]. How can I pass the DStream to sqlContext.jsonRDD and work with it as a DataFrame?

Thank you.
Daniel
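The foreachRDD pattern above can be fleshed out into a small standalone sketch. This assumes Spark 1.5-era APIs (SQLContext, as in the thread's timeframe), and the socket source on localhost:9999 is a hypothetical stand-in for whatever stream of JSON strings you actually have; the singleton SQLContext.getOrCreate call is the pattern the Spark streaming examples use to avoid creating a new context per batch:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SQLContext

object DStreamToDataFrame {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DStreamToDataFrame").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source: a stream of JSON strings, one object per line.
    val dstream = ssc.socketTextStream("localhost", 9999)

    dstream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // Reuse a singleton SQLContext rather than creating one per batch.
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        // Infer a schema from this batch's JSON strings and get a DataFrame.
        val df = sqlContext.read.json(rdd)
        df.printSchema()
        df.show()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that sqlContext.read.json(rdd) replaces the older sqlContext.jsonRDD(rdd) the question asks about, which was deprecated in Spark 1.4. The schema is inferred per batch, so batches with differing JSON shapes can yield differing schemas; the isEmpty guard avoids schema inference failing on empty batches.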