All the operations are done using the DStream. I do read one RDD, which is collected and converted into a map used for lookups as part of the DStream operations. This RDD is loaded only once; the resulting map is then applied to the streamed data.
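For reference, a minimal sketch of the pattern described above: load a reference RDD once before the streaming context starts, collect it to a driver-side map, broadcast it, and use it for lookups inside DStream operations. The file path, input source, and key/value format here are assumptions, not taken from the actual job.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LookupSketch {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext("local[2]", "lookup-sketch")
    val ssc = new StreamingContext(sc, Seconds(10))

    // Loaded only once, before streaming starts (hypothetical path and format).
    val lookupMap: Map[String, String] =
      sc.textFile("hdfs:///path/to/lookup")
        .map { line => val Array(k, v) = line.split(","); (k, v) }
        .collect()
        .toMap

    // Broadcast so each executor gets one read-only copy of the map.
    val lookupBc = sc.broadcast(lookupMap)

    // Hypothetical input source; enrich each streamed key via the map.
    val lines = ssc.socketTextStream("localhost", 9999)
    val enriched = lines.map { key =>
      (key, lookupBc.value.getOrElse(key, "unknown"))
    }
    enriched.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Broadcasting the collected map avoids re-shipping it with every batch's task closures, which is the usual choice when the lookup side is small enough to fit in memory.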
Do you mean non-streaming jobs run on the raw Kafka RDDs?

Log file attached: streaming.gz <http://apache-spark-user-list.1001560.n3.nabble.com/file/n11229/streaming.gz>

Sent from the Apache Spark User List mailing list archive at Nabble.com.