Arko,

On Sat, Oct 4, 2014 at 1:40 AM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote:
>
> Apologies if this is a stupid question but I am trying to understand
> why this can or cannot be done. As far as I understand that streaming
> algorithms need to be different from batch algorithms as the streaming
> algorithms are generally incremental. Hence the question whether the
> RDD transformations can be extended to streaming or not.
>
I don't think that streaming algorithms are "generally incremental" in Spark Streaming. Rather, data is collected, and every N seconds (or minutes, etc.) the data collected during that interval is batch-processed with the normal batch operations. In fact, using data obtained from the stream in previous intervals is a bit more complicated than plain batch processing. If the graph you want to create only uses data from one interval/batch, that should be dead simple.

You might want to have a look at
https://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams

Tobias
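
To illustrate the point about intervals, here is a minimal sketch of the micro-batch model in plain Python. It is not the Spark API: the names `discretize`, `build_edges` and `process_stream`, and the sample records, are all made up for illustration. In Spark Streaming itself the batch interval is set on the StreamingContext, and per-batch logic is applied through DStream operations such as `transform` or `foreachRDD`.

```python
from collections import defaultdict


def discretize(records, batch_interval):
    """Group (timestamp, value) records into consecutive intervals.

    Toy stand-in for a discretized stream: each returned list plays the
    role of one batch. Unlike Spark, empty intervals are simply skipped.
    """
    batches = defaultdict(list)
    for timestamp, value in records:
        batches[int(timestamp // batch_interval)].append(value)
    # Return the batches in time order.
    return [batches[index] for index in sorted(batches)]


def build_edges(batch):
    """Ordinary batch logic: deduplicate the (src, dst) edges of one batch."""
    return sorted(set(batch))


def process_stream(records, batch_interval, batch_fn):
    """Apply the same batch function to every interval independently."""
    return [batch_fn(batch) for batch in discretize(records, batch_interval)]


# Hypothetical input: (arrival time in seconds, edge) pairs.
records = [
    (0.5, ("a", "b")),
    (1.2, ("b", "c")),
    (2.1, ("a", "b")),
    (2.9, ("c", "d")),
    (4.0, ("d", "a")),
]

# One small graph (edge list) per 2-second interval.
graphs = process_stream(records, 2, build_edges)
```

Note that `build_edges` knows nothing about streaming; it only ever sees the records of one interval. Combining edges across intervals would need extra state carried between batches, which is the more complicated case mentioned above.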