Hi,

computations are triggered by an output operation. No output operation, no computation. Therefore, in your code example,
On Thu, Aug 21, 2014 at 11:58 PM, Josh J <joshjd...@gmail.com> wrote:
>
> JavaPairReceiverInputDStream<String, String> messages =
>     KafkaUtils.createStream(jssc, args[0], args[1], topicMap);
>
> Duration windowLength = new Duration(30000);
> Duration slideInterval = new Duration(30000);
> JavaPairDStream<String, String> windowMessages1 =
>     messages.window(windowLength, slideInterval);
> JavaPairDStream<String, String> windowMessages2 =
>     messages.window(windowLength, slideInterval);

nothing would actually happen. However, if you add output operations, you can use the same window multiple times (in which case caching the data might make sense). Since your windowLength and slideInterval are the same for both windows, there is no point in having two of them; you could just say:

    windowMessages1.saveAsHadoopFiles(...)                // output operation 1
    windowMessages1.print()                               // output operation 2
    windowMessages1.map(someOtherFancyOperation).print()  // output operation 3 after processing

By default, these output operations are processed one after another. There is an undocumented parameter "spark.streaming.concurrentJobs" (cf. <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-questions-td1494.html>) that allows running output operations in parallel. I haven't used it, though.

Tobias
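For what it's worth, here is a rough sketch (not tested) of what the reuse-plus-caching idea above could look like end to end. It assumes jssc, topicMap, args, and someOtherFancyOperation already exist as in your code; the caching and the concurrentJobs setting are just the options mentioned above, not something you necessarily need:

    // Sketch only: one window, reused by several output operations.
    JavaPairDStream<String, String> windowed =
        KafkaUtils.createStream(jssc, args[0], args[1], topicMap)
                  .window(new Duration(30000), new Duration(30000));

    // cache() keeps the windowed data in memory so the output operations
    // below do not each recompute the window from the Kafka input.
    windowed.cache();

    windowed.print();                              // output operation 1
    windowed.map(someOtherFancyOperation).print(); // output operation 2

    // Optional: run output operations of the same batch in parallel.
    // Undocumented; would have to be set on the SparkConf *before* the
    // JavaStreamingContext is created, e.g.:
    //   conf.set("spark.streaming.concurrentJobs", "2");

    jssc.start();
    jssc.awaitTermination();

Again, treat this as a starting point; I haven't run this exact code.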