Hi,

computations are triggered by an output operation. No output operation, no
computation. Therefore in your code example,

On Thu, Aug 21, 2014 at 11:58 PM, Josh J <joshjd...@gmail.com> wrote:
>
>         JavaPairReceiverInputDStream<String, String> messages =
>                 KafkaUtils.createStream(jssc, args[0], args[1], topicMap);
>
>         Duration windowLength = new Duration(30000);
>         Duration slideInterval = new Duration(30000);
>         JavaPairDStream<String,String> windowMessages1 =
> messages.window(windowLength,slideInterval);
>         JavaPairDStream<String,String> windowMessages2 =
> messages.window(windowLength,slideInterval);
>

nothing would actually happen. However, if you add output operations, you
can reuse the same window multiple times (in which case caching the data
might make sense). Since your windowLength and slideInterval are the same,
there is no point in having two windows; you could just say:

  windowMessages1.saveAsHadoopFiles(...)  // output operation 1
  windowMessages1.print()  // output operation 2
  windowMessages1.map(someOtherFancyOperation).print()  // output operation 3 after processing
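Since all three output operations read from the same windowed stream, persisting it lets Spark reuse the cached window data instead of recomputing it for each job. A minimal fragment (assuming the windowMessages1 stream from above):

```java
// Persist the windowed DStream so the output operations below reuse
// the cached RDDs rather than recomputing the 30s window per job.
// persist() with no arguments uses the DStream's default storage level.
windowMessages1.persist();
```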

By default, these output operations are processed one after another. There
is an undocumented parameter "spark.streaming.concurrentJobs" (cf. <
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-questions-td1494.html>)
that allows running output operations in parallel. I haven't used it, though.
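For reference, that parameter would be set on the SparkConf before the streaming context is created; a sketch only (the app name and the value 2 are placeholders, and since the parameter is undocumented its semantics may change between releases):

```java
// Assumption: configure before constructing the JavaStreamingContext.
SparkConf conf = new SparkConf()
    .setAppName("windowed-kafka-example")            // hypothetical name
    .set("spark.streaming.concurrentJobs", "2");     // undocumented knob
JavaStreamingContext jssc =
    new JavaStreamingContext(conf, new Duration(1000));
```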

Tobias
