It depends on what you are using it for. Three parameters are important:
1. Batch interval
2. WindowDuration
3. SlideDuration

The batch interval is the basic interval at which the system receives the data in batches. This is the interval set when creating a StreamingContext. For example, if you set the batch interval to 2 seconds, then any input DStream will generate RDDs of received data at 2-second intervals.

A window operator is defined by two parameters:

- WindowDuration / WindowLength - the length of the window
- SlideDuration / SlidingInterval - the interval at which the window slides or moves forward

Generally speaking, the larger the batch window, the better the overall performance, but the streaming output will be updated less frequently. You will likely run into problems setting your batch window below 0.5 seconds, and/or when the batch window is shorter than the amount of time it takes to process a batch.

Beyond that, the window length and sliding interval need to be multiples of the batch window, but will depend entirely on your reporting requirements. Consider:

    batch window     = 10 seconds
    window length    = 300 seconds
    sliding interval = 60 seconds

In this scenario, you will be creating an output every 60 seconds, aggregating the data that you were collecting every 10 seconds from the source over the previous 300 seconds.

If you were trying to create continuously streaming output as fast as possible (for example for complex event processing), then you would almost always set your sliding interval equal to the batch window and then shrink the batch window as short as possible.

Example:

    val sparkConf = new SparkConf().
      setAppName("CEP_streaming").
      set("spark.driver.allowMultipleContexts", "true").
      set("spark.hadoop.validateOutputSpecs", "false")

    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // window length - the duration of the window; must be a multiple of
    // the batch interval n in StreamingContext(sparkConf, Seconds(n))
    val windowLength = 4

    // sliding interval - the interval at which the window operation is
    // performed, i.e. data is collected within this previous interval.
    // Keep this the same as the batch window for continuous streaming;
    // you are aggregating the data collected over the batch window.
    val slidingInterval = 2

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 23 May 2016 at 16:32, nsalian <nsal...@cloudera.com> wrote:

> Thanks for the question.
> What kind of data rate are you expecting to receive?
>
> -----
> Neelesh S. Salian
> Cloudera
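As a postscript, the "multiples of the batch window" rule from the scenario above (batch = 10 s, window = 300 s, slide = 60 s) can be sketched in plain Scala. This is an illustrative helper, not Spark API: the object name `WindowParams` and its methods are made up for the example; Spark itself enforces the same divisibility check when you call `window`/`reduceByKeyAndWindow`.

```scala
// Illustrative sketch (not Spark API): checks the constraint that both
// the window length and the sliding interval must be integer multiples
// of the batch interval, and derives how many batches each output covers.
object WindowParams {
  val batchInterval  = 10L   // seconds; StreamingContext(sparkConf, Seconds(10))
  val windowLength   = 300L  // seconds of data each windowed output aggregates
  val slideInterval  = 60L   // seconds between successive windowed outputs

  // Spark requires both window parameters to divide evenly by the batch interval.
  def isValid: Boolean =
    windowLength % batchInterval == 0 && slideInterval % batchInterval == 0

  // Each windowed output aggregates this many batches of received data.
  def batchesPerWindow: Long = windowLength / batchInterval
}
```

With these values, `isValid` holds and each 60-second output aggregates 30 ten-second batches; changing `windowLength` to, say, 305 would make `isValid` false, and Spark would reject the window operation at runtime.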