If you are using Kafka as the message queue, Spark will still process every time slice in order, even when it is running late, as in your example. But if it keeps falling behind it will eventually fail, because at some point your job will ask for a message that is older than the oldest message Kafka still retains. If processing takes longer than the batch interval only at peak times, say during the day, but much less at night when the system is mostly idle, the stream will catch up and process everything correctly (though it's risky if the late time slices don't finish during the idle period).
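To make the backlog behaviour above a bit more concrete, here is a small plain-Python simulation (not Spark code). It is only a sketch: it assumes batches are queued and run one at a time in arrival order, and the function names `simulate_backlog` and `exceeds_retention`, the timings, and the retention value are all made up for illustration.

```python
def simulate_backlog(batch_interval, processing_times):
    """Return how long each batch waits in the queue before it starts (seconds).

    Assumption: batches run strictly one after another, in order.
    """
    delays = []
    free_at = 0.0  # time at which the previous batch finishes
    for i, processing_time in enumerate(processing_times):
        ready_at = (i + 1) * batch_interval  # batch i is complete at the end of its interval
        start = max(ready_at, free_at)       # it cannot start before the previous batch is done
        delays.append(start - ready_at)
        free_at = start + processing_time
    return delays


def exceeds_retention(delays, retention):
    """True if any batch waits longer than the (hypothetical) queue retention."""
    return any(delay > retention for delay in delays)


# Peak load: three slow batches (1.5 s each), then idle time: five fast ones (0.2 s each).
delays = simulate_backlog(1.0, [1.5] * 3 + [0.2] * 5)
print([round(d, 2) for d in delays])
print(exceeds_retention(delays, retention=2.0))
```

With a 1 second batch interval the delay grows while batches take 1.5 seconds and shrinks again once they take 0.2 seconds. If the slow period lasted long enough for the delay to pass the retention time, the data for the waiting batch would already be gone.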
The best thing to do is to optimize your job so that each batch finishes within the batch interval and no backlog builds up. :)

Regards,

Ricardo

On Sun, Jan 17, 2016 at 2:32 PM, pyspark2555 [via Apache Spark User List] wrote:

> Hi,
>
> If BatchDuration is set to 1 second in StreamingContext and the actual
> processing time is longer than one second, then how does Spark handle that?
>
> For example, I am receiving a continuous Input stream. Every 1 second
> (batch duration), the RDDs will be processed. What if this processing time
> is longer than 1 second? What happens in the next batch duration?
>
> Thanks.
> Amit

--
Ricardo Paiva
Big Data / Semântica
globo.com