Re: Spark Streaming: BatchDuration and Processing time

Ricardo Paiva Mon, 18 Jan 2016 05:21:08 -0800

If you are using Kafka as the message queue, Spark will still process each
time slice, even if it is running late, as in your example. But if the delay
keeps growing, the job will fail at some point, because it will ask for a
message that is older than the oldest message still held in Kafka.
If processing takes longer than the batch duration only some of the time,
say at your system's peak during the day, and much less at night, when
your system is mostly idle, the streaming will catch up and process correctly
(though it's risky if the late time slices don't finish during the idle
time).
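
To make the peak/idle behaviour concrete, here is a minimal sketch in plain
Python (no Spark involved). It assumes batches are processed one at a time,
in arrival order, and uses made-up numbers: a 1 second batch duration,
1.5 seconds of processing per batch during the peak and 0.5 seconds while
idle. The function name and the 60 second retention are hypothetical.

```python
def simulate_backlog(batch_interval_s, processing_times_s):
    """Return the scheduling delay (in seconds) seen by each batch.

    Batch i becomes ready at (i + 1) * batch_interval_s and can only
    start once the previous batch has finished (assumption: one batch
    is processed at a time, in order).
    """
    delays = []
    free_at = 0.0  # time at which the processing slot becomes free
    for i, proc in enumerate(processing_times_s):
        ready_at = (i + 1) * batch_interval_s
        start = max(ready_at, free_at)
        delays.append(start - ready_at)
        free_at = start + proc
    return delays


def first_batch_past_retention(delays, retention_s):
    """Index of the first batch delayed beyond the retention, or None."""
    for i, delay in enumerate(delays):
        if delay > retention_s:
            return i
    return None


# Hypothetical workload: 10 slow batches at peak, then 20 fast ones.
peak = [1.5] * 10
idle = [0.5] * 20
delays = simulate_backlog(1.0, peak + idle)
max_delay = max(delays)          # worst lag reached during the day
caught_up = delays[-1] == 0.0    # did the idle period absorb the backlog?

# Hypothetical sustained overload: the delay never stops growing, so it
# eventually exceeds any retention (60 s here, purely illustrative).
overloaded = simulate_backlog(1.0, [1.5] * 200)
lost_at = first_batch_past_retention(overloaded, retention_s=60.0)
```

In the first workload the delay grows by 0.5 seconds per batch during the
peak and shrinks by the same amount while idle, so the stream recovers. In
the second it grows without bound, which is the case where the job ends up
asking Kafka for data that has already been deleted.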


The best thing to do is to optimize your job so that processing fits within
the batch duration, and so avoid overflows. :)
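
Besides optimizing the job itself, one option you could look at (my own
suggestion, not something you asked about) is capping how much each batch
reads. The two property names below are real Spark settings; the values,
the partition count and the helper function are hypothetical, and
maxRatePerPartition only applies if you use the direct Kafka stream.

```python
# Real Spark property names, illustrative values only.
rate_limit_conf = {
    # Lets Spark adapt the ingest rate to the observed processing speed.
    "spark.streaming.backpressure.enabled": "true",
    # Upper bound in records per second per Kafka partition (direct stream).
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}


def max_records_per_batch(conf, num_partitions, batch_interval_s):
    """Largest batch the rate cap allows: rate * partitions * interval."""
    rate = int(conf["spark.streaming.kafka.maxRatePerPartition"])
    return rate * num_partitions * batch_interval_s


# Hypothetical topic with 4 partitions and a 1 second batch duration.
cap = max_records_per_batch(rate_limit_conf, num_partitions=4,
                            batch_interval_s=1)
```

These properties can be set on the SparkConf or passed to spark-submit with
--conf key=value. A cap only bounds the size of each batch; if the average
input rate stays above what the job can process, the lag moves into Kafka
instead, so it does not replace tuning the job.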

Regards,

Ricardo





On Sun, Jan 17, 2016 at 2:32 PM, pyspark2555 [via Apache Spark User List] <
ml-node+s1001560n25986...@n3.nabble.com> wrote:

> Hi,
>
> If BatchDuration is set to 1 second in StreamingContext and the actual
> processing time is longer than one second, then how does Spark handle that?
>
> For example, I am receiving a continuous Input stream. Every 1 second
> (batch duration), the RDDs will be processed. What if this processing time
> is longer than 1 second? What happens in the next batch duration?
>
> Thanks.
> Amit



-- 
Ricardo Paiva
Big Data / Semântica
2483-6432
*globo.com* <http://www.globo.com>



