Hi,

Thanks for the explanation, but it does not prove that Spark will OOM at some point. You assume there is enough data to buffer, but there could be none.
Jacek

On 6 Aug 2016 4:23 a.m., "Mohammed Guller" <moham...@glassbeam.com> wrote:

> Assume the batch interval is 10 seconds and batch processing time is 30
> seconds. So while Spark Streaming is processing the first batch, the
> receiver will have a backlog of 20 seconds worth of data. By the time Spark
> Streaming finishes batch #2, the receiver will have 40 seconds worth of
> data in its memory buffer. This backlog will keep growing as time passes,
> assuming data streams in consistently at the same rate.
>
> Also keep in mind that windowing operations on a DStream implicitly
> persist every RDD in a DStream in memory.
>
> Mohammed
>
> -----Original Message-----
> From: Jacek Laskowski [mailto:ja...@japila.pl]
> Sent: Thursday, August 4, 2016 4:25 PM
> To: Mohammed Guller
> Cc: Saurav Sinha; user
> Subject: Re: Explanation regarding Spark Streaming
>
> On Fri, Aug 5, 2016 at 12:48 AM, Mohammed Guller <moham...@glassbeam.com>
> wrote:
> > and eventually you will run out of memory.
>
> Why? Mind elaborating?
>
> Jacek
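[For readers following the thread: the backlog arithmetic Mohammed describes can be sketched with a few lines of Python. This is only an illustration of the argument in the quoted mail, under the simplifying assumption of a constant ingest rate and constant per-batch processing time; the function name and constants are made up for the example.]

```python
# Backlog growth when batch processing time exceeds the batch interval.
# Figures taken from the thread: 10 s batch interval, 30 s processing time.
BATCH_INTERVAL = 10   # seconds of data that go into each batch
PROCESSING_TIME = 30  # seconds needed to process one batch

def backlog_after(batches_processed: int) -> int:
    """Seconds of unprocessed data buffered in the receiver after
    `batches_processed` batches have finished, assuming a constant
    ingest rate equal to the batch interval's worth of data."""
    # While one batch is being processed, PROCESSING_TIME seconds of new
    # data arrive, but only BATCH_INTERVAL seconds of it are drained
    # into the next batch, so the buffer grows by the difference.
    return batches_processed * (PROCESSING_TIME - BATCH_INTERVAL)

print(backlog_after(1))  # 20 s backlog after batch #1, as in the mail
print(backlog_after(2))  # 40 s backlog after batch #2
```

Under these assumptions the buffer grows without bound, which is the OOM scenario Mohammed describes; Jacek's counterpoint is that the assumption of a sustained ingest rate may not hold in practice.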