Hi,

I think the default storage level for RDD persistence is MEMORY_ONLY (see
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence).
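
For plain RDDs that is what persist()/cache() gives you by default; a minimal
spark-shell style sketch (the RDD contents are purely illustrative):

    import org.apache.spark.storage.StorageLevel

    // assumes a live SparkContext `sc`, as in spark-shell
    val rdd = sc.parallelize(1 to 100)
    rdd.cache()   // equivalent to rdd.persist(StorageLevel.MEMORY_ONLY)
    println(rdd.getStorageLevel == StorageLevel.MEMORY_ONLY)   // true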

HTH



Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 6 August 2016 at 18:16, Mohammed Guller <moham...@glassbeam.com> wrote:

> Hi Jacek,
>
> Yes, I am assuming that data streams in consistently at the same rate (for
> example, 100MB/s).
>
>
>
> BTW, even if the persistence level for streaming data is set to
> MEMORY_AND_DISK_SER_2 (the default), once Spark runs out of memory, data
> will spill to disk, which will make the application's performance even worse.
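>
> A minimal sketch of setting the receiver storage level explicitly (assuming
> an existing StreamingContext `ssc`; the host and port are placeholders):
>
>     import org.apache.spark.storage.StorageLevel
>
>     // receiver-based input streams default to MEMORY_AND_DISK_SER_2, so blocks
>     // that no longer fit in memory are serialized and written to disk
>     val lines = ssc.socketTextStream("localhost", 9999,
>       StorageLevel.MEMORY_AND_DISK_SER_2)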
>
>
>
> Mohammed
>
>
>
> From: Jacek Laskowski [mailto:ja...@japila.pl]
> Sent: Saturday, August 6, 2016 1:54 AM
> To: Mohammed Guller
> Cc: Saurav Sinha; user
> Subject: RE: Explanation regarding Spark Streaming
>
>
>
> Hi,
>
> Thanks for the explanation, but it does not prove Spark will OOM at some
> point. You assume there is enough incoming data to fill memory, but there
> could be none.
>
> Jacek
>
>
>
> On 6 Aug 2016 4:23 a.m., "Mohammed Guller" <moham...@glassbeam.com> wrote:
>
> Assume the batch interval is 10 seconds and the batch processing time is 30
> seconds. While Spark Streaming is processing the first batch, the receiver
> will build up a backlog of 20 seconds' worth of data. By the time Spark
> Streaming finishes batch #2, the receiver will have 40 seconds' worth of
> data in its memory buffer. This backlog keeps growing as time passes,
> assuming data streams in consistently at the same rate.
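>
> To make the arithmetic concrete, a minimal sketch (assuming an existing
> SparkContext `sc`; the 10-second interval matches the example above):
>
>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>
>     // a new batch of received data is queued every 10 seconds
>     val ssc = new StreamingContext(sc, Seconds(10))
>     // if each batch takes ~30s to process, the backlog grows:
>     //   batch #1 done at t=30s -> 30s received, 10s processed -> 20s buffered
>     //   batch #2 done at t=60s -> 60s received, 20s processed -> 40s buffered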
>
> Also keep in mind that windowing operations on a DStream implicitly
> persist every RDD in the resulting DStream in memory.
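>
> For example (assuming `lines` is a DStream[String] of incoming text), the
> windowed DStream below is persisted in memory automatically:
>
>     import org.apache.spark.streaming.Seconds
>
>     // 60-second window sliding every 10 seconds; Spark caches the underlying RDDs
>     val wordCounts = lines.flatMap(_.split(" "))
>       .map(word => (word, 1))
>       .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(10))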
>
> Mohammed
>
> -----Original Message-----
> From: Jacek Laskowski [mailto:ja...@japila.pl]
> Sent: Thursday, August 4, 2016 4:25 PM
> To: Mohammed Guller
> Cc: Saurav Sinha; user
> Subject: Re: Explanation regarding Spark Streaming
>
> On Fri, Aug 5, 2016 at 12:48 AM, Mohammed Guller <moham...@glassbeam.com>
> wrote:
> > and eventually you will run out of memory.
>
> Why? Mind elaborating?
>
> Jacek
>
