Hi Daan,

You may find this recent thread from this list helpful:
Re: Is "spark streaming" streaming or mini-batch?
<https://www.mail-archive.com/user@spark.apache.org/msg55914.html>
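
As a rough illustration of the question below, here is a minimal sketch in plain Python (not the Spark API) of how a receiver-based stream is usually described: each batch interval yields a single RDD, and within that interval the receiver cuts incoming data into blocks every spark.streaming.blockInterval (200 ms by default), with each block becoming one partition of that RDD. The helper name to_micro_batches, the event data and the interval values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def to_micro_batches(events, batch_interval_ms, block_interval_ms):
    """Group (timestamp_ms, payload) events into batches, each split into blocks.

    Conceptual model only: a batch stands in for one RDD and a block for one
    partition of it. Returns {batch_index: {block_index: [payload, ...]}}.
    """
    batches = defaultdict(lambda: defaultdict(list))
    for ts, payload in events:
        batch = ts // batch_interval_ms                         # which micro batch
        block = (ts % batch_interval_ms) // block_interval_ms   # which partition
        batches[batch][block].append(payload)
    return {b: dict(blocks) for b, blocks in batches.items()}


# Hypothetical input: one event every 100 ms for 2 seconds.
events = [(t, f"event-{t}") for t in range(0, 2000, 100)]
batches = to_micro_batches(events, batch_interval_ms=1000, block_interval_ms=200)

for batch, blocks in sorted(batches.items()):
    print(f"batch {batch}: {len(blocks)} partitions")
```

So a "time slot" still produces one micro batch, and the parallelism comes from that batch's partitions being processed as separate tasks across the executors. With the direct Kafka approach the partitioning follows the Kafka topic partitions instead, so treat the above as a sketch of the receiver case only.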

HTH

Dr Mich Talebzadeh



LinkedIn:
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 13 September 2016 at 14:25, DandyDev <debie.d...@gmail.com> wrote:

> Hi all!
>
> When reading about Spark Streaming and its execution model, I see diagrams
> like this a lot:
>
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27699/lambda-architecture-with-spark-spark-streaming-kafka-cassandra-akka-and-scala-31-638.jpg>
>
> It does a fine job explaining how DStreams consist of micro batches that
> are basically RDDs. There are, however, some things I don't understand:
>
> - RDDs are distributed by design, but micro batches are conceptually small.
> How and why are these micro batches distributed such that they need to be
> implemented as RDDs?
> - The above image doesn't explain how Spark Streaming parallelizes data.
> According to the image, a stream of events gets broken into micro batches
> along the time axis (time 0 to 1 is a micro batch, time 1 to 2 is a micro
> batch, etc.). How does parallelism come into play here? Is it that even
> within a "time slot" (e.g. time 0 to 1) there can be so many events that
> multiple micro batches for that time slot will be created and distributed
> across the executors?
>
> Clarification would be helpful!
>
> Daan
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-dividing-DStream-into-mini-batches-tp27699.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.