I am sure a few Spark gurus can explain this much better than I can, but here
goes.
A DStream is an abstraction that breaks a continuous stream of data into
small chunks. This is called "micro-batching", and Spark Streaming is built
around micro-batching.
You configure a batch interval, and optionally a window length and sliding
interval on top of it.
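To make the micro-batching idea concrete, here is a plain-Scala sketch (no Spark dependency; the object and method names are illustrative, not Spark API). Each group of records stands in for the one RDD that Spark Streaming produces per batch interval:

```scala
// Sketch only: simulates micro-batching in plain Scala, without Spark.
object MicroBatchSketch {
  // Chop a continuous sequence of records into fixed-size batches.
  // In Spark Streaming, each such batch becomes exactly one RDD.
  def microBatch[A](stream: Seq[A], batchSize: Int): Seq[Seq[A]] =
    stream.grouped(batchSize).toSeq
}
```

In real Spark Streaming the batching is driven by time (the batch interval) rather than by record count, but the shape is the same: one stream in, a sequence of small batches out.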
I have checked that doc, sir.
My understanding is that every batch interval of data always generates one
RDD, so why do we need to use foreachRDD when there is only one?
Sorry for the question, but it is a bit confusing to me.
Thanks
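The point of confusion above can be sketched in plain Scala (no Spark; each inner Seq stands in for one per-interval RDD, and the names are illustrative only). There is indeed one RDD per batch, but the stream produces a new batch every interval, so foreachRDD fires once per batch over the lifetime of the stream, and an inner foreach is what visits the individual records:

```scala
// Sketch only: a plain-Scala stand-in for DStream.foreachRDD (no Spark).
object ForeachRDDSketch {
  // batches: each inner Seq plays the role of one per-interval RDD.
  def collectLines(batches: Seq[Seq[String]]): Seq[String] = {
    val out = scala.collection.mutable.ListBuffer[String]()
    batches.foreach { rdd =>   // outer layer: runs once per batch (per RDD)
      rdd.foreach { line =>    // inner layer: runs once per record in that RDD
        out += line
      }
    }
    out.toList
  }
}
```

So there are two layers: foreachRDD gives you the RDD for the current batch, and a method on that RDD (foreach, map, collect, ...) gets you at the individual lines.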
On Wednesday, 7 September 2016, 18:05, Mich Talebzadeh
Hi,
What is so confusing about RDDs? Have you checked this doc?
http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
HTH
Dr Mich Talebzadeh
LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Hi,
This is a bit confusing to me.
How many layers are involved in DStream.foreachRDD?
Do I need to loop over it more than once? I mean DStream.foreachRDD { rdd => ... }
I am trying to get the individual lines in each RDD.
Thanks