Re: dstream.foreachRDD iteration

2016-09-07 Thread Mich Talebzadeh
I am sure a few Spark gurus can explain this much better than I can, so here we go. A DStream is an abstraction that breaks a continuous stream of data into small chunks. This is called "micro-batching", and Spark Streaming is all about micro-batching. You have a batch interval, a window length and

Re: dstream.foreachRDD iteration

2016-09-07 Thread Ashok Kumar
I have checked that doc, sir. My understanding is that every batch interval of data always generates one RDD, so why do we need to use foreachRDD when there is only one? Sorry for this question, but it is a bit confusing to me. Thanks. On Wednesday, 7 September 2016, 18:05, Mich Talebzadeh

Re: dstream.foreachRDD iteration

2016-09-07 Thread Mich Talebzadeh
Hi, What is so confusing about RDDs? Have you checked this doc? http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd HTH, Dr Mich Talebzadeh LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

dstream.foreachRDD iteration

2016-09-07 Thread Ashok Kumar
Hi, This is a bit confusing to me. How many layers are involved in DStream.foreachRDD? Do I need to loop over it more than once? I mean DStream.foreachRDD { rdd => } I am trying to get the individual lines in the RDD. Thanks
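
For readers following the thread, here is a minimal sketch of the two levels the question is asking about. It is not taken from any of the posts. The object name ForeachRddSketch, the 2-second batch interval, the local[2] master, and the sample strings are all assumptions made for illustration, and queueStream stands in for a real input source. The outer foreachRDD function is invoked once per batch interval with that batch's RDD, and the inner rdd.foreach visits the individual records (here, lines) of that one RDD.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.mutable

// Hypothetical example object; names and values are illustrative only.
object ForeachRddSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("ForeachRddSketch")
    // Assumed batch interval of 2 seconds: one RDD is produced per interval.
    val ssc = new StreamingContext(conf, Seconds(2))

    // A queue of pre-built RDDs used as a stand-in for a real stream source.
    val queue = mutable.Queue[RDD[String]](
      ssc.sparkContext.parallelize(Seq("first line", "second line")),
      ssc.sparkContext.parallelize(Seq("third line"))
    )
    val lines = ssc.queueStream(queue, oneAtATime = true)

    // Level 1: this function is called once for every batch interval.
    lines.foreachRDD { rdd =>
      // Level 2: iterate over the records inside this batch's RDD.
      // On a cluster this closure runs on the executors, so the output
      // appears in executor logs; in local mode it prints to the console.
      rdd.foreach(line => println(line))
    }

    ssc.start()
    // Let a few batches run, then shut down.
    ssc.awaitTerminationOrTimeout(10000L)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

So there is no need to loop over the DStream itself more than once. foreachRDD registers a function that Spark Streaming applies to each batch's RDD as it arrives, and the only explicit loop you write is the one over the records inside that RDD.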