I have created a custom receiver to fetch records matching a specific
query from Elasticsearch and have implemented streaming RDD transformations to
process the data generated by the receiver.
The final RDD is a sorted list of name-value pairs and I want to read the top
20 results
From: Hari Polisetty hpoli...@icloud.com
To: Tathagata Das t...@databricks.com
Cc: user user@spark.apache.org
Sent: Monday, April 6, 2015 2:02 PM
Subject: Re: How to restrict foreach on a streaming RDD only once upon
receiver
YouTube version cued to that place:
http://www.youtube.com/watch?v=W5Uece_JmNs&t=23m18s
Yes, I’m using updateStateByKey and it works. But then I need to perform
further computation on this stateful RDD (see code snippet below). I perform
foreach on the final RDD and get the top 10 records. I just don’t want the
foreach to be performed every time a new batch is received. Only when
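
For readers following along, here is a minimal sketch of the idea under discussion: keep a running total per key across batches, and read the top-N ranking only once at the end rather than after every batch. It is plain Python with no Spark involved. The names `update_count`, `apply_batch`, `top_n` and the sample `batches` are all hypothetical, made up for illustration; only the shape of `update_count` (new values for a key, plus the previous state or None) is meant to mirror the update function that `updateStateByKey` takes.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Tuple


def update_count(new_values: List[int], running: Optional[int]) -> int:
    """Merge one batch's counts for a key into its running total.

    Same shape as an updateStateByKey update function:
    (values seen for the key in this batch, previous state or None).
    """
    return sum(new_values) + (running or 0)


def apply_batch(state: Dict[str, int],
                batch: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Local stand-in for one stateful update step (assumption: no Spark)."""
    grouped: Dict[str, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    # Keys absent from this batch are still updated, with an empty value list.
    keys = set(state) | set(grouped)
    return {k: update_count(grouped.get(k, []), state.get(k)) for k in keys}


def top_n(state: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Return the n keys with the largest totals, highest first."""
    return heapq.nlargest(n, state.items(), key=lambda kv: kv[1])


# Hypothetical batches of (name, count) pairs, as a receiver might emit them.
batches = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 1), ("c", 5)],
]

state: Dict[str, int] = {}
for batch in batches:
    state = apply_batch(state, batch)

# Read the ranking once, after the last batch, instead of once per batch.
top_two = top_n(state, 2)
```

In a real streaming job the per-key merge would be handed to `updateStateByKey` instead of the local `apply_batch` loop. How to detect that the receiver has finished, so that the ranking is read only once, is a separate question that this sketch does not address.
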
So you want to sort based on the total count of all the records
received through the receiver? In that case, you have to combine all the counts
using updateStateByKey (