Maybe you can make use of the window operations <https://spark.apache.org/docs/1.2.0/streaming-programming-guide.html#window-operations>. Another approach would be to keep your incoming data in an HBase/Redis/Cassandra kind of database, and then whenever you need the average you just query the database and compute it.
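For the window route, here is a minimal sketch (Spark 1.2 Streaming, Scala). The socket source emitting "sensorId value" lines, the 5-second batch interval, and the 30-second window sliding every 10 seconds are all made-up parameters for illustration, not a drop-in solution:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._  // pair DStream implicits (needed in 1.2)

    val conf = new SparkConf().setAppName("SensorWindowAverage")
    val ssc  = new StreamingContext(conf, Seconds(5))      // assumed 5s batch interval

    // Hypothetical source: lines like "sensorA 7".
    val readings = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(sensorId, value) = line.split(" ")
      (sensorId, value.toInt)
    }

    // Keep a (sum, count) per sensor over a 30s window sliding every 10s,
    // then turn it into an average.
    val windowedAvg = readings
      .mapValues(v => (v.toDouble, 1L))
      .reduceByKeyAndWindow(
        (a: (Double, Long), b: (Double, Long)) => (a._1 + b._1, a._2 + b._2),
        Seconds(30), Seconds(10))
      .mapValues { case (sum, count) => sum / count }

    windowedAvg.print()
    ssc.start()
    ssc.awaitTermination()

Note that windows are time-based, so this gives you "average over the last 30 seconds" per sensor rather than "average since the last 10"; it's only a starting point.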
Thanks
Best Regards

On Thu, May 28, 2015 at 1:22 AM, David Webber <david.web...@gmail.com>
wrote:

> Hi All,
>
> I'm new to Spark and I'd like some help understanding whether a particular
> use case would be a good fit for Spark Streaming.
>
> I have an imaginary stream of sensor data consisting of integers 1-10.
> Every time the sensor reads "10" I'd like to average all the numbers that
> were received since the last "10".
>
> example input: 10 5 8 4 6 2 1 2 8 8 8 1 6 9 1 3 10 1 3 10 ...
> desired output: 4.8, 2.0
>
> I'm confused about what happens if sensor readings fall into different
> RDDs.
>
> RDD1: 10 5 8 4 6 2 1 2 8 8 8
> RDD2: 1 6 9 1 3 10 1 3 10
> output: ???, 2.0
>
> My imaginary sensor doesn't read at fixed time intervals, so breaking the
> stream into RDDs by time interval won't ensure the data is packaged
> properly. Additionally, multiple sensors are writing to the same stream
> (though I think flatMap can split the origin stream into streams for
> individual sensors, correct?).
>
> My best guess for the processing is:
> 1) flatMap() to break out individual sensor streams
> 2) a custom parser to accumulate input data until "10" is found, then
> create a new output RDD for each sensor and data grouping
> 3) average the values from step 2
>
> I would greatly appreciate pointers to some specific documentation or
> examples if you have seen something like this before.
>
> Thanks,
> David
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-boundaries-and-triggering-processing-using-tags-in-the-data-tp23060.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
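Not something from the thread itself, but a rough sketch of the accumulate-until-"10" step described above, using updateStateByKey to carry the running sum/count per sensor across batch (RDD) boundaries. The socket source, checkpoint path, batch interval and SensorState helper are all assumptions for illustration, and it assumes each sensor's readings arrive in order within a batch:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    // State carried per sensor across batches: running sum/count of readings
    // since the last "10", plus the averages completed in the current batch.
    case class SensorState(sum: Double, count: Long, completed: Seq[Double])

    val conf = new SparkConf().setAppName("SensorDelimitedAverage")
    val ssc  = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/sensor-checkpoint")   // required for stateful operations

    // Hypothetical source: "sensorA 7" style lines.
    val readings = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(id, v) = line.split(" ")
      (id, v.toInt)
    }

    val updated = readings.updateStateByKey[SensorState] {
      (newValues: Seq[Int], state: Option[SensorState]) =>
        val prev = state.getOrElse(SensorState(0.0, 0L, Nil))
        // Fold this batch's readings into the running state; each "10" closes
        // a group and records its average. In-batch ordering is assumed here,
        // which a real job would need to guarantee explicitly.
        val next = newValues.foldLeft(prev.copy(completed = Nil)) { (s, v) =>
          if (v == 10 && s.count > 0)
            SensorState(0.0, 0L, s.completed :+ (s.sum / s.count))
          else if (v == 10) s
          else SensorState(s.sum + v, s.count + 1, s.completed)
        }
        Some(next)
    }

    // Emit only the averages completed in this batch.
    updated.flatMapValues(_.completed).print()

    ssc.start()
    ssc.awaitTermination()

Because the partial sum and count live in the state rather than in any single RDD, a group that starts in one batch and ends in the next still produces one average when its closing "10" arrives.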