Implement Count by Minute in Spark Streaming

2014-10-26 Thread Ji ZHANG
Hi,

Suppose I have a stream of logs and I want to count them by minute. The result is like:

2014-10-26 18:38:00 100
2014-10-26 18:39:00 150
2014-10-26 18:40:00 200

One way to do this is to set the batch interval to 1 min, but each batch would be quite large. Or I can use updateStateByKey where
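A minimal sketch of the running-count idea in plain Python, so it can be run without a cluster. It assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which the thread does not confirm. The names `minute_key`, `update_count` and `process_batch` are hypothetical. `update_count` takes the new values for a key plus the previous state (or `None`), which is the shape PySpark's `updateStateByKey` expects, so it could probably be passed to it as-is; `process_batch` only simulates small batches locally.

```python
from datetime import datetime


def minute_key(line):
    """Truncate the leading timestamp of a log line to the start of its minute.

    Assumption: the line begins with 'YYYY-MM-DD HH:MM:SS' (19 characters).
    """
    ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    return ts.replace(second=0).strftime("%Y-%m-%d %H:%M:%S")


def update_count(new_values, running_count):
    """Fold one batch's values for a key into the running total.

    running_count is None the first time a key is seen.
    """
    return sum(new_values) + (running_count or 0)


def process_batch(lines, state):
    """Simulate one small batch: group by minute key, then update the state dict."""
    batch = {}
    for line in lines:
        batch.setdefault(minute_key(line), []).append(1)
    for key, ones in batch.items():
        state[key] = update_count(ones, state.get(key))
    return state


if __name__ == "__main__":
    state = {}
    # Two small batches that both contain lines from minute 18:38.
    process_batch(["2014-10-26 18:38:05 GET /a", "2014-10-26 18:38:59 GET /b"], state)
    process_batch(["2014-10-26 18:38:59 GET /c", "2014-10-26 18:39:01 GET /d"], state)
    for minute in sorted(state):
        print(minute, state[minute])
```

With this approach the batch interval can stay short, because a minute's total is accumulated across batches. The state keeps one entry per minute seen, so old minutes would need to be dropped at some point.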

Re: Implement Count by Minute in Spark Streaming

2014-10-26 Thread Asit Parija
Hi,

You can use Redis to store each minute key with its count as the value, running an update whenever you receive that minute key. Being an in-memory database, it would be faster than a SQL database. You can do an update at the end of each batch to update the count of the key if it exists or create in case
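A rough sketch of the end-of-batch update described above. Redis's `INCRBY` command treats a missing key as 0 before adding, so a single call covers both the "update if it exists" and the "create" case. To keep the example runnable without a server, it uses a hypothetical `InMemoryCounterStore` stand-in that exposes the same `incrby(key, amount)` method as a redis-py client. The function name `flush_batch_counts` and the `count:` key prefix are also assumptions, not something from the thread.

```python
class InMemoryCounterStore:
    """Hypothetical stand-in for a Redis client; only incrby is modelled."""

    def __init__(self):
        self.data = {}

    def incrby(self, key, amount=1):
        # Like Redis INCRBY: a missing key counts as 0 before the increment.
        self.data[key] = self.data.get(key, 0) + amount
        return self.data[key]


def flush_batch_counts(store, batch_counts, prefix="count:"):
    """Add one batch's per-minute counts to the store, creating keys as needed."""
    for minute, count in batch_counts.items():
        store.incrby(prefix + minute, count)


if __name__ == "__main__":
    store = InMemoryCounterStore()
    # Counts computed for two consecutive batches (values are made up).
    flush_batch_counts(store, {"2014-10-26 18:38:00": 40})
    flush_batch_counts(store, {"2014-10-26 18:38:00": 60, "2014-10-26 18:39:00": 10})
    for key in sorted(store.data):
        print(key, store.data[key])
```

Against a real server, a redis-py client object could presumably be passed as `store` instead, since it provides an `incrby` method with the same arguments.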