Thank you TD for your time and help.

SM
> On 19-Nov-2015, at 6:58 AM, Tathagata Das <t...@databricks.com> wrote:
> 
> There are different ways to do the rollups. Either update the rollups from the 
> streaming application, or generate the rollups in a later process - say, a 
> periodic Spark job every hour. Or you could just generate the rollups on demand, 
> when they are queried.
> The whole thing depends on your downstream requirements - if you always want 
> up-to-date rollups to show up in the dashboard (even the day-level stuff), then 
> the first approach is better. Otherwise, the second and third approaches are 
> more efficient.
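> 
> As an illustrative sketch only, the second approach could be a small periodic 
> batch job along these lines (loadMinuteRows/saveHourRows are placeholders for 
> however you read and write your store, e.g. HBase):
> 
>     import org.apache.spark.rdd.RDD
>     import org.apache.spark.{SparkConf, SparkContext}
> 
>     object HourlyRollupJob {
>       def main(args: Array[String]): Unit = {
>         val sc = new SparkContext(new SparkConf().setAppName("hourly-rollup"))
>         // ((hourKey, metricKey), value) pairs derived from the stored minute rows
>         val minuteRows: RDD[((String, String), Long)] = loadMinuteRows(sc)
>         val hourRows = minuteRows.reduceByKey(_ + _)   // sum minute rows into hour rows
>         saveHourRows(hourRows)
>         sc.stop()
>       }
> 
>       // Placeholder I/O - replace with real reads/writes against the store.
>       def loadMinuteRows(sc: SparkContext): RDD[((String, String), Long)] = sc.emptyRDD
>       def saveHourRows(rows: RDD[((String, String), Long)]): Unit = rows.foreach(println)
>     }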
> 
> TD
> 
> 
> On Wed, Nov 18, 2015 at 7:15 AM, Sandip Mehta <sandip.mehta....@gmail.com> wrote:
> TD, thank you for your reply.
> 
> I agree on the data store requirement. I am using HBase as the underlying store.
> 
> So, for every batch interval of say 10 seconds:
> 
> - Calculate the time dimensions (minute, hour, day, week, month and quarter) 
> along with the other dimensions and metrics (see the sketch just below for the 
> time-dimension keys).
> - Update the relevant base table at each batch interval with the relevant 
> metrics for a given set of dimensions.
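> 
> Something along these lines for deriving the time-dimension keys (illustrative 
> only - the function and field names are made up, not my actual code):
> 
>     import java.time.temporal.{ChronoUnit, IsoFields}
>     import java.time.{Instant, ZoneOffset}
> 
>     // Derive the minute/hour/day/week/month/quarter bucket keys for one event.
>     def timeDimensions(eventTimeMillis: Long): Map[String, String] = {
>       val t = Instant.ofEpochMilli(eventTimeMillis).atZone(ZoneOffset.UTC)
>       val d = t.toLocalDate
>       Map(
>         "minute"  -> t.truncatedTo(ChronoUnit.MINUTES).toString,
>         "hour"    -> t.truncatedTo(ChronoUnit.HOURS).toString,
>         "day"     -> d.toString,
>         "week"    -> f"${d.get(IsoFields.WEEK_BASED_YEAR)}%d-W${d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)}%02d",
>         "month"   -> f"${d.getYear}%d-${d.getMonthValue}%02d",
>         "quarter" -> s"${d.getYear}-Q${d.get(IsoFields.QUARTER_OF_YEAR)}"
>       )
>     }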
> 
> The only caveat I see is that I'll have to update each of the different roll-up 
> tables for every batch window.
> 
> Is this a valid approach for calculating time-series aggregations?
> 
> Regards
> SM
> 
> For minute-level aggregates I have set up a streaming window of, say, 10 seconds, 
> and I am storing the minute-level aggregates across multiple dimensions in HBase 
> at every window interval.
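> 
> One way such a windowed aggregate could be expressed (illustrative only - the 
> 60-second window sliding every 10 seconds and the (key, value) schema are 
> examples, not necessarily my exact setup):
> 
>     import org.apache.spark.streaming.Seconds
>     import org.apache.spark.streaming.dstream.DStream
> 
>     // events: (dimensionKey, metricValue) pairs parsed from the input stream
>     def minuteAggregates(events: DStream[(String, Long)]): DStream[(String, Long)] =
>       events.reduceByKeyAndWindow(
>         (a: Long, b: Long) => a + b,   // combine values that fall in the window
>         Seconds(60),                   // window length: one minute of data
>         Seconds(10))                   // slide interval: recompute every 10 seconds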
> 
>> On 18-Nov-2015, at 7:45 AM, Tathagata Das <t...@databricks.com> wrote:
>> 
>> For this sort of long-term aggregation you should use a dedicated data storage 
>> system, like a database or a key-value store. Spark Streaming would just 
>> aggregate and push the necessary data to the data store.
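>> 
>> The push itself can follow the usual foreachRDD output pattern, roughly like 
>> this (a sketch only - StoreConnection stands in for whatever client your store 
>> provides, e.g. the HBase API):
>> 
>>     import org.apache.spark.streaming.dstream.DStream
>> 
>>     // Placeholder client - swap in the real store's connection type.
>>     class StoreConnection {
>>       def upsert(key: String, value: Long): Unit = println(s"$key -> $value")
>>       def close(): Unit = ()
>>     }
>> 
>>     def pushToStore(aggregates: DStream[(String, Long)]): Unit =
>>       aggregates.foreachRDD { rdd =>
>>         rdd.foreachPartition { partition =>
>>           val conn = new StoreConnection          // one connection per partition
>>           partition.foreach { case (key, value) =>
>>             conn.upsert(key, value)               // write/increment the stored row
>>           }
>>           conn.close()
>>         }
>>       }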
>> 
>> TD
>> 
>> On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <sandip.mehta....@gmail.com> wrote:
>> Hi,
>> 
>> I am working on a requirement to calculate real-time metrics and am building a 
>> prototype on Spark Streaming. I need to build aggregates at the second, minute, 
>> hour and day level.
>> 
>> I am not sure whether I should calculate all these aggregates as different 
>> windowed functions on the input DStream, or whether I should use the 
>> updateStateByKey function instead. If I have to use updateStateByKey for these 
>> time-series aggregations, how can I remove keys from the state after a given 
>> amount of time has elapsed?
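>> 
>> What I have in mind for the updateStateByKey route is roughly the following 
>> (the state type and the one-hour TTL are just examples; returning None is 
>> what would drop a key, if I understand the API correctly):
>> 
>>     import org.apache.spark.streaming.dstream.DStream
>> 
>>     case class AggState(sum: Long, lastUpdatedMillis: Long)
>> 
>>     // Note: updateStateByKey needs checkpointing enabled, i.e. ssc.checkpoint(dir).
>>     def runningAggregates(events: DStream[(String, Long)]): DStream[(String, AggState)] = {
>>       val ttlMillis = 60 * 60 * 1000L    // example: expire keys idle for over an hour
>> 
>>       def update(newValues: Seq[Long], state: Option[AggState]): Option[AggState] = {
>>         val now = System.currentTimeMillis()
>>         state match {
>>           case Some(s) if newValues.isEmpty && now - s.lastUpdatedMillis > ttlMillis =>
>>             None                                       // returning None removes the key
>>           case Some(s) if newValues.isEmpty =>
>>             Some(s)                                     // no new data yet, keep the state
>>           case other =>
>>             val prev = other.map(_.sum).getOrElse(0L)
>>             Some(AggState(prev + newValues.sum, now))   // fold new values into the sum
>>         }
>>       }
>> 
>>       events.updateStateByKey(update)
>>     }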
>> 
>> Please suggest.
>> 
>> Regards
>> SM
>> 
>> 
> 
> 
