Thank you TD for your time and help.
SM
> On 19-Nov-2015, at 6:58 AM, Tathagata Das <t...@databricks.com> wrote:
>
> There are different ways to do the rollups. Either update the rollups from
> the streaming application, or generate them in a later process - say, a
> periodic Spark job every hour. Or you could just generate the rollups on
> demand, when they are queried.
> The whole thing depends on your downstream requirements - if you always want
> up-to-date rollups to show up in a dashboard (even the day-level stuff), then
> the first approach is better. Otherwise, the second and third approaches are
> more efficient.
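The second approach (a periodic roll-up job) can be illustrated with a minimal sketch. The input shape - ((epoch_minute, dimension), value) pairs - and the dimension key are assumptions for illustration, not from this thread; in a real periodic Spark job the same aggregation would typically be a reduceByKey over the minute-level table.

```python
from collections import defaultdict

def rollup_to_hour(minute_rows):
    """Roll minute-level rows of ((epoch_minute, dim), value) up to hour level.

    A periodic job would apply this same key transformation and sum,
    then write the hourly table back to the store.
    """
    hourly = defaultdict(float)
    for (epoch_minute, dim), value in minute_rows:
        epoch_hour = epoch_minute - (epoch_minute % 3600)  # truncate to the hour
        hourly[(epoch_hour, dim)] += value
    return dict(hourly)

rows = [((3660, "us"), 1.0), ((3720, "us"), 2.0), ((7200, "us"), 5.0)]
rollup_to_hour(rows)  # -> {(3600, "us"): 3.0, (7200, "us"): 5.0}
```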
>
> TD
>
>
> On Wed, Nov 18, 2015 at 7:15 AM, Sandip Mehta <sandip.mehta....@gmail.com
> <mailto:sandip.mehta....@gmail.com>> wrote:
> TD thank you for your reply.
>
> I agree on the data store requirement. I am using HBase as the underlying store.
>
> So for every batch interval of say 10 seconds
>
> - Calculate the time dimensions (minute, hour, day, week, month and quarter)
> along with the other dimensions and metrics
> - Update the relevant base tables at each batch interval with the relevant
> metrics for a given set of dimensions.
>
> The only caveat I see is that I’ll have to update each of the different
> roll-up tables in every batch window.
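Deriving the time dimensions described above can be sketched as a small pure function. The bucket-key formats are made up for illustration; a real job would pick formats that sort well as HBase row keys.

```python
from datetime import datetime, timezone

def time_buckets(epoch_seconds):
    """Derive one bucket key per roll-up level from a single event time (UTC)."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {
        "minute":  dt.strftime("%Y%m%d%H%M"),
        "hour":    dt.strftime("%Y%m%d%H"),
        "day":     dt.strftime("%Y%m%d"),
        "week":    dt.strftime("%G%V"),  # ISO year + ISO week number
        "month":   dt.strftime("%Y%m"),
        "quarter": "%dQ%d" % (dt.year, (dt.month - 1) // 3 + 1),
    }
```

Every incoming event then contributes one update per bucket, which is exactly why each roll-up table has to be touched in every batch window.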
>
> Is this a valid approach for calculating time series aggregation?
>
> Regards
> SM
>
> For minute-level aggregates I have set up a streaming window of, say, 10
> seconds, and I am storing minute-level aggregates across multiple dimensions
> in HBase at every window interval.
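The per-window step above amounts to the following pure aggregation; the event shape and row-key layout are hypothetical. In the streaming job this would run per micro-batch (e.g. a reduceByKey inside foreachRDD), and the resulting (row key, delta) pairs would be applied as HBase increments so that repeated windows within the same minute accumulate correctly.

```python
from collections import defaultdict

def minute_aggregates(events):
    """Aggregate one micro-batch of (epoch_ts, dims, value) to minute counters."""
    out = defaultdict(float)
    for ts, dims, value in events:
        minute = ts - (ts % 60)  # truncate the event time to its minute
        row_key = "%s|%d" % ("|".join(dims), minute)  # e.g. "US|mobile|60"
        out[row_key] += value
    return dict(out)
```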
>
>> On 18-Nov-2015, at 7:45 AM, Tathagata Das <t...@databricks.com
>> <mailto:t...@databricks.com>> wrote:
>>
>> For this sort of long-term aggregation you should use a dedicated data
>> storage system, like a database or a key-value store. Spark Streaming
>> would just aggregate and push the necessary data to the data store.
>>
>> TD
>>
>> On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <sandip.mehta....@gmail.com
>> <mailto:sandip.mehta....@gmail.com>> wrote:
>> Hi,
>>
>> I am working on a requirement to calculate real-time metrics and am
>> building a prototype on Spark Streaming. I need to build aggregates at the
>> second, minute, hour and day level.
>>
>> I am not sure whether I should calculate all these aggregates as different
>> windowed functions on the input DStream, or whether I should use the
>> updateStateByKey function instead. If I have to use updateStateByKey for
>> these time-series aggregations, how can I remove keys from the state after
>> a given time has elapsed?
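For reference, updateStateByKey drops a key when the update function returns None for it. A hedged sketch of such an update function - the TTL, the state shape (running total plus last-seen time), and the `now` parameter (added only to make the function testable) are all made up for illustration:

```python
import time

TTL_SECONDS = 3600  # drop a key after an hour without updates (illustrative)

def update_func(new_values, state, now=None):
    """Update function for DStream.updateStateByKey.

    state is (running_total, last_seen_epoch); returning None removes the
    key from Spark's state, which is how expired keys get dropped.
    """
    now = time.time() if now is None else now
    total, last_seen = state if state is not None else (0.0, now)
    if new_values:
        return (total + sum(new_values), now)
    if now - last_seen > TTL_SECONDS:
        return None                # expire: key disappears from the state RDD
    return (total, last_seen)      # keep the key until it times out
```

Wiring it up would look like `stream.updateStateByKey(update_func)`, which also requires a checkpoint directory to be set on the streaming context.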
>>
>> Please suggest.
>>
>> Regards
>> SM
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> <mailto:user-unsubscr...@spark.apache.org>
>> For additional commands, e-mail: user-h...@spark.apache.org
>> <mailto:user-h...@spark.apache.org>
>>
>>
>
>