Re: Aw: Re: Combining group by and time window

2016-04-02 Thread Arun Mahadevan
Hi Daniela,

> Okay, could I do the grouping already in Kafka? For example, would it be 
> possible to use one topic per region, or one topic with a partition for 
> every region? Then the messages would already be grouped when they arrive at 
> Storm. Is this correct?

You would need one Kafka spout instance per topic, plus a separate windowed 
bolt instance that receives from the corresponding Kafka spout. Such a 
topology becomes difficult to manage as the number of topics increases. The 
other option is to do the grouping within the windowed bolt, as I mentioned 
in my last mail.
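
To illustrate what "grouping within the windowed bolt" could look like, here 
is a minimal sketch in plain Java with no Storm dependency. It assumes each 
message carries a region name and a numeric value; the names RegionSums, 
Reading and sumByRegion are made up for this example. In a real windowed 
bolt, the list would be built from the tuples of the current window.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegionSums {
    /** One reading from the stream (hypothetical shape: region name plus value). */
    static final class Reading {
        final String region;
        final double value;

        Reading(String region, double value) {
            this.region = region;
            this.value = value;
        }
    }

    /** Groups the readings of one window by region and adds up the values per region. */
    static Map<String, Double> sumByRegion(List<Reading> window) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (Reading r : window) {
            // merge() inserts the value for a new region, or adds it to the running sum.
            sums.merge(r.region, r.value, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Stand-in for the tuples that one window would contain.
        List<Reading> window = new ArrayList<>();
        window.add(new Reading("north", 1.5));
        window.add(new Reading("south", 2.0));
        window.add(new Reading("north", 0.5));
        System.out.println(sumByRegion(window));
    }
}
```

The resulting map holds one sum per region for that window, which is what 
would then be written to Redis or another store.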

> Would the windowing and the aggregation for each time window be separated 
> into two bolts, or are both done in one bolt?

Separate bolts are not needed for aggregation; it can be done inside the 
windowed bolt.
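
As a rough illustration of windowing and aggregation living in one component, 
the following plain-Java sketch keeps a one-minute buffer and computes the 
per-region sums from it. It does not use the Storm API: the class 
OneMinuteRegionWindow and its methods are hypothetical, and it assumes events 
are added in timestamp order. In Storm the framework would take care of the 
expiry step and hand the bolt the tuples of the current window.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

public class OneMinuteRegionWindow {
    /** Window length: the "last minute" from the original question. */
    static final long WINDOW_MS = 60_000L;

    private static final class Event {
        final String region;
        final double value;
        final long timestampMs;

        Event(String region, double value, long timestampMs) {
            this.region = region;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    // Buffered events, oldest first (assumes add() is called in timestamp order).
    private final Deque<Event> events = new ArrayDeque<>();

    void add(String region, double value, long timestampMs) {
        events.addLast(new Event(region, value, timestampMs));
    }

    /** Returns the per-region sums over the window (nowMs - WINDOW_MS, nowMs]. */
    Map<String, Double> sumsAt(long nowMs) {
        // Windowing step: drop events that have fallen out of the last minute.
        while (!events.isEmpty() && events.peekFirst().timestampMs <= nowMs - WINDOW_MS) {
            events.removeFirst();
        }
        // Aggregation step: group the remaining events by region and add up the values.
        Map<String, Double> sums = new TreeMap<>();
        for (Event e : events) {
            sums.merge(e.region, e.value, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        OneMinuteRegionWindow window = new OneMinuteRegionWindow();
        window.add("north", 1.0, 0L);
        window.add("south", 2.0, 30_000L);
        window.add("north", 3.0, 50_000L);
        System.out.println(window.sumsAt(60_000L));
    }
}
```

Both steps happen in the same method here, which mirrors the point that no 
second bolt is required for the aggregation.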


Thanks,
Arun




On 3/31/16, 1:23 AM, "Maria Musterfrau" <daniela_4...@gmx.at> wrote:

>Hi Arun
>
>Sorry, I did not see your reply in the dev mailing list. Thank you very much!
>
>Okay, could I do the grouping already in Kafka? For example, would it be 
>possible to use one topic per region, or one topic with a partition for 
>every region? Then the messages would already be grouped when they arrive at 
>Storm. Is this correct?
>
>Would the windowing and the aggregation for each time window be separated 
>into two bolts, or are both done in one bolt?
>
>Thank you in advance.
>
>Regards,
>Daniela
> 
> 
>
>Sent: Wednesday, 30 March 2016 at 20:15
>From: "Arun Iyer" <ai...@hortonworks.com>
>To: "user@storm.apache.org" <user@storm.apache.org>, "daniela_4...@gmx.at" 
><daniela_4...@gmx.at>
>Subject: Re: Combining group by and time window
>
>Reposting the reply that was posted to the dev mailing list:
> 
>
>For Storm core, windowed bolts would give you the tuples from the last 
>minute, but you would have to do the grouping yourself. You could of course 
>use a fields grouping to split the load across the windowed bolts. For 
>Trident, you might want to take a look at the windowing APIs that were added 
>recently and see if they fit your needs. You have to choose between Trident 
>and core based on your use cases, the guarantees you need, whether you need 
>batching or per-tuple processing, etc.
> 
>- Arun
>
> 
> 
>From: Maria Musterfrau
>Reply-To: "user@storm.apache.org"
>Date: Wednesday, March 30, 2016 at 10:56 PM
>To: "user@storm.apache.org"
>Subject: Fw: Combining group by and time window
> 
>
>Does anyone have an idea?
> 
>Thank you in advance.
> 
>Regards,
>Daniela
> 
>
>Sent: Monday, 28 March 2016 at 21:06
>From: "Maria Musterfrau" <daniela_4...@gmx.at>
>To: user@storm.apache.org
>Subject: Combining group by and time window
>
>Hi,
> 
>I have a stream with time series data from different regions. I would like to 
>group the stream by the different regions and to add up the values of the last 
>minute (time window) per region. The sums should be persisted to Redis or 
>something like this.
> 
>I already found out that Storm Trident provides a group by function to split 
>the stream. I think this could be useful.
>Storm core provides time windows, so I could use it for the aggregation.
> 
>But how can I combine these two components? Or is this not possible?
> 
>Would it be useful to do the grouping already in Kafka (with different 
>topics), or is it better to do it in Storm?
> 
>Thank you in advance.
> 
>Regards,
>Daniela
>



Re: Combining group by and time window

2016-03-30 Thread Spico Florin
Hello!
From Storm's perspective, regarding window functionality: Storm 1.0 will add
an implementation of windowed bolts. There is a very good article on the
Hortonworks community site describing what functionality is provided. Please
have a look at
https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html
I hope that helps.
Regards, Florin
On Wednesday, March 30, 2016, Maria Musterfrau <daniela_4...@gmx.at> wrote:
> Does anyone have an idea?
>
> Thank you in advance.
>
> Regards,
> Daniela
>
> Sent: Monday, 28 March 2016 at 21:06
> From: "Maria Musterfrau" <daniela_4...@gmx.at>
> To: user@storm.apache.org
> Subject: Combining group by and time window
> Hi,
>
> I have a stream with time series data from different regions. I would
> like to group the stream by the different regions and to add up the values
> of the last minute (time window) per region. The sums should be persisted
> to Redis or something like this.
>
> I already found out that Storm Trident provides a group by function to
> split the stream. I think this could be useful.
> Storm core provides time windows, so I could use it for the aggregation.
>
> But how can I combine these two components? Or is this not possible?
>
> Would it be useful to do the grouping already in Kafka (with different
> topics), or is it better to do it in Storm?
>
> Thank you in advance.
>
> Regards,
> Daniela


Fw: Combining group by and time window

2016-03-30 Thread Maria Musterfrau

Does anyone have an idea?

Thank you in advance.

Regards,
Daniela

Sent: Monday, 28 March 2016 at 21:06
From: "Maria Musterfrau" <daniela_4...@gmx.at>
To: user@storm.apache.org
Subject: Combining group by and time window

Hi,

I have a stream with time series data from different regions. I would like to
group the stream by the different regions and to add up the values of the last
minute (time window) per region. The sums should be persisted to Redis or
something like this.

I already found out that Storm Trident provides a group by function to split
the stream. I think this could be useful.
Storm core provides time windows, so I could use it for the aggregation.

But how can I combine these two components? Or is this not possible?

Would it be useful to do the grouping already in Kafka (with different topics),
or is it better to do it in Storm?

Thank you in advance.

Regards,
Daniela