Hi Daniela,
> Okay, could I do the grouping in Kafka already? For example, would it be
> possible to use one topic per region, or one topic with a partition for
> every region? Then the messages would already be grouped when they arrive at
> Storm. Is this correct?
With one topic per region, you would need a Kafka spout instance per topic and
a separate windowed bolt instance that receives from the corresponding Kafka
spout. Such a topology becomes difficult to manage as the number of topics
increases. The other option is to do the grouping within the windowed bolt, as
I mentioned in my last mail.
> Would the windowing and the aggregation for each time window be separated
> into two bolts, or are both done in one bolt?
Separate bolts are not needed for aggregation; it can be done inside the
windowed bolt.
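
A minimal sketch of that in-bolt grouping and aggregation, written in plain
Java so it runs without Storm on the classpath. In a real topology, this loop
would sit inside the windowed bolt's execute(TupleWindow) method and iterate
over the tuples of the current window. The Reading class and the "region" and
"value" field names are assumptions made up for illustration; they are not
taken from your topology.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegionWindowSum {

    /** Hypothetical stand-in for one tuple carrying "region" and "value". */
    public static final class Reading {
        final String region;
        final double value;

        public Reading(String region, double value) {
            this.region = region;
            this.value = value;
        }
    }

    /**
     * Groups the readings of one window by region and sums their values.
     * This is the work a windowed bolt would do each time a window fires.
     */
    public static Map<String, Double> sumByRegion(List<Reading> window) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (Reading r : window) {
            // Add the value to the running total for this region.
            sums.merge(r.region, r.value, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Pretend these readings all arrived within the last minute.
        List<Reading> window = new ArrayList<>();
        window.add(new Reading("eu", 1.5));
        window.add(new Reading("us", 4.0));
        window.add(new Reading("eu", 2.5));

        // One sum per region; these would then be written to Redis.
        System.out.println(sumByRegion(window));
    }
}
```

If several instances of the windowed bolt run in parallel, a fields grouping on
the region field keeps all tuples of one region on the same instance, so each
per-region sum is complete.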
Thanks,
Arun
On 3/31/16, 1:23 AM, "Maria Musterfrau" <daniela_4...@gmx.at> wrote:
>Hi Arun
>
>Sorry, I did not see your reply in the dev mailing list. Thank you very much!
>
>Okay, could I do the grouping in Kafka already? For example, would it be
>possible to use one topic per region, or one topic with a partition for
>every region? Then the messages would already be grouped when they arrive at
>Storm. Is this correct?
>
>Would the windowing and the aggregation for each time window be separated
>into two bolts, or are both done in one bolt?
>
>Thank you in advance.
>
>Regards,
>Daniela
>
>
>
>Sent: Wednesday, 30 March 2016 at 20:15
>From: "Arun Iyer" <ai...@hortonworks.com>
>To: "user@storm.apache.org" <user@storm.apache.org>, "daniela_4...@gmx.at"
><daniela_4...@gmx.at>
>Subject: Re: Combining group by and time window
>
>Reposting the reply that was posted to the dev mailing list:
>
>
>For Storm core, windowed bolts would give you the tuples in the last minute,
>but you would have to do the grouping yourself. You could of course use a
>fields grouping to split the load across the windowed bolts. For Trident, you
>might want to take a look at the windowing APIs that were added recently and
>see if they fit your needs. You have to choose between Trident and core based
>on your use cases, the guarantees you need, and whether you need batching or
>per-tuple processing.
>
>- Arun
>
>
>
>From: Maria Musterfrau
>Reply-To: "user@storm.apache.org"
>Date: Wednesday, March 30, 2016 at 10:56 PM
>To: "user@storm.apache.org"
>Subject: Fw: Combining group by and time window
>
>
>Does anyone have an idea?
>
>Thank you in advance.
>
>Regards,
>Daniela
>
>
>Sent: Monday, 28 March 2016 at 21:06
>From: "Maria Musterfrau" <daniela_4...@gmx.at>
>To: user@storm.apache.org
>Subject: Combining group by and time window
>
>Hi,
>
>I have a stream with time series data from different regions. I would like to
>group the stream by region and add up the values of the last minute (time
>window) per region. The sums should be persisted to Redis or something
>similar.
>
>I already found out that Storm Trident provides a group by function to split
>the stream. I think this could be useful.
>Storm core provides time windows, so I could use them for the aggregation.
>
>But how can I combine these two components? Or is this not possible?
>
>Would it be useful to do the grouping in Kafka already (with different
>topics), or is it better to do it in Storm?
>
>Thank you in advance.
>
>Regards,
>Daniela
>