Hi Matthias

Thank you very much for your reply.

How can I ensure that the spout fetches the values every minute?

Thank you in advance.

Regards,
Daniela


2016-04-11 11:10 GMT+02:00 Matthias J. Sax <mj...@apache.org>:

> Hi,
>
> @Arun: thanks for correcting my (it's hard to keep up to date with the
> latest changes those days... :))
>
> @Daniela:
>  - processing time means that the windows are aligned to the wall-clock
> time of your machine when you process your data; this implies some
> non-determinism and not repeatability if you process historical data
>  - event time means that the windows are aligned to timestamps that are
> encoded in your tuples (eg, as an attribute); this allows for
> deterministic processing as the result of a computation is independent
> on the time you perform the computation
>
> For your Redis idea:
>
> You can certainly do this. A Spout fetches all data from Redis each
> minute and forwards it to the window bolt. Ie, you do not fetch directly
> within you agg-bolt.
>
> However, if you use a custom aggregate function, you might be able to do
> this without Redis in between and de-duplicate in you aggregate function
> directly. When the window closes, you store the value for each device in
> a hash-map (key: Device-ID). During processing, for each value in the
> window. If a second value comes in, you overwrite it. As long as you do
> not have too many devices (ie, the window and hash-map does fix in
> memory) this should be the simplest approach.
>
>
> -Matthias
>
>
> On 04/10/2016 10:46 PM, Daniela Stoiber wrote:
> > HI Arun,
> >
> > thank you for your reply.
> >
> > But my problem is that I need to add up the values over all devices, but
> I am only allowed to use the most recent value of each device. A value is
> valid as long as there is no new value for this device available.
> >
> > So if I receive a message with device A with value 1, value 1 should be
> used for the sum as long as the value of A does not change.
> > When I receive a new value for A, the new value should be used for the
> sum and the old one should be replaced.
> >
> > Therefore I thought to use Redis to store this information:
> > Device                Value
> > A             1
> > B             10
> > C             4
> >
> > Then I would like to pull every minute the most recent value of each
> device to build the sum. Therefore I would like to use the windowed bolt.
> But I am not sure if it is possible to pull data out of Redis within a
> windowed bolt.
> >
> > Thank you in advance.
> >
> > Regards,
> > Daniela
> >
> >
> > -----Ursprüngliche Nachricht-----
> > Von: Arun Iyer [mailto:ai...@hortonworks.com] Im Auftrag von Arun
> Mahadevan
> > Gesendet: Sonntag, 10. April 2016 20:55
> > An: dev@storm.apache.org
> > Betreff: Re: AW: Use only latest values
> >
> > Hi Matthias,
> >
> > WindowedBolt does support event time. In trident its is not yet exposed.
> >
> > Hi Daniela,
> >
> > You could solve your use cases in different ways. One would be to have a
> WindowedBolt with a 1 min tumbling window, do your custom aggregation (e.g.
> sum) every time the window tumbles and emit the results to another bolt
> where you update the count in Redis. Most of your state saving could also
> be automated by defining a Stateful bolt that would periodically checkpoint
> your state (sum per device). You could also club both windowing and state
> into a StatefulWindowedBolt implementation. You can evaluate the options
> and decide based on your use cases.
> >
> > Take a look at the sample topologies (SlidingWindowTopology,
> SlidingTupleTsTopology, StatefulTopology, StatefulWindowingTopology) in
> storm-starter and the docs for more info.
> >
> > https://github.com/apache/storm/blob/master/docs/Windowing.md
> >
> > https://github.com/apache/storm/blob/master/docs/State-checkpointing.md
> >
> >
> > -Arun
> >
> >
> >
> >
> > On 4/10/16, 4:30 PM, "Matthias J. Sax" <mj...@apache.org> wrote:
> >
> >> A tumbling window (ie, non-overlapping window) is the right approach (a
> >> sliding window is overlapping).
> >>
> >> The window goes into your aggregation bolt (windowing and aggregation
> >> goes hand in hand, ie, when the window gets closed, the aggregation is
> >> triggered and the window content is handed over to the aggregation
> >> function).
> >>
> >> Be aware that Storm (currently) only supports processing time window
> >> (an no event time windows).
> >>
> >> -Matthias
> >>
> >>
> >> On 04/10/2016 09:56 AM, Daniela Stoiber wrote:
> >>> Hi,
> >>>
> >>> thank you for your reply.
> >>>
> >>> How can I ensure that the latest values are pulled from Redis the sum
> >>> is updated every minute? Do I need a sliding window with an interval
> >>> of 1 minute? Where would this sliding window be located in my topology?
> >>>
> >>> Thank you in advance.
> >>>
> >>> Regards,
> >>> Daniela
> >>>
> >>> -----Ursprüngliche Nachricht-----
> >>> Von: Matthias J. Sax [mailto:mj...@apache.org]
> >>> Gesendet: Samstag, 9. April 2016 12:13
> >>> An: dev@storm.apache.org
> >>> Betreff: Re: Use only latest values
> >>>
> >>> Sounds reasonable.
> >>>
> >>>
> >>> On 04/09/2016 08:34 AM, Daniela Stoiber wrote:
> >>>> Hi,
> >>>>
> >>>>
> >>>>
> >>>> I would like to cache values and to use only the latest "valid"
> >>>> values to build a sum.
> >>>>
> >>>> In more detail, I receive values from devices periodically. I would
> >>>> like to add up all the valid values each minute. But not every
> >>>> device sends a new value every minute. And as long as there is no
> >>>> new value the old one should be used for the sum. As soon as I
> >>>> receive a new value from a device I would like to overwrite the old
> >>>> value and to use the new one for the sum. Would that be possible
> >>>> with the combination of
> >>> Storm and Redis?
> >>>>
> >>>>
> >>>>
> >>>> My idea was to use the following:
> >>>>
> >>>>
> >>>>
> >>>> - Kafka Spout
> >>>>
> >>>> - Storm Bolt for storing the tuples in Redis and for overwriting the
> >>>> values as soon as a new one is delivered
> >>>>
> >>>> - Storm Bolt for reading the latest tuples from Redis
> >>>>
> >>>> - Storm Bolt for grouping (I would like to group the devices per
> >>>> region)
> >>>>
> >>>> - Storm Bolt for aggregation
> >>>>
> >>>> - Storm Bolt for storing the results again in Redis
> >>>>
> >>>>
> >>>>
> >>>> Thank you in advance.
> >>>>
> >>>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Daniela
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>
>

Reply via email to