Re: Calculating time elapsed using event start / stop notification messages

2017-04-21 Thread Eno Thereska
Hi Ali,

One starting point would be the low-level Processor API, where you receive each
event and process it yourself. You can also use a state store (persistent, or
probably even in-memory) to keep track of the events seen so far. An entry can
probably be deleted once both the start and stop events have been observed. If
each record also carries its event timestamp, that would help with ordering the
events in your processing logic.
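
As a rough illustration of the pairing idea above, here is a minimal Java
sketch. It deliberately does not use the Kafka Streams API: a plain HashMap
stands in for the state store, and the names StartStopPairing, State, Pending
and onEvent are made up for this example. It assumes each notification carries
the user id, the state and an event timestamp in milliseconds, and it pairs
events by timestamp rather than by arrival order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Pairs start/stop notifications per user; a HashMap stands in for the state store. */
final class StartStopPairing {
    enum State { STARTED, STOPPED }

    // Hypothetical holder for the half of a pair that has been seen so far.
    private static final class Pending {
        final State state;
        final long timestampMs;

        Pending(State state, long timestampMs) {
            this.state = state;
            this.timestampMs = timestampMs;
        }
    }

    // Assumption: keyed by user id, like the message key in the topic.
    private final Map<String, Pending> store = new HashMap<>();

    /** Returns the elapsed milliseconds once both halves of a pair are known, else empty. */
    OptionalLong onEvent(String userId, State state, long timestampMs) {
        Pending pending = store.get(userId);
        if (pending == null || pending.state == state) {
            // First half of a pair (or a repeated notification): remember it and wait.
            store.put(userId, new Pending(state, timestampMs));
            return OptionalLong.empty();
        }
        // Use the event timestamps, so a stop that arrives before its start still works.
        long start = (state == State.STARTED) ? timestampMs : pending.timestampMs;
        long stop = (state == State.STOPPED) ? timestampMs : pending.timestampMs;
        if (stop < start) {
            // The stop predates the start, so they belong to different sessions;
            // keep the start pending and drop the stale stop (a simplification).
            store.put(userId, new Pending(State.STARTED, start));
            return OptionalLong.empty();
        }
        // Both events observed: the entry can be deleted.
        store.remove(userId);
        return OptionalLong.of(stop - start);
    }

    int pendingCount() {
        return store.size();
    }
}
```

Calling onEvent with a STOPPED event first and the matching STARTED event
second yields the same elapsed time as the in-order case. In a real topology
the same logic would sit inside a processor, with the map replaced by a state
store.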

After computing the time differences, you can either write each difference to a
topic, and then use a KTable to read from it and compute various windowed
aggregates; or alternatively you can do the per hour/day/month processing in
your own logic and stay entirely in the Processor API world.
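
If you go the "own logic" route, the per-hour bookkeeping could look roughly
like the sketch below. Again this is plain Java rather than Kafka Streams code,
and HourlyTotals, addInterval and totalForHour are hypothetical names. It
assumes UTC hour boundaries and epoch-millisecond timestamps, tracks a single
user, and splits an interval that spans several hours so that each hour is only
credited with the part that falls inside it.

```java
import java.util.Map;
import java.util.TreeMap;

/** Accumulates 'working' time per hour bucket for one user (illustrative only). */
final class HourlyTotals {
    private static final long HOUR_MS = 3_600_000L;

    // Key: start of the hour in epoch ms (UTC). Value: ms worked within that hour.
    private final Map<Long, Long> totals = new TreeMap<>();

    /** Adds the interval [startMs, stopMs), split at hour boundaries. */
    void addInterval(long startMs, long stopMs) {
        long cursor = startMs;
        while (cursor < stopMs) {
            long hourStart = Math.floorDiv(cursor, HOUR_MS) * HOUR_MS;
            // The slice ends at the next hour boundary or at the stop time.
            long sliceEnd = Math.min(hourStart + HOUR_MS, stopMs);
            totals.merge(hourStart, sliceEnd - cursor, Long::sum);
            cursor = sliceEnd;
        }
    }

    long totalForHour(long hourStartMs) {
        return totals.getOrDefault(hourStartMs, 0L);
    }
}
```

Per-day or per-month totals can be derived the same way with a larger bucket,
or by summing the hourly buckets afterwards.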

Hope this helps
Eno

> On 21 Apr 2017, at 15:20, Ali Akhtar wrote:
> 
> I have a tricky use case where a user initiates an event (by clicking a
> button) and then stops it (by clicking it again, losing connection, closing
> the browser, etc).
> 
> Each time the event starts or stops, a notification is sent to a Kafka
> topic, with the user's id as the message key and the current timestamp, and
> the state of the event (started, or stopped).
> 
> I'm using Kafka Streams to process these events.
> 
> Based on the notifications, I need to determine the total time spent
> 'working', i.e. the time between the user clicking start and stopping, per
> hour, per day, etc.
> 
> E.g. total time spent 'working' per hour, per day.
> 
> Any ideas how this could be achieved, while accounting for messages
> arriving out of order due to latency, etc. (e.g. the stop notification may
> arrive before start)?
> 
> Would the Kafka Streams local store be of any use here (all events by the
> same user will have the same message key), or should I use Redis? Or do I
> need an hourly job which runs and processes last hour's events?



Calculating time elapsed using event start / stop notification messages

2017-04-21 Thread Ali Akhtar
I have a tricky use case where a user initiates an event (by clicking a
button) and then stops it (by clicking it again, losing connection, closing
the browser, etc).

Each time the event starts or stops, a notification is sent to a Kafka
topic, with the user's id as the message key and the current timestamp, and
the state of the event (started, or stopped).

I'm using Kafka Streams to process these events.

Based on the notifications, I need to determine the total time spent
'working', i.e. the time between the user clicking start and stopping, per
hour, per day, etc.

E.g. total time spent 'working' per hour, per day.

Any ideas how this could be achieved, while accounting for messages
arriving out of order due to latency, etc. (e.g. the stop notification may
arrive before start)?

Would the Kafka Streams local store be of any use here (all events by the
same user will have the same message key), or should I use Redis? Or do I
need an hourly job which runs and processes last hour's events?