For your case, wouldn't setting the window time to some small value (say, one
second) give you fine enough granularity to check each window for completeness
and emit results as necessary? That is generally how what you're describing
would be handled.
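
To make that concrete, here is a rough sketch of the bookkeeping I have in
mind. It is plain Java with no Samza imports, and every name in it
(`EventTimeWindowBuffer`, `add`, `drainComplete`, the allowed-lateness
parameter) is made up for illustration. It also uses fixed, non-overlapping
windows rather than sliding ones to keep it short. The assumption is that you
would call `add` from `process()` with the timestamp carried in the message,
and `drainComplete` from `window()` with the window time set to something
small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Samza: keeps a running sum per event-time window. */
final class EventTimeWindowBuffer {
    private final long windowSizeMs;
    private final long allowedLatenessMs;
    // Window start timestamp -> running sum, ordered by start time.
    private final TreeMap<Long, Double> sums = new TreeMap<>();
    // Largest message timestamp seen so far; MIN_VALUE means "nothing seen yet".
    private long maxEventTimeMs = Long.MIN_VALUE;

    EventTimeWindowBuffer(long windowSizeMs, long allowedLatenessMs) {
        if (windowSizeMs <= 0 || allowedLatenessMs < 0) {
            throw new IllegalArgumentException("window size must be > 0, lateness >= 0");
        }
        this.windowSizeMs = windowSizeMs;
        this.allowedLatenessMs = allowedLatenessMs;
    }

    /** Assigns a message to a window by the message's own timestamp, not the system clock. */
    void add(long eventTimeMs, double value) {
        long windowStart = Math.floorDiv(eventTimeMs, windowSizeMs) * windowSizeMs;
        sums.merge(windowStart, value, Double::sum);
        maxEventTimeMs = Math.max(maxEventTimeMs, eventTimeMs);
    }

    /** Removes and returns, oldest first, every window that ended at or before the watermark. */
    List<Map.Entry<Long, Double>> drainComplete() {
        List<Map.Entry<Long, Double>> done = new ArrayList<>();
        if (maxEventTimeMs == Long.MIN_VALUE) {
            return done; // no messages yet, so nothing can be complete
        }
        long watermark = maxEventTimeMs - allowedLatenessMs;
        while (!sums.isEmpty() && sums.firstKey() + windowSizeMs <= watermark) {
            done.add(sums.pollFirstEntry());
        }
        return done;
    }
}
```

"Complete" here only means that a later timestamp has been seen on the same
stream, which is a heuristic. A message that arrives after its window was
drained would reopen that window, and the sketch does not deal with that.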

SAMZA-23 would be a great contribution, if you're interested.
-jg


On Mon, Apr 14, 2014 at 4:36 AM, Nicolas Bär <[email protected]> wrote:

> Hi
>
> I'm using Samza to aggregate data within sliding windows. I'm especially
> interested in the fault tolerance aspect of Samza and its ability to
> restore from the latest committed offset. I've looked into the
> WindowableTask interface, but this implementation relies on the system
> time, rather than the timestamp of the messages received. The main problem
> is to make sure all data for one window is available and in case of node
> failure the complete window is restored. With the current implementation of
> offset committing (committed by time interval, window size or upon
> `taskCoordinator.commit()`) this is quite cumbersome. I have two different
> options in mind to overcome this problem and would like to hear your advice
> on this.
>
> 1. Using the Key-Value store the data of the current window is cached and
> after a restore one could figure out what data to include from the cache
> and from Kafka. Unfortunately, this sounds like reimplementing the offset
> management of Kafka.
>
> 2. Set the time based commit interval to Long.MAX_VALUE and handle all
> commit intervals through `taskCoordinator.commit()`. In this case I could
> commit the offset after each sliding window and in case of failure Samza
> would restore from the beginning of the new sliding window. The only
> problem with this approach is `taskCoordinator.commit()` will commit the
> offsets of all partitions. But the data may be different between partitions
> and therefore sliding windows will not match with other partitions. I've
> looked into SAMZA-23 [1] and found a proposal to coordinate commits for
> each partition individually. Since this would solve my problem, I'd like to
> provide a patch for this issue. Is there any interest in this?
>
> Best
> Nicolas
>
>
> [1]: https://issues.apache.org/jira/browse/SAMZA-23
>
