Given that nodes crash, it seems essential to build in reliability the way
lower layers do (think TCP). One simple approach would be to use
monotonically increasing sequence numbers to identify events in a stream:

   1. Event ordering: Hold events until all the previous sequence numbers
   have been received.
   2. Exactly-once: If a sequence number is not greater than the last one
   delivered, it is a duplicate and can be dropped.
   3. Fault-tolerance: Store the latest sequence number along with the
   checkpoint, and replay events from there onwards.

This, of course, comes with a performance overhead and should be optional.
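
To make the three points above concrete, here is a minimal sketch in Python.
It is only an illustration under assumptions: SequencedReceiver, deliver,
checkpoint and restore are hypothetical names, not part of the S4 API. It
also assumes one sender per stream, sequence numbers starting at 0, and an
upstream component that can replay the stream after a crash.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SequencedReceiver:
    """Hypothetical per-stream receiver; not part of the S4 API."""

    deliver: Callable[[int, Any], None]  # callback invoked in sequence order
    next_seq: int = 0                    # lowest sequence number not yet delivered
    pending: Dict[int, Any] = field(default_factory=dict)  # early arrivals

    def receive(self, seq: int, event: Any) -> bool:
        """Accept one event; return False if it was dropped as a duplicate."""
        # Exactly-once: anything below next_seq was already delivered, and
        # anything already buffered has been seen before.
        if seq < self.next_seq or seq in self.pending:
            return False
        self.pending[seq] = event
        # Ordering: release the longest gap-free run starting at next_seq.
        while self.next_seq in self.pending:
            self.deliver(self.next_seq, self.pending.pop(self.next_seq))
            self.next_seq += 1
        return True

    def checkpoint(self) -> int:
        """Fault tolerance: the value to store alongside the state snapshot."""
        return self.next_seq

    @classmethod
    def restore(cls, deliver: Callable[[int, Any], None], checkpointed_seq: int):
        """Rebuild a receiver after a crash from the stored sequence number."""
        return cls(deliver=deliver, next_seq=checkpointed_seq)


# Usage: out-of-order arrival, a duplicate, then crash recovery with replay.
delivered = []
rx = SequencedReceiver(deliver=lambda seq, ev: delivered.append((seq, ev)))
rx.receive(1, "b")         # held back: event 0 has not arrived yet
rx.receive(0, "a")         # releases event 0, then the buffered event 1
rx.receive(1, "b")         # duplicate, dropped
saved = rx.checkpoint()    # stored together with the application checkpoint

replayed = []
rx2 = SequencedReceiver.restore(lambda seq, ev: replayed.append((seq, ev)), saved)
for seq, ev in enumerate(["a", "b", "c", "d"]):  # upstream replays the stream
    rx2.receive(seq, ev)   # events below the checkpoint are filtered out
```

Events that were buffered but not yet delivered at the time of the crash are
lost with the node in this sketch, which is why it relies on the upstream
being able to replay them. The pending buffer is also unbounded here; a real
implementation would need a window or timeout.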

As I said, this is the first approach that comes to mind. It is indeed an
interesting problem, and I feel we should not need to re-do stuff that is
done at lower layers.

Please suggest improvements/alternatives as you see fit.

Thanks
Karthik

On Mon, Aug 6, 2012 at 10:01 PM, Flavio Junqueira <f...@yahoo-inc.com> wrote:

>
> On Aug 6, 2012, at 10:11 PM, Karthik Kambatla wrote:
>
> Flavio - it is indeed tricky to offer exactly-once semantics. My
> understanding is that the underlying comm-layer could filter out subsequent
> duplicate events; however, we need to sacrifice ordering.
>
>
> I was also thinking that if a node crashes and we recover from checkpoints,
> we may end up having messages applied twice.
>
> -Flavio
>
> Thanks
> Karthik
>
> On Mon, Aug 6, 2012 at 6:43 AM, "Benjamin Süß" <gothi...@gmx.de> wrote:
>
>> Hi Matthieu,
>>
>> thank you for your reply. I had a specific use case in mind, indeed:
>>
>> I am trying to track RFID tags in distributed systems. This means that
>> not even a single tag scan may get lost. And of course, none are to be sent
>> twice or even more often as this would heavily confuse any surveillance
>> routines I am going to implement.
>>
>> Regarding your answers, especially point 3, I do not think this can be
>> done with S4 at the moment, can it?
>>
>> Regards,
>> Benjamin
>>
>> -------- Original Message --------
>> > Date: Tue, 31 Jul 2012 17:30:27 +0200
>> > From: Matthieu Morel <mmo...@apache.org>
>> > To: s4-u...@incubator.apache.org
>> > Subject: Re: Thoughts on adding guaranteed message processing
>>
>> > On 7/31/12 2:54 PM, "Benjamin Süß" wrote:
>> > > Hi there,
>> > >
>> > > it is stated in several places that S4 does not include guaranteed
>> > > one-time message processing. So my question is: are there currently
>> > > any plans on adding this to S4? Or is it certain this is not going to
>> > > happen? If there are any plans on this, can I find further information
>> > > somewhere?
>> > >
>> >
>> > There are typically 3 requirements for guaranteeing one-time message
>> > processing:
>> >
>> > 1. reliable communication channels
>> >
>> > 2. replayable input stream: you need an upstream component that is able
>> > to store/buffer the whole stream and replay it on demand.
>> >
>> > 3. tracking of messages, using some sort of piggybacking, possibly
>> > requiring manual input from the user.
>> >
>> >
>> > In S4 0.5.0, we already address 1. by providing communications through
>> > TCP by default. Requirement 2. is quite straightforward to implement, by
>> > adding some machinery to connect to a component such as Apache Kafka for
>> > instance. We are considering options for 3.
>> >
>> >
>> > Do you have a specific use case in mind?
>> >
>> > Regards,
>> >
>> > Matthieu
>> >
>>
>
>
>
