Thanks Matthias!

My comments below.

Regards,
Jorge.

On Mon, Jan 30, 2017 at 18:40, Matthias J. Sax (<matth...@confluent.io>)
wrote:

> It would be enough, IMHO :)
>
> However, we need to discuss some details about this.
>
> 1) we could extend the reset tool with a flag --start-from-offsets and
> the user can specify an offset per partition
>
> This would give the most flexibility, but it is hard to use. Especially
> if you have many partitions, we do not want to pass this information in
> on the command line (maybe an "offset file" would work).
>
> Doing this per topic or even globally seems to be of little use because it
> lacks a proper semantic interpretation.
>

Agree, this option is of little use on its own, but it could help with
backward compatibility for clients that don't have the timestamp index but
still want to rewind to a specific offset. Something like the sketch below
could cover it.
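
A rough sketch (mine, not existing tool code) of such a per-partition reset
with the plain consumer API, assuming the application is stopped, its
application.id is used as group.id, and a made-up "topic partition offset"
file format:

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class OffsetFileReset {
    // args[0] = offset file, args[1] = group id (the Streams application.id)
    public static void main(String[] args) throws Exception {
        // offset file format (made up for this sketch): "<topic> <partition> <offset>" per line
        Map<TopicPartition, Long> offsets = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            String[] f = line.trim().split("\\s+");
            offsets.put(new TopicPartition(f[0], Integer.parseInt(f[1])),
                        Long.parseLong(f[2]));
        }

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", args[1]);
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(offsets.keySet());
            offsets.forEach(consumer::seek); // move each partition to the requested offset
            consumer.commitSync();           // commit the new positions for the group
        }
    }
}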


>
>
> 2) we could extend the reset tool with a flag --start-from-timestamp
> that could be globally applied to all partitions (I guess, that is what
> you have in mind)
>
> Has the advantage that it is easier to use. However, what should the
> parameter format be? Milliseconds since the Epoch (which is the internal
> format) seems hard to use, too.
>

I think 'dateTime' and 'duration' are both valid options here: you could ask
to reprocess since 2017-01-01T09:00:00, or since P1M, i.e. one month ago. The
XML Schema duration (https://www.w3.org/TR/xmlschema-2/#duration) and
dateTime (https://www.w3.org/TR/xmlschema-2/#dateTime) lexical
representations could work here; a rough parsing sketch is below.
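
A minimal sketch, assuming a hypothetical --start-from-timestamp flag and
java.time for parsing; interpreting values without an offset as UTC is my own
assumption:

import java.time.Duration;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class TimestampArg {

    // Turn the flag value into epoch milliseconds, accepting either an
    // absolute dateTime or a duration/period relative to "now".
    static long toEpochMillis(String arg) {
        if (arg.startsWith("P")) {
            ZonedDateTime now = ZonedDateTime.now(ZoneOffset.UTC);
            // time-based durations like "PT6H" vs. date-based periods like "P1M"
            return (arg.contains("T")
                    ? now.minus(Duration.parse(arg))
                    : now.minus(Period.parse(arg))).toInstant().toEpochMilli();
        }
        // absolute dateTime, e.g. "2017-01-01T09:00:00" (interpreted as UTC here)
        return LocalDateTime.parse(arg).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2017-01-01T09:00:00"));
        System.out.println(toEpochMillis("P1M"));
    }
}

Mixed values like "P1M2DT3H" would need the full XML Schema duration grammar;
java.time splits that between Period and Duration, so this sketch only covers
the simple cases.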


>
> There is also the question of how we deal with out-of-order data. Maybe
> it's sufficient, though, to just seek to the first tuple with a timestamp
> equal to or greater than the specified time (and educate users that they
> might see some older data, i.e. with a smaller ts, if there are
> later-arriving out-of-order records).
>

Agree.
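
To make that concrete, a minimal sketch (my own, not from any existing tool)
that seeks each partition to the earliest record whose timestamp is >= the
given one via KafkaConsumer#offsetsForTimes; falling back to the end of the
partition when nothing matches is just an assumption:

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SeekToTimestamp {

    // Seek every given partition to the first record with timestamp >= startTimestamp.
    public static void seek(Consumer<?, ?> consumer,
                            Collection<TopicPartition> partitions,
                            long startTimestamp) {
        Map<TopicPartition, Long> query = new HashMap<>();
        for (TopicPartition tp : partitions) {
            query.put(tp, startTimestamp);
        }
        for (Map.Entry<TopicPartition, OffsetAndTimestamp> e
                : consumer.offsetsForTimes(query).entrySet()) {
            if (e.getValue() != null) {
                consumer.seek(e.getKey(), e.getValue().offset());
            } else {
                // no record at or after the timestamp; arbitrarily jump to the end
                consumer.seekToEnd(Collections.singleton(e.getKey()));
            }
        }
    }
}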


>
> We might want to exploit the broker's timestamp index. But what about older
> brokers that do not have the timestamp index, given that we have client
> backward compatibility now? We might just say "not supported", though.
>
>
Item (1) could be a valid way to give this option to older brokers.


> What about data that lacks a proper timestamp, where users work with a
> custom timestamp extractor? Should we support this, too?
>
>
I hadn't thought about this use-case; it could be a valid one.


> Maybe we need a KIP discussion for this. It seems to be a broader feature.
>
>
Yes, I would love to do that. I also believe this could be a valid feature to
add to the 'kafka-consumer-groups' command-line tool, so that we have an
external tool to rewind consumer-group offsets. A sketch of what such a tool
could do is below.
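
Purely as a sketch of the idea (assuming the consumer group is inactive while
the new offsets are committed, and hard-coding the broker address and a single
input topic for brevity):

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class RewindGroupToTimestamp {
    public static void main(String[] args) {
        String groupId = args[0];                 // e.g. the Streams application.id
        String topic = args[1];                   // input topic to rewind
        long timestamp = Long.parseLong(args[2]); // epoch milliseconds

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // ask the broker for the earliest offset at/after the timestamp, per partition
            Map<TopicPartition, Long> query = new HashMap<>();
            for (PartitionInfo p : consumer.partitionsFor(topic)) {
                query.put(new TopicPartition(topic, p.partition()), timestamp);
            }
            Map<TopicPartition, OffsetAndMetadata> newOffsets = new HashMap<>();
            for (Map.Entry<TopicPartition, OffsetAndTimestamp> e
                    : consumer.offsetsForTimes(query).entrySet()) {
                if (e.getValue() != null) {
                    newOffsets.put(e.getKey(), new OffsetAndMetadata(e.getValue().offset()));
                }
            }
            // commit the new starting offsets on behalf of the (stopped) group
            consumer.commitSync(newOffsets);
        }
    }
}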


> -Matthias
>
>
>
> On 1/30/17 2:07 AM, Jorge Esteban Quilcate Otoya wrote:
> > Thanks Eno and Matthias for your feedback!
> >
> > I've checked KIP-95 and Matthias' blog post (
> > https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/
> > )
> > and I have a clearer idea of how the Streams internals work.
> >
> > In the general use-case, following the Application Reset Tool's procedure:
> > ---
> >
> >    1. for any specified input topic, it resets all offsets to zero
> >    2. for any specified intermediate topic, seeks to the end for all
> >    partitions
> >    3. for all internal topics
> >       1. resets all offsets to zero
> >       2. deletes the topic
> >
> > ---
> > But instead of resetting input topics to zero, wouldn't it be enough to
> > reset them to the offsets found by timestamp?
> >
> > I will definitely take a look at StreamsResetter and give it a try to
> > support this feature.
> >
> >
> > On Mon, Jan 30, 2017 at 1:43, Matthias J. Sax (<matth...@confluent.io>)
> > wrote:
> >
> >> You can always build your own little tool similar to StreamsResetter.java
> >> to get this done. I.e., you set the committed offsets "manually" based on
> >> timestamps before you start your application.
> >>
> >> But as Eno mentioned, you need to think carefully about what a
> >> consistent reset point would be because you cannot reset the
> >> application's state...
> >>
> >> If you start your application with an empty state, this might be less of
> >> a concern, though, and seems reasonable.
> >>
> >>
> >> -Matthias
> >>
> >> On 1/29/17 12:55 PM, Eno Thereska wrote:
> >>> Hi Jorge,
> >>>
> >>> This is currently not possible, but it is likely to be considered for
> >>> discussion. One challenge is that, if you have multiple topics, it is
> >>> difficult to rewind them all back to a consistent point in time. KIP-95,
> >>> currently under discussion, handles the slightly different issue of
> >>> stopping consumption at a point in time:
> >>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams
> >>>
> >>> Thanks
> >>> Eno
> >>>> On 29 Jan 2017, at 19:29, Jorge Esteban Quilcate Otoya <
> >> quilcate.jo...@gmail.com> wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> I was wondering if it's possible to rewind consumer offsets in Kafka
> >>>> Streams using a timestamp, as with `offsetsForTimes(Map<TopicPartition,
> >>>> Long> timestampsToSearch)` in KafkaConsumer.
> >>>>
> >>>> I know it's possible to go back to the `earliest` or `latest` offset in
> >>>> a topic, but it would be useful to go back using a timestamp, as the
> >>>> Consumer API does.
> >>>>
> >>>> Maybe there is already an option to do this and I'm missing something?
> >>>>
> >>>> Thanks in advance for your feedback!
> >>>>
> >>>> Jorge.
> >>>
> >>>
> >>
> >>
> >
>
>
