from:"Walker Carlson"

Re: [VOTE] KIP-1038: Add Custom Error Handler to Producer

2024-05-20 Thread Walker Carlson

Hey Alieh,

Thanks for the KIP.

+1 binding

Walker

On Tue, May 7, 2024 at 10:57 AM Alieh Saeedi 
wrote:

> Hi all,
>
> It seems that we have no more comments, discussions, or feedback on
> KIP-1038; therefore, I’d like to open voting for the KIP: Add Custom Error
> Handler to Producer
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1038%3A+Add+Custom+Error+Handler+to+Producer
> >
>
>
> Cheers,
> Alieh
>

[jira] [Created] (KAFKA-16699) Have Streams treat InvalidPidMappingException like a ProducerFencedException

2024-05-10 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-16699:
--

 Summary: Have Streams treat InvalidPidMappingException like a 
ProducerFencedException
 Key: KAFKA-16699
 URL: https://issues.apache.org/jira/browse/KAFKA-16699
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

2024-04-30 Thread Walker Carlson

Hey Bruno,

Thanks for the feedback. Sorry for the late reply, I was hoping others
might weigh in as well and it got away from me.

a) I like this but I think we should separate this out. This kip has
already dragged on more than it should and I think that is a big enough
change to get done by itself.

b) I'm a bit resistant to adding a new type of processor or store for this
change. It feels excessively complicated for what should be a small change.
I think that it might be good but I don't want to expand the scope more
than absolutely necessary here.

best,
walker

On Wed, Apr 10, 2024 at 4:34 AM Bruno Cadonna  wrote:

> Hi Walker,
>
> Thanks for the updates!
>
>
> (1) While I like naming the methods differently, I have also to say that
> I do not like addIsomorphicGlobalStore() because it does not really tell
> what the method does. I could also not come up with a better name than
> addGlobalStoreWithReprocessingOnRestore(). However, I had two ideas on
> which I would like to have your opinion.
>
> (a) Add a new GlobalStoreBuilder in which users can set if the global
> state store should reprocess on restore. Additionally, to the option to
> enable or disable reprocessing on restore, you could also NOT offer a
> way to enable or disable logging in the GlobalStoreBuilder. Currently,
> if users enable logging for a store builder that they pass into
> addGlobalStore(), Kafka Streams needs to explicitly disable it again,
> which is not ideal.
>
> (b) Add a new GlobalProcessorSupplier in which users can set if the
> global state store should reprocess on restore. Another ugliness that
> could be fixed with this is passing Void, Void to ProcessorSupplier. The
> GlobalProcessorSupplier would just have two type parameters .
> The nice aspect of this idea is that the option to enable/disable
> reprocessing on restore is only needed when a processor supplier is
> passed into the methods. That is not true for idea (a).
>
>
> (2) Yes, that was my intent.
>
>
> Best,
> Bruno
>
> On 4/9/24 9:33 PM, Walker Carlson wrote:
> > Hey all,
> >
> > (1) no I hadn't considered just naming the methods differently. I
> actually
> > really like this idea and am for it. Except we need 3 different methods
> > now. One for no processor, one for a processor that should restore and
> one
> > that reprocesses. How about `addCustomGlobalStore` and
> > `addIsomorphicGlobalStore` and then just `addGlobalStateStore` for the no
> > processor case? If everyone likes that I can add that to the KIP and
> rename
> > the methods.
> >
> > (2) we can have the the built in case use StoreBuilder > KeyValueStore> and manually check for the TimestampedKeyValueStore. That
> is
> > fine with me.
> >
> > Bruno I hope that was what you were intending.
> >
> > (3) For the scala api, do we need to make it match the java api or are we
> > just making the minimum changes? as if we take point 1 I don't know how
> > much we need to change.
> >
> > Thanks,
> > Walker
> >
> >
> > On Tue, Apr 2, 2024 at 8:38 AM Matthias J. Sax  wrote:
> >
> >> One more thing:
> >>
> >> I was just looking into the WIP PR, and it seems we will also need to
> >> change `StreamsBuilder.scala`. The KIP needs to cover this changes as
> well.
> >>
> >>
> >> -Matthias
> >>
> >> On 4/1/24 10:33 PM, Bruno Cadonna wrote:
> >>> Hi Walker and Matthias,
> >>>
> >>> (2)
> >>> That is exactly my point about having a compile time error versus a
> >>> runtime error. The added flexibility as proposed by Matthias sounds
> good
> >>> to me.
> >>>
> >>> Regarding the Named parameter, I was not aware that the processor that
> >>> writes records to the global state store is named according to the name
> >>> passed in by Consumed. I thought Consumed strictly specifies the names
> >>> of source processors. So I am fine with not having an overload with a
> >>> Named parameter.
> >>>
> >>> Best,
> >>> Bruno
> >>>
> >>> On 3/31/24 11:30 AM, Matthias J. Sax wrote:
> >>>> Two more follow up thoughts:
> >>>>
> >>>> (1) I am still not a big fan of the boolean parameter we introduce.
> >>>> Did you consider to use different method names, like
> >>>> `addReadOnlyGlobalStore()` (for the optimized method, that would not
> >>>> reprocess data on restore), and maybe add `addModifiableGlobalStore()`
> >>>> (not a good name, but we cannot re-u

Re: [DISCUSS] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

2024-04-09 Thread Walker Carlson

Hey all,

(1) no I hadn't considered just naming the methods differently. I actually
really like this idea and am for it. Except we need 3 different methods
now. One for no processor, one for a processor that should restore and one
that reprocesses. How about `addCustomGlobalStore` and
`addIsomorphicGlobalStore` and then just `addGlobalStateStore` for the no
processor case? If everyone likes that I can add that to the KIP and rename
the methods.

(2) we can have the the built in case use StoreBuilder and manually check for the TimestampedKeyValueStore. That is
fine with me.

Bruno I hope that was what you were intending.

(3) For the scala api, do we need to make it match the java api or are we
just making the minimum changes? as if we take point 1 I don't know how
much we need to change.

Thanks,
Walker


On Tue, Apr 2, 2024 at 8:38 AM Matthias J. Sax  wrote:

> One more thing:
>
> I was just looking into the WIP PR, and it seems we will also need to
> change `StreamsBuilder.scala`. The KIP needs to cover this changes as well.
>
>
> -Matthias
>
> On 4/1/24 10:33 PM, Bruno Cadonna wrote:
> > Hi Walker and Matthias,
> >
> > (2)
> > That is exactly my point about having a compile time error versus a
> > runtime error. The added flexibility as proposed by Matthias sounds good
> > to me.
> >
> > Regarding the Named parameter, I was not aware that the processor that
> > writes records to the global state store is named according to the name
> > passed in by Consumed. I thought Consumed strictly specifies the names
> > of source processors. So I am fine with not having an overload with a
> > Named parameter.
> >
> > Best,
> > Bruno
> >
> > On 3/31/24 11:30 AM, Matthias J. Sax wrote:
> >> Two more follow up thoughts:
> >>
> >> (1) I am still not a big fan of the boolean parameter we introduce.
> >> Did you consider to use different method names, like
> >> `addReadOnlyGlobalStore()` (for the optimized method, that would not
> >> reprocess data on restore), and maybe add `addModifiableGlobalStore()`
> >> (not a good name, but we cannot re-use existing `addGlobalStore()` --
> >> maybe somebody else has a good idea about a better `addXxxGlobalStore`
> >> that would describe it well).
> >>
> >> (2) I was thinking about Bruno's comment to limit the scope the store
> >> builder for the optimized case. I think we should actually do
> >> something about it, because in the end, the runtime (ie, the
> >> `Processor` we hard wire) would need to pick a store it supports and
> >> cast to the corresponding store? If the cast fails, we hit a runtime
> >> exception, but by putting the store we cast to into the signature we
> >> can actually convert it into a compile time error what seems better.
> >> -- If we want, we could make it somewhat flexible and support both
> >> `KeyValueStore` and `TimestampedKeyValueStore` -- ie, the signature
> >> would be `KeyValueStore` but we explicitly check if the builder gives
> >> us a `TimestampedKeyValueStore` instance and use it properly.
> >>
> >> If putting the signature does not work for some reason, we should at
> >> least clearly call it out in the JavaDocs what store type is expected.
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 3/28/24 5:05 PM, Walker Carlson wrote:
> >>> Hey all,
> >>>
> >>> Thanks for the feedback Bruno, Almog and Matthias!
> >>>
> >>> Almog: I like the idea, but I agree with Matthais. I actually looked at
> >>> that ticket a bit when doing this and found that while similar they are
> >>> actually pretty unrelated codewise. I would love to see it get taken
> >>> care
> >>> of.
> >>>
> >>> Bruno and Matthias: The Named parameter doesn't really make sense to
> >>> me to
> >>> put it here. The store in the Store builder is already named through
> >>> what
> >>> Matthais described and the processor doesn't actually have a name. That
> >>> would be the processor node that gets named via the Named parameter
> (in
> >>> the DSL) and the internal streams builder uses the consumed object to
> >>> make
> >>> a source name. I think we should just keep the Consumed object and used
> >>> that for the processor node name.
> >>>
> >>> As for the limitation of the store builder interface I don't think it
> is
> >>> necessary. It could be something we add lat

Re: [DISCUSS] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

2024-03-28 Thread Walker Carlson

> >> While it is an interesting question, I am in favor of deferring this to
> >> a separate KIP.
> >>
> >> Best,
> >> Bruno
> >>
> >> On 3/26/24 12:49 AM, Almog Gavra wrote:
> >>> Hello Folk!
> >>>
> >>> Glad to see improvements to the GlobalKTables in discussion! I think
> they
> >>> deserve more love :)
> >>>
> >>> Scope creep alert (which I'm generally against and certainly still
> >> support
> >>> this KIP without but I want to see if there's an elegant way to address
> >>> both problems). The KIP mentions that "Now the restore is done by
> >>> reprocessing using an instance from the customer processor supplier"
> >> which
> >>> I suppose fixed a long-standing bug (
> >>> https://issues.apache.org/jira/browse/KAFKA-8037) but only for
> >>> GlobalKTables and not for normal KTables that use the source-changelog
> >>> optimization. Since this API could be used to signal "I want to
> reprocess
> >>> on restore" I'm wondering whether it makes sense to design this API in
> a
> >>> way that could be extended for KTables as well so a fix for KAFKA-8037
> >>> would be possible with the same mechanism. Thoughts?
> >>>
> >>> Cheers,
> >>> Almog
> >>>
> >>> On Mon, Mar 25, 2024 at 11:06 AM Walker Carlson
> >>>  wrote:
> >>>
> >>>> Hey Bruno,
> >>>>
> >>>> 1) I'm actually not sure why that is in there. It certainly doesn't
> >> match
> >>>> the convention. Best to remove it and match the other methods.
> >>>>
> >>>> 2) Yeah, I thought about it but I'm not convinced it is a necessary
> >>>> restriction. It might be useful for the already defined processors but
> >> then
> >>>> they might as well use the `globalTable` method. I think the add state
> >>>> store option should go for maximum flexibility.
> >>>>
> >>>> Best,
> >>>> Walker
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Mar 22, 2024 at 10:01 AM Bruno Cadonna 
> >> wrote:
> >>>>
> >>>>> Hi Walker,
> >>>>>
> >>>>> A couple of follow-up questions.
> >>>>>
> >>>>> 1.
> >>>>> Why do you propose to explicitly pass a parameter "storeName" in
> >>>>> StreamsBuilder#addGlobalStore?
> >>>>> The StoreBuilder should already provide a name for the store, if I
> >>>>> understand the code correctly.
> >>>>> I would avoid using the same name for the source node and the state
> >>>>> store, because it limits the flexibility in naming. Why do you not
> use
> >>>>> Named for the name of the source node?
> >>>>>
> >>>>> 2.
> >>>>> Did you consider Matthias' proposal to restrict the type of the store
> >>>>> builder to `StoreBuilder` (or even
> >>>>> `StoreBuilder`) for the case
> where
> >>>>> the processor is built-in?
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>> Bruno
> >>>>>
> >>>>> On 3/13/24 11:05 PM, Walker Carlson wrote:
> >>>>>> Thanks for the feedback Bruno, Matthias, and Lucas!
> >>>>>>
> >>>>>> There is a decent amount but I'm going to try and just hit the major
> >>>>> points
> >>>>>> as I would like to keep this change simple.
> >>>>>>
> >>>>>> I've made corrections for the mistakes pointed out. Thanks for the
> >>>>>> suggestions everyone.
> >>>>>>
> >>>>>> The main sticking point seems to be with the method of signalling
> the
> >>>>>> restore behavior. It seems we can all agree with how the API should
> >>>> look
> >>>>>> with the default option we are adding. I think keeping the option to
> >>>> load
> >>>>>> directly from the topic into the store is a good idea. It is much
> more
> >>>>>> performant and could make a simple metric collector processor much
> >>>>> simpler.
> >>>>>>
> >>>>>> I think something that Matthais said about creating a special class
> of
> >>>>>> processors for the global stores helps me think about the issue. I
> >> tend
> >>>>> to
> >>>>>> fall into the category that we should keep global stores open to the
> >>>>>> possibility of having child nodes in the future. I don't really see
> >> the
> >>>>>> downside of having that as an option. It might not be best for a lot
> >> of
> >>>>>> cases, but something simple could be very useful to put in the PAPI.
> >>>>>>
> >>>>>> I like the idea of having a `GlobalStoreParameters` but only if we
> >>>> decide
> >>>>>> to make the processor need to extend an interface like
> >>>>>> 'GobalStoreProcessor`. If not that seems excessive.
> >>>>>>
> >>>>>> As of right now I don't see a better option than having a boolean
> flag
> >>>>> for
> >>>>>> the reprocessOnRestore option. I expanded the description in the
> docs
> >>>> so
> >>>>> I
> >>>>>> hope that helps.
> >>>>>>
> >>>>>> I am more than willing to take other ideas on it.
> >>>>>>
> >>>>>> thanks,
> >>>>>> Walker
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

[VOTE] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

2024-03-25 Thread Walker Carlson

Hello everybody,

I think we have had some pretty good discussion on this kip and it seems
that we are close if not yet settled on the final version.

So I would like to open up the voting for KIP-1024:
https://cwiki.apache.org/confluence/x/E4t3EQ

Thanks everyone!
Walker

Re: [DISCUSS] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

2024-03-25 Thread Walker Carlson

Hey Bruno,

1) I'm actually not sure why that is in there. It certainly doesn't match
the convention. Best to remove it and match the other methods.

2) Yeah, I thought about it but I'm not convinced it is a necessary
restriction. It might be useful for the already defined processors but then
they might as well use the `globalTable` method. I think the add state
store option should go for maximum flexibility.

Best,
Walker



On Fri, Mar 22, 2024 at 10:01 AM Bruno Cadonna  wrote:

> Hi Walker,
>
> A couple of follow-up questions.
>
> 1.
> Why do you propose to explicitly pass a parameter "storeName" in
> StreamsBuilder#addGlobalStore?
> The StoreBuilder should already provide a name for the store, if I
> understand the code correctly.
> I would avoid using the same name for the source node and the state
> store, because it limits the flexibility in naming. Why do you not use
> Named for the name of the source node?
>
> 2.
> Did you consider Matthias' proposal to restrict the type of the store
> builder to `StoreBuilder` (or even
> `StoreBuilder`) for the case where
> the processor is built-in?
>
>
> Best,
> Bruno
>
> On 3/13/24 11:05 PM, Walker Carlson wrote:
> > Thanks for the feedback Bruno, Matthias, and Lucas!
> >
> > There is a decent amount but I'm going to try and just hit the major
> points
> > as I would like to keep this change simple.
> >
> > I've made corrections for the mistakes pointed out. Thanks for the
> > suggestions everyone.
> >
> > The main sticking point seems to be with the method of signalling the
> > restore behavior. It seems we can all agree with how the API should look
> > with the default option we are adding. I think keeping the option to load
> > directly from the topic into the store is a good idea. It is much more
> > performant and could make a simple metric collector processor much
> simpler.
> >
> > I think something that Matthais said about creating a special class of
> > processors for the global stores helps me think about the issue. I tend
> to
> > fall into the category that we should keep global stores open to the
> > possibility of having child nodes in the future. I don't really see the
> > downside of having that as an option. It might not be best for a lot of
> > cases, but something simple could be very useful to put in the PAPI.
> >
> > I like the idea of having a `GlobalStoreParameters` but only if we decide
> > to make the processor need to extend an interface like
> > 'GobalStoreProcessor`. If not that seems excessive.
> >
> > As of right now I don't see a better option than having a boolean flag
> for
> > the reprocessOnRestore option. I expanded the description in the docs so
> I
> > hope that helps.
> >
> > I am more than willing to take other ideas on it.
> >
> > thanks,
> > Walker
> >
>

Re: [DISCUSS] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

2024-03-13 Thread Walker Carlson

Thanks for the feedback Bruno, Matthias, and Lucas!

There is a decent amount but I'm going to try and just hit the major points
as I would like to keep this change simple.

I've made corrections for the mistakes pointed out. Thanks for the
suggestions everyone.

The main sticking point seems to be with the method of signalling the
restore behavior. It seems we can all agree with how the API should look
with the default option we are adding. I think keeping the option to load
directly from the topic into the store is a good idea. It is much more
performant and could make a simple metric collector processor much simpler.

I think something that Matthais said about creating a special class of
processors for the global stores helps me think about the issue. I tend to
fall into the category that we should keep global stores open to the
possibility of having child nodes in the future. I don't really see the
downside of having that as an option. It might not be best for a lot of
cases, but something simple could be very useful to put in the PAPI.

I like the idea of having a `GlobalStoreParameters` but only if we decide
to make the processor need to extend an interface like
'GobalStoreProcessor`. If not that seems excessive.

As of right now I don't see a better option than having a boolean flag for
the reprocessOnRestore option. I expanded the description in the docs so I
hope that helps.

I am more than willing to take other ideas on it.

thanks,
Walker

[jira] [Created] (KAFKA-16316) Make the restore behavior of GlobalKTables with custom processors configureable

2024-02-29 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-16316:
--

 Summary: Make the restore behavior of GlobalKTables with custom 
processors configureable
 Key: KAFKA-16316
 URL: https://issues.apache.org/jira/browse/KAFKA-16316
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson
Assignee: Walker Carlson


Take the change implemented in https://issues.apache.org/jira/browse/KAFKA-7663 
and make it optional through adding a couple methods to the API



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[DISCUSS] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

2024-02-29 Thread Walker Carlson

Hello everybody,

I wanted to propose a change to our addGlobalStore methods so that the
restore behavior can be controlled on a preprocessor level. This should
help Kafka Stream users to better tune Global stores added with the
processor API to better fit their needs.

Details are in the kip here: https://cwiki.apache.org/confluence/x/E4t3EQ

Thanks,
Walker

Re: [VOTE] KIP-969: Support range interactive queries (IQv2) for versioned state stores

2023-12-13 Thread Walker Carlson

+1 binding

Thanks for the KIP Alieh!

On Mon, Dec 11, 2023 at 2:14 PM Alieh Saeedi 
wrote:

> Hi everyone,
>
> Thanks to everyone who has reviewed KIP-969, and participated in the
> discussion thread!
>
> I'd also like to thank you in advance for taking the time to vote.
>
> Cheers,
> Alieh
>

Re: [VOTE] KIP-985 Add reverseRange and reverseAll query over kv-store in IQv2

2023-10-17 Thread Walker Carlson

+1 (binding_

Thanks!

On Tue, Oct 17, 2023 at 3:22 AM Lucas Brutschy
 wrote:

> +1 (binding)
>
> Thanks for the KIP!
>
> On Tue, Oct 17, 2023 at 2:31 AM Matthias J. Sax  wrote:
> >
> > +1 (binding)
> >
> >
> > On 10/13/23 9:24 AM, Hanyu (Peter) Zheng wrote:
> > > Hello everyone,
> > >
> > > I would like to start a vote for KIP-985 that Add reverseRange and
> > > reverseAll query over kv-store in IQv2.
> > >
> > > Sincerely,
> > > Hanyu
> > >
> > > On Fri, Oct 13, 2023 at 9:15 AM Hanyu (Peter) Zheng <
> pzh...@confluent.io>
> > > wrote:
> > >
> > >>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-985:+Add+reverseRange+and+reverseAll+query+over+kv-store+in+IQv2
> > >>
> > >> --
> > >>
> > >> [image: Confluent] 
> > >> Hanyu (Peter) Zheng he/him/his
> > >> Software Engineer Intern
> > >> +1 (213) 431-7193 <+1+(213)+431-7193>
> > >> Follow us: [image: Blog]
> > >> <
> https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
> >[image:
> > >> Twitter] [image: LinkedIn]
> > >> [image: Slack]
> > >> [image: YouTube]
> > >> 
> > >>
> > >> [image: Try Confluent Cloud for Free]
> > >> <
> https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic
> >
> > >>
> > >
> > >
>

Re: [VOTE] KIP-960: Support single-key_single-timestamp interactive queries (IQv2) for versioned state stores

2023-10-11 Thread Walker Carlson

+1 (binding)

Thanks for the kip Alieh!

Walker

On Wed, Oct 11, 2023 at 3:52 AM Bruno Cadonna  wrote:

> Thanks for the KIP, Alieh!
>
> +1 (binding)
>
> Best,
> Bruno
>
> On 10/10/23 1:14 AM, Matthias J. Sax wrote:
> > One more nit: as discussed on the related KIP-698 thread, we should not
> > use `get` as prefix for the getters.
> >
> > So it should be `K key()` and `Optional asOfTimestamp()`.
> >
> >
> > Otherwise the KIP LGTM.
> >
> >
> > +1 (binding)
> >
> >
> > -Matthias
> >
> > On 10/6/23 2:50 AM, Alieh Saeedi wrote:
> >> Hi everyone,
> >>
> >> Since KIP-960 is reduced to the simplest IQ type and all further
> comments
> >> are related to the following-up KIPs, I decided to finalize it at this
> >> point.
> >>
> >>
> >> A huge thank you to everyone who has reviewed this KIP (and also the
> >> following-up ones), and
> >> participated in the discussion thread!
> >>
> >> I'd also like to thank you in advance for taking the time to vote.
> >>
> >> Best,
> >> Alieh
> >>
>

Re: [DISCUSS] KIP-985 Add reverseRange and reverseAll query over kv-store in IQv2

2023-10-03 Thread Walker Carlson

Hello Hanyu,

Looking over your kip things mostly make sense but I have a couple of
comments.


   1. You have "withDescandingOrder()". I think you mean "descending" :)
   Also there are still a few places in the do where its called "setReverse"
   2. Also I like "WithDescendingKeys()" better
   3. I'm not sure of what ordering guarantees we are offering. Perhaps we
   can add a section to the motivation clearly spelling out the current
   ordering and the new offering?
   4. When you say "use unbounded reverseQuery to achieve reverseAll" do
   you mean "use unbounded RangeQuery to achieve reverseAll"? as far as I can
   tell we don't have a reverseQuery as a named object?


Looking good so far

best,
Walker

On Tue, Oct 3, 2023 at 2:13 PM Colt McNealy  wrote:

> Hello Hanyu,
>
> Thank you for the KIP. I agree with Matthias' proposal to keep the naming
> convention consistent with KIP-969. I favor the `.withDescendingKeys()`
> name.
>
> I am curious about one thing. RocksDB guarantees that records returned
> during a range scan are lexicographically ordered by the bytes of the keys
> (either ascending or descending order, as specified in the query). This
> means that results within a single partition are indeed ordered.** My
> reading of KIP-805 suggests to me that you don't need to specify the
> partition number you are querying in IQv2, which means that you can have a
> valid reversed RangeQuery over a store with "multiple partitions" in it.
>
> Currently, IQv1 does not guarantee order of keys in this scenario. Does
> IQv2 support ordering across partitions? Such an implementation would
> require opening a rocksdb range scan** on multiple rocksdb instances (one
> per partition), and polling the first key of each. Whether or not this is
> ordered, could we please add that to the documentation?
>
> **(How is this implemented/guaranteed in an `inMemoryKeyValueStore`? I
> don't know about that implementation).
>
> Colt McNealy
>
> *Founder, LittleHorse.dev*
>
>
> On Tue, Oct 3, 2023 at 1:35 PM Hanyu (Peter) Zheng
>  wrote:
>
> > ok, I will update it. Thank you  Matthias
> >
> > Sincerely,
> > Hanyu
> >
> > On Tue, Oct 3, 2023 at 11:23 AM Matthias J. Sax 
> wrote:
> >
> > > Thanks for the KIP Hanyu!
> > >
> > >
> > > I took a quick look and it think the proposal makes sense overall.
> > >
> > > A few comments about how to structure the KIP.
> > >
> > > As you propose to not add `ReverseRangQuery` class, the code example
> > > should go into "Rejected Alternatives" section, not in the "Proposed
> > > Changes" section.
> > >
> > > For the `RangeQuery` code example, please omit all existing methods
> etc,
> > > and only include what will be added/changed. This make it simpler to
> > > read the KIP.
> > >
> > >
> > > nit: typo
> > >
> > > >  the fault value is false
> > >
> > > Should be "the default value is false".
> > >
> > >
> > > Not sure if `setReverse()` is the best name. Maybe
> `withDescandingOrder`
> > > (or similar, I guess `withReverseOrder` would also work) might be
> > > better? Would be good to align to KIP-969 proposal that suggest do use
> > > `withDescendingKeys` methods for "reverse key-range"; if we go with
> > > `withReverseOrder` we should change KIP-969 accordingly.
> > >
> > > Curious to hear what others think about naming this consistently across
> > > both KIPs.
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 10/3/23 9:17 AM, Hanyu (Peter) Zheng wrote:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-985%3A+Add+reverseRange+and+reverseAll+query+over+kv-store+in+IQv2
> > > >
> > >
> >
> >
> > --
> >
> > [image: Confluent] 
> > Hanyu (Peter) Zheng he/him/his
> > Software Engineer Intern
> > +1 (213) 431-7193 <+1+(213)+431-7193>
> > Follow us: [image: Blog]
> > <
> >
> https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
> > >[image:
> > Twitter] [image: LinkedIn]
> > [image: Slack]
> > [image: YouTube]
> > 
> >
> > [image: Try Confluent Cloud for Free]
> > <
> >
> https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic
> > >
> >
>

Re: [DISCUSS] KIP-968: Support single-key_multi-timestamp interactive queries (IQv2) for versioned state stores

2023-10-03 Thread Walker Carlson

Hey Alieh thanks for the KIP,

Weighing in on the AsOf vs Until debate I think either is fine from a
natural language perspective. Personally AsOf makes more sense to me where
until gives me the idea that the query is making a change. It's totally a
connotative difference and not that important. I think as of is pretty
frequently used in point of time queries.

Also for these methods it makes sense to drop the "get" We don't
normally use that in getters

   * The key that was specified for this query.
   */
  public K getKey();

  /**
   * The starting time point of the query, if specified
   */
  public Optional getFromTimestamp();

  /**
   * The ending time point of the query, if specified
   */
  public Optional getAsOfTimestamp();

Other than that I didn't have too much to add. Overall I like the direction
of the KIP and think the funcatinlyt is all there!
best,
Walker



On Mon, Oct 2, 2023 at 10:46 PM Matthias J. Sax  wrote:

> Thanks for the updated KIP. Overall I like it.
>
> Victoria raises a very good point, and I personally tend to prefer (I
> believe so does Victoria, but it's not totally clear from her email) if
> a range query would not return any tombstones, ie, only two records in
> Victoria's example. Thus, it seems best to include a `validTo` ts-field
> to `VersionedRecord` -- otherwise, the retrieved result cannot be
> interpreted correctly.
>
> Not sure what others think about it.
>
> I would also be open to actually add a `includeDeletes()` (or
> `includeTombstones()`) method/flag (disabled by default) to allow users
> to get all tombstone: this would only be helpful if there are two
> consecutive tombstone though (if I got it right), so not sure if we want
> to add it or not -- it seems also possible to add it later if there is
> user demand for it, so it might be a premature addition as this point?
>
>
> Nit:
>
> > the public interface ValueIterator is used
>
> "is used" -> "is added" (otherwise it sounds like as if `ValueIterator`
> exist already)
>
>
>
> Should we also add a `.within(fromTs, toTs)` (or maybe some better
> name?) to allow specifying both bounds at once? The existing
> `RangeQuery` does the same for specifying the key-range, so might be
> good to add for time-range too?
>
>
>
> -Matthias
>
>
> On 9/6/23 5:01 AM, Bruno Cadonna wrote:
> > In my last e-mail I missed to finish a sentence.
> >
> > "I think from a KIP"
> >
> > should be
> >
> > "I think the KIP looks good!"
> >
> >
> > On 9/6/23 1:59 PM, Bruno Cadonna wrote:
> >> Hi Alieh,
> >>
> >> Thanks for the KIP!
> >>
> >> I think from a KIP
> >>
> >> 1.
> >> I propose to throw an IllegalArgumentException or an
> >> IllegalStateException for meaningless combinations. In any case, the
> >> KIP should specify what exception is thrown.
> >>
> >> 2.
> >> Why does not specifying a range return the latest version? I would
> >> expect that it returns all versions since an empty lower or upper
> >> limit is interpreted as no limit.
> >>
> >> 3.
> >> I second Matthias comment about replacing "asOf" with "until" or "to".
> >>
> >> 4.
> >> Do we need "allVersions()"? As I said above I would return all
> >> versions if no limits are specified. I think if we get rid of
> >> allVersions() there might not be any meaningless combinations anymore.
> >> If a user applies twice the same limit like for example
> >> MultiVersionedKeyQuery.with(key).from(t1).from(t2) the last one wins.
> >>
> >> 5.
> >> Could you add some more examples with time ranges to the example
> section?
> >>
> >> 6.
> >> The KIP misses the test plan section.
> >>
> >> 7.
> >> I propose to rename the class to "MultiVersionKeyQuery" since we are
> >> querying multiple versions of the same key.
> >>
> >> 8.
> >> Could you also add withAscendingTimestamps()? IMO it gives users the
> >> possibility to make their code more readable instead of only relying
> >> on the default.
> >>
> >> Best,
> >> Bruno
> >>
> >>
> >> On 8/17/23 4:13 AM, Matthias J. Sax wrote:
> >>> Thanks for splitting this part into a separate KIP!
> >>>
> >>> For `withKey()` we should be explicit that `null` is not allowed.
> >>>
> >>> (Looking into existing `KeyQuery` it seems the JavaDocs don't cover
> >>> this either -- would you like to do a tiny cleanup PR for this, or
> >>> fix on-the-side in one of your PRs?)
> >>>
> >>>
> >>>
>  The key query returns all the records that are valid in the time
>  range starting from the timestamp {@code fromTimestamp}.
> >>>
> >>> In the JavaDocs you use the phrase `are valid` -- I think we need to
> >>> explain what "valid" means? It might even be worth to add some
> >>> examples. It's annoying, but being precise if kinda important.
> >>>
> >>> With regard to KIP-962, should we allow `null` for time bounds ? The
> >>> JavaDocs should also be explicit if `null` is allowed or not and what
> >>> the semantics are if allowed.
> >>>
> >>>
> >>>
> >>> You are using `asOf()` however, because we are doing time-range
> >>> queries, to me using `until()` to des

Re: [ANNOUNCE] New committer: Lucas Brutschy

2023-09-21 Thread Walker Carlson

Congrats Lucas!

On Thu, Sep 21, 2023 at 11:42 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Congrats Lucas!
>
> On Thu, Sep 21, 2023, 22:05 Boudjelda Mohamed Said 
> wrote:
>
> > Congratulations, Lucas!!
> >
> > On Thu 21 Sep 2023 at 18:34, Lianet M.  wrote:
> >
> > > Congratulations Lucas!
> > >
> > > On Thu, Sept 21, 2023, 11:45 a.m. Bruno Cadonna 
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > The PMC of Apache Kafka is pleased to announce a new Kafka committer
> > > > Lucas Brutschy.
> > > >
> > > > Lucas' major contributions are around Kafka Streams.
> > > >
> > > > Lucas' significantly contributed to the state updater
> > > > (https://issues.apache.org/jira/browse/KAFKA-10199) and he drives
> the
> > > > implementation of the new threading model for Kafka Streams
> > > > (https://issues.apache.org/jira/browse/KAFKA-15326).
> > > >
> > > > Lucas' contributions to KIP discussions and PR reviews are very
> > > thoughtful.
> > > >
> > > > Congratulations, Lucas!
> > > >
> > > > Thanks,
> > > >
> > > > Bruno (on behalf of the Apache Kafka PMC)
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-960: Support interactive queries (IQv2) for versioned state stores

2023-09-11 Thread Walker Carlson

Thanks for the KIP Alieh!

I don't have anything to add to the 960 discussion right now as it seems
rather straightforward. I think after you address Bruno's comments we can
bring it to a vote. I'll review the two spawned KIPs separately.

Keep it up,
Walker

On Wed, Sep 6, 2023 at 5:11 AM Bruno Cadonna  wrote:

> Hi Alieh,
>
> I am sorry if I might repeat things that have been already said since I
> am not sure I got all e-mails of this discussion thread.
>
> The KIP looks good!
>
> I just have two minor comments that I think are easily resolved.
>
> 1.
> Why is defining latest() not needed? Is it because if I do not use
> asOf() I get the latest value?
>
> For example,
>
> final VersionedKeyQuery query =
> VersionedKeyQuery.withKey(1);
>
> will return the latest version, right?
>
> If so, that should be explicitly stated in the KIP and in the javadocs.
>
> I assume, you wanted to say exactly that with
>
> "Defining the latest() method is not needed since returning the latest
> value has been always the default assumption."
>
> I would propose to write something like:
>
> "If a query is created without calling asOf() the query returns the
> latest value of the key"
>
> Adding one example in the example section for this case would also help.
>
>
> 2.
> The KIP misses the test plan section.
>
>
> Best,
> Bruno
>
> On 8/25/23 1:02 PM, Alieh Saeedi wrote:
> > Thank you, Matthias and Victoria.
> >
> > Regarding the problem of using methods of single-key-single-ts queries
> for
> > KeyQuery (such as asOf) and vice versa (such as skipCache()), after a
> > discussion, we decided to define a separate class for
> single-key-single-ts
> > queries named VersionedKeyQuery. Subsequently, the
> > single-key-multi-timestamp queries (KIP-968) and range queries (KIP-969)
> > will be covered by the MultiVersionedKeyQuery and
> MultiVersionedRangeQuery
> > classes, respectively.
> > I think the VersionedKeyQuery is type-safe since if an instance of the
> > VersionedKeyQuery is posed to a normal (non-versioned) state store, we
> will
> > have the defined Kafka Streams UNKNOWN_QUERY_TYPE failure.
> >
> > P.S.: The example should be correct now.
> >
> > Cheers,
> > Alieh
> >
> > On Thu, Aug 24, 2023 at 9:34 PM Victoria Xia
> 
> > wrote:
> >
> >>   Hi Alieh,
> >>
> >> Thanks for the updates!
> >> Some questions on the new limited-scope KIP:
> >> 1. The example in the "Examples" section shows the query type as
> >> `KeyQuery>` and the result type as
> >> `StateQueryResult>`. Should those have
> >> `VersionedRecord` instead of `ValueAndTimestamp`? Also, the request
> type is
> >> currently `StateQueryRequest>>`.
> >> Should the `ValueIterator` no longer be present, now that we are only
> >> returning a single record?
> >> 2. Related to Matthias's question about what happens if `asOf()` is set
> >> for a KeyQuery to a non-versioned store, what happens if `skipCache()`
> is
> >> set for a versioned store? And what will `isSkipCache()` return?
> Versioned
> >> stores do not support caching (at least today). I think for consistency
> we
> >> have to let `isSkipCache()` still default to false if `isSkipCache()` is
> >> not set. I think that's fine, as long as it's clear to users (e.g., from
> >> docs) that `isSkipCache()` is not relevant for queries to versioned
> stores.
> >> And some responses to your comments from earlier:
> >>> I changed the VersionedRecord such that it can have NULL values as
> well.
> >> The question is, what was the initial intention of setting the value in
> >> VersionedRecord as NOT NULL?
> >> We can discuss more on your other KIPs (KIP-968 and KIP-969) since this
> >> change should only be relevant for those KIPs and not this one, but the
> >> short answer is that today there's no situation in which VersionedRecord
> >> would need to return a null value because if a get(key) or get(key,
> >> asOfTimestamp) call to a versioned store were to find a null record
> (i.e.,
> >> tombstone), then a null object is returned, rather than a non-null
> >> VersionedRecord with null value. In other words, versioned stores do not
> >> distinguish between a tombstone having been inserted versus no record
> ever
> >> having been inserted.
> >>> About defining new methods in the VersionedKeyValueStore interface: I
> >> actually have defined the required methods in the RocksDBVersionedStore
> >> class. Since defining them for the interface requires implementing them
> for
> >> all the classes that have implemented the interface.
> >> Again a discussion for your other KIPs, but I think you'll want to
> define
> >> the new method(s) in the VersionedKeyValueStore interface directly
> (rather
> >> than only in individual implementations such as RocksDBVersionedStore),
> >> otherwise your new interactive query types will throw NPEs for custom
> store
> >> implementations which do not support the new methods.
> >> Best,VictoriaOn Thursday, August 17, 2023 at 07:25:22 AM EDT, Alieh
> >> Saeedi  wrote:
> >>
> >>   Hey Matthias,
> >>

[jira] [Resolved] (KAFKA-14936) Add Grace Period To Stream Table Join

2023-09-04 Thread Walker Carlson (Jira)



 [ 
https://issues.apache.org/jira/browse/KAFKA-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walker Carlson resolved KAFKA-14936.

Resolution: Done

> Add Grace Period To Stream Table Join
> -
>
> Key: KAFKA-14936
> URL: https://issues.apache.org/jira/browse/KAFKA-14936
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>        Reporter: Walker Carlson
>    Assignee: Walker Carlson
>Priority: Major
>  Labels: kip, streams
> Fix For: 3.6.0
>
>
> Include the grace period for stream table joins as described in kip 923.
> Also add a rocksDB time based queueing implementation of 
> `TimeOrderedKeyValueBuffer`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (KAFKA-15379) Add option for Grace period Joins to disable changelog creation

2023-08-18 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-15379:
--

 Summary: Add option for Grace period Joins to disable changelog 
creation 
 Key: KAFKA-15379
 URL: https://issues.apache.org/jira/browse/KAFKA-15379
 Project: Kafka
  Issue Type: New Feature
Reporter: Walker Carlson


Right now if you are preforming a buffered join with a grace period there is no 
way to avoid the creation of a changelog



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [VOTE] KIP-759: Unneeded repartition canceling

2023-08-01 Thread Walker Carlson

+1 (binding)

On Mon, Jul 31, 2023 at 10:43 PM Matthias J. Sax  wrote:

> +1 (binding)
>
> On 7/11/23 11:16 AM, Shay Lin wrote:
> > Hi all,
> >
> > I'd like to call a vote on KIP-759: Unneeded repartition canceling
> > The KIP has been under discussion for quite some time(two years). This
> is a
> > valuable optimization for advanced users. I hope we can push this toward
> > the finish line this time.
> >
> > Link to the KIP:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-759%3A+Unneeded+repartition+canceling
> >
> > Best,
> > Shay
> >
>

Re: [VOTE] KIP-923: Add A Grace Period to Stream Table Join

2023-06-06 Thread Walker Carlson

Hi all,

This vote thread has been open over 72 hours, and has sufficient votes so
I'll close the voting at this time.

+4 binding votes
+2 non-binding votes

KIP-923 has PASSED.

Thanks all for your votes
Walker

On Tue, Jun 6, 2023 at 8:13 AM John Roesler  wrote:

> Thanks for the KIP, Walker!
>
> I’m +1 (binding)
>
> Thanks,
> John
>
> On Mon, Jun 5, 2023, at 13:39, Victoria Xia wrote:
> > Hi Walker,
> >
> > Thanks for the KIP! Left a clarification question on the discussion
> thread
> > just now but it's about an implementation detail, so I don't think it
> > changes anything in this vote thread.
> >
> > +1 (non-binding)
> >
> > Cheers,
> > Victoria
> >
> > On Mon, Jun 5, 2023 at 10:23 AM Bill Bejeck  wrote:
> >
> >> Hi Walker,
> >>
> >> Thanks for the KIP.
> >>
> >> I've caught up on the discussion thread and I'm satisfied with all
> >> responses.
> >>
> >> +1(binding)
> >>
> >> -Bill
> >>
> >> On Mon, Jun 5, 2023 at 10:20 AM Bruno Cadonna 
> wrote:
> >>
> >> > Hi Walker,
> >> >
> >> > Thank you for the KIP!
> >> >
> >> > +1 (binding)
> >> >
> >> > Best,
> >> > Bruno
> >> >
> >> > On 24.05.23 23:00, Walker Carlson wrote:
> >> > > Hello everybody,
> >> > >
> >> > > I'm opening the vote on KIP-923 here
> >> > > <https://cwiki.apache.org/confluence/x/lAs0Dw>.
> >> > >
> >> > > If we have more to discus please continue the discussion on the
> >> existing
> >> > > thread
> >> https://www.mail-archive.com/dev@kafka.apache.org/msg130657.html
> >> > >
> >> > > best,
> >> > > Walker
> >> > >
> >> >
> >>
>

Re: [DISCUSS] KIP-923: Add A Grace Period to Stream Table Join

2023-06-06 Thread Walker Carlson

Good Point Victoria. I just removed the compacted topic mention from the
KIP. I agree with Burno about using a normal topic and deleting records
that have been processed.

On Tue, Jun 6, 2023 at 2:28 AM Bruno Cadonna  wrote:

> Hi,
>
> another idea that came to my mind. Instead of using a compacted topic,
> the buffer could use a non-compacted topic and regularly delete records
> before a given offset as Streams does for repartition topics.
>
> Best,
> Bruno
>
> On 05.06.23 21:48, Bruno Cadonna wrote:
> > Hi Victoria,
> >
> > that is a good point!
> >
> > I think, the topic needs to be a compacted topic to be able to get rid
> > of records that are evicted from the buffer. So the key might be
> > something with the key, the timestamp, and a sequence number to
> > distinguish between records with the same key and same timestamp.
> >
> > Just an idea! Maybe Walker comes up with something better.
> >
> > Best,
> > Bruno
> >
> > On 05.06.23 20:38, Victoria Xia wrote:
> >> Hi Walker,
> >>
> >> Thanks for the latest updates! The KIP looks great. Just one question
> >> about
> >> the changelog topic for the join buffer: The KIP says "When a failure
> >> occurs the buffer will try to recover from an OffsetCheckpoint if
> >> possible.
> >> If not it will reload the buffer from a compacted change-log topic."
> This
> >> is a new changelog topic that will be introduced specifically for the
> >> join
> >> buffer, right? Why is the changelog topic compacted? What are the keys?
> I
> >> am confused because the buffer contains records from the stream-side
> >> of the
> >> join, for which multiple records with the same key should be treated as
> >> separate updates will all must be tracked in the buffer, rather than
> >> updates which replace each other.
> >>
> >> Thanks,
> >> Victoria
> >>
> >> On Mon, Jun 5, 2023 at 1:47 AM Bruno Cadonna 
> wrote:
> >>
> >>> Hi Walker,
> >>>
> >>> Thanks once more for the updates to the KIP!
> >>>
> >>> Do you also plan to expose metrics for the buffer?
> >>>
> >>> Best,
> >>> Bruno
> >>>
> >>> On 02.06.23 17:16, Walker Carlson wrote:
> >>>> Hello Bruno,
> >>>>
> >>>> I think this covers your questions. Let me know what you think
> >>>>
> >>>> 2.
> >>>> We can use a changelog topic. I think we can treat it like any other
> >>> store
> >>>> and recover in the usual manner. Also implementation is on disk
> >>>>
> >>>> 3.
> >>>> The description is in the public interfaces description. I will copy
> it
> >>>> into the proposed changes as well.
> >>>>
> >>>> This is a bit of an implementation detail that I didn't want to add
> >>>> into
> >>>> the kip, but the record will be added to the buffer to keep the stream
> >>> time
> >>>> consistent, it will just be ejected immediately. If of course if this
> >>>> causes performance issues we will skip this step and track stream time
> >>>> separately. I will update the kip to say that stream time advances
> >>>> when a
> >>>> stream record enters the node.
> >>>>
> >>>> Also, yes, updated.
> >>>>
> >>>> 5.
> >>>> No there is no difference right now, everything gets processed as it
> >>> comes
> >>>> in and tries to find a record for its time stamp.
> >>>>
> >>>> Walker
> >>>>
> >>>> On Fri, Jun 2, 2023 at 6:41 AM Bruno Cadonna 
> >>>> wrote:
> >>>>
> >>>>> Hi Walker,
> >>>>>
> >>>>> Thanks for the updates!
> >>>>>
> >>>>> 2.
> >>>>> It is still not clear to me how a failure is handled. I do not
> >>>>> understand what you mean by "recover from an OffsetCheckpoint".
> >>>>>
> >>>>> My understanding is that the buffer needs to be replicated into its
> >>>>> own
> >>>>> Kafka topic. The input topic is not enough. The offset of a record is
> >>>>> added to the offsets to commit once the record is streamed through
> the
> >>>>> subtopology. Th

Re: [VOTE] KIP-925: rack aware task assignment in Kafka Streams

2023-06-05 Thread Walker Carlson

+1 (binding)

On Mon, Jun 5, 2023 at 3:14 AM Bruno Cadonna  wrote:

> Hi Hao,
>
> +1 (binding)
>
> Thanks!
> Bruno
>
> On 30.05.23 21:16, Colt McNealy wrote:
> > +1 (non-binding)
> >
> > Thank you Hao!
> >
> > Colt McNealy
> >
> > *Founder, LittleHorse.dev*
> >
> >
> > On Tue, May 30, 2023 at 9:50 AM Hao Li  wrote:
> >
> >> Hi all,
> >>
> >> I'd like to open the vote for KIP-925: rack aware task assignment in
> Kafka
> >> Streams. The link for the KIP is
> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams
> >> .
> >>
> >> --
> >> Thanks,
> >> Hao
> >>
> >
>

Re: [DISCUSS] KIP-923: Add A Grace Period to Stream Table Join

2023-06-02 Thread Walker Carlson

Hello Bruno,

I think this covers your questions. Let me know what you think

2.
We can use a changelog topic. I think we can treat it like any other store
and recover in the usual manner. Also implementation is on disk

3.
The description is in the public interfaces description. I will copy it
into the proposed changes as well.

This is a bit of an implementation detail that I didn't want to add into
the kip, but the record will be added to the buffer to keep the stream time
consistent, it will just be ejected immediately. If of course if this
causes performance issues we will skip this step and track stream time
separately. I will update the kip to say that stream time advances when a
stream record enters the node.

Also, yes, updated.

5.
No there is no difference right now, everything gets processed as it comes
in and tries to find a record for its time stamp.

Walker

On Fri, Jun 2, 2023 at 6:41 AM Bruno Cadonna  wrote:

> Hi Walker,
>
> Thanks for the updates!
>
> 2.
> It is still not clear to me how a failure is handled. I do not
> understand what you mean by "recover from an OffsetCheckpoint".
>
> My understanding is that the buffer needs to be replicated into its own
> Kafka topic. The input topic is not enough. The offset of a record is
> added to the offsets to commit once the record is streamed through the
> subtopology. That means once the record is added to the buffer its
> offset is added to the offsets to commit -- independently of whether the
> record was evicted from the buffer and sent to the join node or not.
> Now, let's assume the following scenario
> 1. a record is read from the input topic and added to the buffer, but
> not evicted to be processed by the join node.
> 2. When the processing of the subtopology finishes the offset of the
> record is added to the offsets to commit.
> 3. A commit happens.
> 4. A failure happens
>
> After the failure the buffer is empty but the record will not be read
> anymore from the input topic since its offset has been already
> committed. The record is lost.
> One solution to avoid the loss is to recreate the buffer from a
> compacted Kafka topic as we do for suppression buffers. I do not think,
> we need any offset checkpoint here since, we keep the buffer in memory,
> right? Or do you plan to back the buffer with a persistent store? Even
> in that case, a compacted Kafka topic would be needed.
>
>
> 3.
>  From the KIP it is still not clear to me what happens if a record is
> outside of the grace period. I guess the record that falls outside of
> the grace period will not be added to the buffer, but will be send to
> the join node. Since it is outside of the grace period it will also not
> increase stream time and it will not trigger an eviction. Also the head
> of the buffer will not contain a record that needs to be evicted since
> the the timestamp of the head record will be within the interval stream
> time minus grace period. Is this correct? Please add such a description
> to the KIP.
> Furthermore, I think there is a mistake in the text:
> "... will dequeue when the record timestamp is greater than stream time
> plus the grace period". I guess that should be "... will dequeue when
> the record timestamp is less than (or equal?) stream time minus the
> grace period"
>
>
> 5.
> What is the difference between not setting the grace period and setting
> it to zero? If there is a difference, why is there a difference?
>
>
> Best,
> Bruno
>
>
> On 01.06.23 23:58, Walker Carlson wrote:
> > Hey Bruno thanks for the feedback.
> >
> > 1)
> > I will add this to the kip, but stream time only advances as the when the
> > buffer receives a new record.
> >
> > 2)
> > You are correct, I will add a failure section on to the kip. Since the
> > records wont change in the buffer from when they are read from the topic
> > they are replicated already.
> >
> > 3)
> > I see that I'm out voted on the dropping of records thing. We will pass
> > them on and try to join them if possible. This might cause some null
> > results, but increasing the table history retention should help that.
> >
> > 4)
> > I can add some on the kip. But its pretty directly adding whatever the
> > grace period is to the latency. I don't see a way around it.
> >
> > Walker
> >
> > On Thu, Jun 1, 2023 at 5:23 AM Bruno Cadonna  wrote:
> >
> >> Hi Walker,
> >>
> >> thanks for the KIP!
> >>
> >> Here my feedback:
> >>
> >> 1.
> >> It is still not clear to me when stream time for the buffer advances.
> >> What is the event that let the stream time

Re: [DISCUSS] KIP-923: Add A Grace Period to Stream Table Join

2023-06-01 Thread Walker Carlson

Hey Bruno thanks for the feedback.

1)
I will add this to the kip, but stream time only advances as the when the
buffer receives a new record.

2)
You are correct, I will add a failure section on to the kip. Since the
records wont change in the buffer from when they are read from the topic
they are replicated already.

3)
I see that I'm out voted on the dropping of records thing. We will pass
them on and try to join them if possible. This might cause some null
results, but increasing the table history retention should help that.

4)
I can add some on the kip. But its pretty directly adding whatever the
grace period is to the latency. I don't see a way around it.

Walker

On Thu, Jun 1, 2023 at 5:23 AM Bruno Cadonna  wrote:

> Hi Walker,
>
> thanks for the KIP!
>
> Here my feedback:
>
> 1.
> It is still not clear to me when stream time for the buffer advances.
> What is the event that let the stream time advance? In the discussion, I
> do not understand what you mean by "The segment store already has an
> observed stream time, we advance based on that. That should only advance
> based on records that enter the store." Where does this segment store
> come from? Anyways, I think it would be great to also state how stream
> time advances in the KIP.
>
> 2.
> How does the buffer behave in case of a failure? I think I understand
> that the buffer will use an implementation of TimeOrderedKeyValueBuffer
> and therefore the records in the buffer will be replicated to a topic in
> Kafka, but I am not completely sure. Could you elaborate on this in the
> KIP?
>
> 3.
> I agree with Matthias about dropping late records. We use grace periods
> in scenarios where we records are grouped like in windowed aggregations
> and windowed joins. The stream buffer you propose does not really group
> any records. It rather delays records and reorders them. I am not sure
> if grace period is the right naming/concept to apply here. Instead of
> dropping records that fall outside of the buffer's time interval the
> join should skip the buffer and try to join the record immediately. In
> the end, a stream-table join is a unwindowed join, i.e., no grouping is
> applied to the records.
> What do you and other folks think about this proposal?
>
> 4.
> How does the proposed buffer, affects processing latency? Could you
> please add some words about this to the KIP?
>
>
> Best,
> Bruno
>
>
>
>
> On 31.05.23 01:49, Walker Carlson wrote:
> > Thanks for all the additional comments. I will either address them here
> or
> > update the kip accordingly.
> >
> >
> > I mentioned a follow kip to add extra features before and in the
> responses.
> > I will try to briefly summarize what options and optimizations I plan to
> > include. If a concern is not covered in this list I for sure talk about
> it
> > below.
> >
> > * Allowing non versioned tables to still use the stream buffer
> > * Automatically materializing tables instead of forcing the user to do it
> > * Configurable for in memory buffer
> > * Order the records in offset order or in time order
> > * Non memory use buffer (offset order, delayed pull from stream.)
> > * Time synced between stream and table side (maybe)
> > * Do not drop late records and process them as they come in instead.
> >
> >
> > First, Victoria.
> >
> > 1) (One of your nits covers this, but you are correct it doesn't make
> > sense. so I removed that part of the example.)
> > For those examples with the "bad" join results I said without buffering
> the
> > stream it would look like that, but that was incomplete. If the look up
> was
> > simply looking at the latest version of the table when the stream records
> > came in then the results were possible. If we are using the point in time
> > lookup that versioned tables let us then you are correct the future
> results
> > are not possible.
> >
> > 2) I'll get to this later as Matthias brought up something related.
> >
> > To your additional thoughts, I agree that we need to call those things
> out
> > in the documentation. I'm writing up a follow up kip with a lot of the
> > ideas we have discussed so that we can improve this feature beyond the
> base
> > implementation if it's needed.
> >
> > I addressed the nits in the kip. I somehow missed the table stream table
> > join processor improvement, it makes your first question make a lot more
> > sense.  Table history retention is a much cleaner way to describe it.
> >
> > As to your mention of the syncing the time for the table and stream.
> > Matthias mentioned

Re: [DISCUSS] KIP-925: rack aware task assignment in Kafka Streams

2023-05-31 Thread Walker Carlson

Hi Hao,

Most of the comments I had on this kip are already mentioned, but I did
want to share my two major concerns.

1. Stability. I worry about stability. If we only have the HA assignor work
with rack awareness we will have a lot of state movement in many cases.
Sophie and Bruno have this concern as well.

2. It seems the rack awareness assignment operation can be run after any
assignment algorithm. I would think that maybe we can leave it agnostic if
it is using the sicky assignor or the HA assignor and let the users choose
the strategy. Maybe just have the rack awareness be off or on,
independent of the assignment strategy.

Walker

On Wed, May 31, 2023 at 7:46 AM Bruno Cadonna  wrote:

> Hi Hao,
>
>
> Thank you for the KIP! Really interesting!
>
> In general, I think the KIP is a bit too vague. You explain the main
> algorithm and different options. It is not clear to me on what option we
> will start voting. One way out of this situation would be to cut the KIP
> down to the simplest options and evaluate those. Then, we would have a
> starting point from which we can move on.
>
>
> 1.
> Mention that the KIP optimizes only the read path and does nothing about
> the write path.
>
>
> 2.
> "U is 1 and C is at most 3 (A task can have at most 3 topics including
> changelog topic?)"
> where C is max cost and U is max capacity
>
> C is not at most 3 to answer the question in the KIP. A task can have
> any number of topics it reads from. Some examples:
> - a processor API operator with multiple state stores reads from
> multiple changelog topics
> - a cascade of merge operators would result in a task with multiple
> input topics
> - a cascade of joins would result in a task with multiple input topics
> and multiple changelog topics.
> I am pretty sure there are also other examples.
>
> Assuming cost C is 1 for each topic partition is a simplification.
> Traffic for tasks can vary significantly. I saw joins that had 100s of
> bytes/s on one side and 10s of MB/s on the other side. I guess the
> cross-rack traffic depends on the data rate. Please correct me if I am
> wrong. I am fine with simplifying but then we also need to explicitly
> state the simplification and its limitation in the KIP to manage
> expectations.
>
> C is just a factor in the complexity but we should be clear about the
> simplification we made. How much C influences the actual performance we
> do not know and we should evaluate this as part of the implementation of
> the KIP. Maybe add this aspect to the performance experiments in the
> test plan section.
>
>
> 3.
> I second Sophie's question about the complexity being O(T^2 * N^2)
> instead of O(T*N).
>
>
> 4.
> The improved algorithm in "Min cost with balanced sub-topology" contains
> a bunch more edges and the complexity of the algorithm depends on the
> square of the number of edges. Can you say something about the trade-off
> or even quantify it? How does does the complexity change from O(T^2 * N^2)?
>
>
> 5.
> If you propose to implement multiple algorithms, the KIP should add
> public configs as Sophie proposed.
>
>
> 6.
> Does any of the algorithm change the subscription protocol? Usually we
> describe those changes in a KIP.
>
>
> 7.
> I have a couple of minor comment about notation:
>
> 7.1 For the complexity, I think it would be better to either use |T| and
> |C| or define new variables like for example N_task = |T| and N_client =
> |C| for the formula to be consistent with the mapping function you
> define in the previous section.
>
> 7.2 Using C for the set of clients and the cost is confusing. Maybe use
> Cost or $ for cost.
>
>
> 8.
> In the "Graph construction" section in the "Min cost with balanced
> sub-topology" section, you write
> "Create new set of nodes which has same number as clients"
> Shouldn't this be number of tasks.
>
>
> 9.
> Regarding standby assignment, have you considered to simplify the setup
> by defining if rack-aware configs are set, the standby assignment is
> optimized for reliability and if they are not set costs are optimized. I
> think that would be a good starting point on which we can iterate in
> future.
>
>
> 10.
> Just a clarification and something that you should corrected in the KIP.
> In "Assignment of stateless tasks" you contrast stateless tasks with
> active task. However, active tasks can be stateless or stateful. So
> "Rack awareness assignment algorithm for active tasks" should actually
> be "Rack awareness assignment algorithm for active stateful tasks".
> Please use the terms accordingly otherwise, it gets confusing.
>
>
> 11.
> In the testing plan, I think it would be useful to also have performance
> experiments along other dimension like number of topic partitions a task
> reads from, i.e., basically the varying costs per task.
>
>
> Best,
> Bruno
>
> On 30.05.23 23:28, Sophie Blee-Goldman wrote:
> > Hey Hao, thanks for the KIP!
> >
> > 1. There's a typo in the "internal.rack.aware.assignment.strategry"
> config,
> > this
> > s

Re: [DISCUSS] KIP-923: Add A Grace Period to Stream Table Join

2023-05-30 Thread Walker Carlson

 records from the stream-side will be reordered, and
> late
> >> records will be dropped.
> >>
> >> I don't think any of these are reasons to not go forward with this KIP,
> but
> >> it'd be good to call them out in the eventual documentation to decrease
> the
> >> chance users get tripped up.
> >>
> >>> We could maybe do an improvement later to advance stream time from
> table
> >> side as well, but that might be debatable as we might get more late
> records.
> >>
> >> Yes, the likelihood of late records increases but also the likelihood of
> >> "join misses" due to versioned store history retention having elapsed
> >> decreases, which feels important for certain use cases. Either way,
> agreed
> >> that it can be a discussion for the future as incorporating this would
> >> substantially complicate the implementation.
> >>
> >> Also a couple nits:
> >>
> >> - The KIP currently says "We recently added versioned tables which
> allow
> >> the table side of the a join [...] but it is not taken advantage of
> in
> >> joins," but this doesn't seem true? If the table of a stream-table
> join is
> >> versioned, then the DSL's stream-table join processor will
> automatically
> >> perform timestamped lookups into the table, in order to take
> advantage of
> >> the new timestamp-aware store to provide better join semantics.
> >> - The KIP mentions "grace period" for versioned stores in a number
> of
> >> places but I think you actually mean "history retention"? The two
> happen to
> >> be the same today (it is not an option for users to configure the
> two
> >> separately) but this need not be true in the future. "History
> retention"
> >> governs how far back in time reads may occur, which is the relevant
> >> parameter for performing lookups as part of the stream-table join.
> "Grace
> >> period" in the context of versioned stores refers to how far back
> in time
> >> out-of-order writes may occur, which probably isn't directly
> relevant for
> >> introducing a stream-side buffer, though it's also possible I've
> overlooked
> >> something. (As a bonus, switching from "table grace period" in the
> KIP to
> >> "table history retention" also helps to clarify/distinguish that
> it's a
> >> different parameter from the "join grace period," which I could see
> being
> >> confusing to readers. :) )
> >>
> >>
> >> Cheers,
> >> Victoria
> >>
> >> On Thu, May 18, 2023 at 1:43 PM Walker Carlson
> >>  wrote:
> >>
> >>> Hey all,
> >>>
> >>> Thanks for the comments, they gave me a lot to think about. I'll try to
> >>> address them all inorder. I have made some updates to the kip related
> to
> >>> them, but I mention where below.
> >>>
> >>> Lucas
> >>>
> >>> Good idea about the example. I added a simple one.
> >>>
> >>> 1) I have thought about including options for the underlying buffer
> >>> configuration. One of which might be adding an in memory option. My
> biggest
> >>> concern is about the semantic guarantees. This isn't like suppress or
> with
> >>> windows where producing incomplete results is repetitively harmless.
> Here
> >>> we would be possibly producing incorrect results. I also would like to
> keep
> >>> the interface changes as simple as I can. Making more than this change
> to
> >>> Joined I feel could make this more complicated than it needs to be. If
> we
> >>> really want to I could see adding a grace() option with a BufferConifg
> in
> >>> there or something, but I would rather not.
> >>>
> >>> 2) The buffer will be independent of if the table is versioned or not.
> If
> >>> table is not materialized it will materialize it as versioned. It might
> >>> make sense to do a follow up kip where we force the retention period
> of
> >>> the versioned to be greater than whatever the max of the stream buffer
> is.
> >>>
> >>> Victoria
> >>>
> >>> 1) Yes, records will exit in timestamp order not in offset order.
> >>> 2) Late records will be dropped (Late as out of th

[VOTE] KIP-923: Add A Grace Period to Stream Table Join

2023-05-24 Thread Walker Carlson

Hello everybody,

I'm opening the vote on KIP-923 here
.

If we have more to discus please continue the discussion on the existing
thread https://www.mail-archive.com/dev@kafka.apache.org/msg130657.html

best,
Walker

Re: [DISCUSS] KIP-923: Add A Grace Period to Stream Table Join

2023-05-18 Thread Walker Carlson

;, we could easily preserve offset-order.
> But I also see a benefit of re-ordering and emitting out-of-order data
> right away when read (instead of blocking them behind in-order records
> that are not ready yet). -- It might even be a possibility, to let users
> pick a emit strategy eg "EmitStrategy.preserveOffsets" (name just a
> placeholder).
>
> The KIP should explain this in more detail and also discuss different
> options and mention them in "Rejected alternatives" in case we don't
> want to include them.
>
>
> 50) What happens when users change the grace period? Especially, when
> they turn it on/off (but also increasing/decreasing is an interesting
> point)? I think we should try to support this if possible; the
> "Compatibility" section needs to cover switching on/off in more detail.
>
>
> -Matthias
>
>
>
>
> On 5/2/23 2:06 PM, Victoria Xia wrote:
> > Cool KIP, Walker! Thanks for sharing this proposal.
> >
> > A few clarifications:
> >
> > 1. Is the order that records exit the buffer in necessarily the same as
> the
> > order that records enter the buffer in, or no? Based on the description
> in
> > the KIP, it sounds like the answer is no, i.e., records will exit the
> > buffer in increasing timestamp order, which means that they may be
> ordered
> > (even for the same key) compared to the input order.
> >
> > 2. What happens if the join grace period is nonzero, and a stream-side
> > record arrives with a timestamp that is older than the current stream
> time
> > minus the grace period? Will this record trigger a join result, or will
> it
> > be dropped? Based on the description for what happens when the join grace
> > period is set to zero, it sounds like the late record will be dropped,
> even
> > if the join grace period is nonzero. Is that true?
> >
> > 3. What could cause stream time to advance, for purposes of removing
> > records from the join buffer? For example, will new records arriving on
> the
> > table side of the join cause stream time to advance? From the KIP it
> sounds
> > like only stream-side records will advance stream time -- does that mean
> > that the join processor itself will have to track this stream time?
> >
> > Also +1 to Lucas's question about what options will be available for
> > configuring the join buffer. Will users have the option to choose whether
> > they want the buffer to be in-memory vs persistent?
> >
> > - Victoria
> >
> > On Fri, Apr 28, 2023 at 11:54 AM Lucas Brutschy
> >  wrote:
> >
> >> HI Walker,
> >>
> >> thanks for the KIP! We definitely need this. I have two questions:
> >>
> >>   - Have you considered allowing the customization of the underlying
> >> buffer implementation? As I can see, `StreamJoined` lets you customize
> >> the underlying store via a `WindowStoreSupplier`. Would it make sense
> >> for `Joined` to have this as well? I can imagine one may want to limit
> >> the number of records in the buffer, for example. If we hit the
> >> maximum, the only option would be to drop semantic guarantees, but
> >> users may still want to do this.
> >>   - With "second option on the table side" you are referring to
> >> versioned tables, right? Will the buffer on the stream side behave any
> >> different whether the table side is versioned or not?
> >>
> >> Finally, I think a simple example in the motivation section could help
> >> non-experts understand the KIP.
> >>
> >> Best,
> >> Lucas
> >>
> >> On Tue, Apr 25, 2023 at 9:13 PM Walker Carlson
> >>  wrote:
> >>>
> >>> Hello everybody,
> >>>
> >>> I have a stream proposal to improve the stream table join by adding a
> >> grace
> >>> period and buffer to the stream side of the join to allow processing in
> >>> timestamp order matching the recent improvements of the versioned
> tables.
> >>>
> >>> Please take a look here <https://cwiki.apache.org/confluence/x/lAs0Dw>
> >> and
> >>> share your thoughts.
> >>>
> >>> best,
> >>> Walker
> >>
> >
>

[DISCUSS] KIP-923: Add A Grace Period to Stream Table Join

2023-04-25 Thread Walker Carlson

Hello everybody,

I have a stream proposal to improve the stream table join by adding a grace
period and buffer to the stream side of the join to allow processing in
timestamp order matching the recent improvements of the versioned tables.

Please take a look here  and
share your thoughts.

best,
Walker

[jira] [Created] (KAFKA-14936) Add Grace Period To Stream Table Join

2023-04-25 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-14936:
--

 Summary: Add Grace Period To Stream Table Join
 Key: KAFKA-14936
 URL: https://issues.apache.org/jira/browse/KAFKA-14936
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson
Assignee: Walker Carlson


Include the grace period for stream table joins as described in kip 923.

Also add a rocksDB time based queueing implementation of 
`TimeOrderedKeyValueBuffer`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [VOTE] KIP-907: Add Boolean Serde to public interface

2023-03-01 Thread Walker Carlson

+1 Binding

On Mon, Feb 27, 2023 at 1:46 PM Chia-Ping Tsai  wrote:

> +1 (binding)
>

Re: [VOTE] KIP-904: Kafka Streams - Guarantee subtractor is called before adder if key has not changed

2023-03-01 Thread Walker Carlson

+1 Binding

On Mon, Feb 27, 2023 at 12:48 PM Guozhang Wang 
wrote:

> +1.
>
> On Sun, Feb 26, 2023 at 4:27 PM Fq Public  wrote:
> >
> > Hi everyone,
> >
> > I'd like to start the vote on KIP-904: Kafka Streams - Guarantee
> subtractor
> > is called before adder if key has not changed.
> > The KIP is available here: https://cwiki.apache.org/confluence/x/P5VbDg
> > The easiest way to view the entire discussion thread is via this search
> > link: https://lists.apache.org/list?dev@kafka.apache.org:lte=1M:KIP-904
> > Please take a look and vote.
> >
> > Thank you,
> > Farooq
>

Re: [VOTE] KIP-869: Improve Streams State Restoration Visibility

2023-01-24 Thread Walker Carlson

ful in some
> > debugging scenarios.
> >
> >>
> >> (5) `restore-remaining-records-total` -- why is this a task metric?
> >> Seems we could roll it up into a thread metric that we report at INFO
> >> level (we could still have per-task DEBUG level metric for it in
> addition).
> >>
> > The rationale behind it is the general principle in metrics design
> > that "Kafka would provide the lowest necessary metrics levels, and
> > users can do the roll-ups however they want".
> >
> >>
> >> (6) What about "warmup tasks"? Internally, we treat them as standbys,
> >> but it seems it's hard for users to reason about it in the scale-out
> >> warm-up case. Would it be helpful (and possible) to report "warmup
> >> progress" explicitly?
> >>
> > At the restore thread level, we cannot differentiate standby tasks
> > from warmup tasks since the latter is created exactly just like the
> > former. But I do agree this is an issue for visibility that worth
> > addressing, I think another KIP would be needed to first consider
> > distinguishing these two at the class level.
> >
> >>
> >> -Matthias
> >>
> >>
> >> On 11/1/22 2:44 AM, Lucas Brutschy wrote:
> >>> We need this!
> >>>
> >>> + 1 non binding
> >>>
> >>> Cheers,
> >>> Lucas
> >>>
> >>> On Tue, Nov 1, 2022 at 10:01 AM Bruno Cadonna 
> wrote:
> >>>>
> >>>> Guozhang,
> >>>>
> >>>> Thanks for the KIP!
> >>>>
> >>>> +1 (binding)
> >>>>
> >>>> Best,
> >>>> Bruno
> >>>>
> >>>> On 25.10.22 22:07, Walker Carlson wrote:
> >>>>> +1 non binding
> >>>>>
> >>>>> Thanks for the kip!
> >>>>>
> >>>>> On Thu, Oct 20, 2022 at 10:25 PM John Roesler 
> wrote:
> >>>>>
> >>>>>> Thanks for the KIP, Guozhang!
> >>>>>>
> >>>>>> I'm +1 (binding)
> >>>>>>
> >>>>>> -John
> >>>>>>
> >>>>>> On Wed, Oct 12, 2022, at 16:36, Nick Telford wrote:
> >>>>>>> Can't wait!
> >>>>>>> +1 (non-binding)
> >>>>>>>
> >>>>>>> On Wed, 12 Oct 2022, 18:02 Guozhang Wang, <
> guozhang.wang...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hello all,
> >>>>>>>>
> >>>>>>>> I'd like to start a vote for the following KIP, aiming to improve
> Kafka
> >>>>>>>> Stream's restoration visibility via new metrics and callback
> methods:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>> -- Guozhang
> >>>>>>>>
> >>>>>>
> >>>>>
>

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-10-25 Thread Walker Carlson

Hey Sophie,

Thanks for the KIP. I think this could be useful for a lot of cases. I also
think that this could cause a lot of confusion.

Just to make sure we are doing our best to prevent people from
misusing this feature, I wanted to clarify a couple of things.
1) There will be only an interface and no "default" implementation that a
user can plug in for the static partitioner. I am considering when it comes
to testing we want to make sure that we do not make our testing
implementation avaible to a user.
2)  If a user wanted to use auto scaling for a stateless application it
should be as easy as implementing the StaticStreamsPartitioner. Their
implementation could even just wrap the default partitioner if they wanted,
right?  I can't think of any way we could detect and then warn them about
the output topic not being partitioned by keys if that were to happen, can
you?

Overall this looks good to me!

Walker

On Tue, Oct 25, 2022 at 12:27 PM Bill Bejeck  wrote:

> Hi Sophie,
>
> Thanks for the KIP! I think this is a worthwhile feature to add.  I have
> two main questions about how this new feature will work.
>
>
>1. You mention that for stateless applications auto-scaling is a sticker
>situation.  But I was thinking that the auto-scaling would actually
> benefit
>stateless applications the most, let me explain my thinking.  Let's say
> you
>have a stateless Kafka Streams application with one input topic and 2
>partitions, meaning you're limited to at most 2 stream threads.  In
> order
>to increase the throughput, you increase the number of partitions of the
>source topic to 4, so you can 4 stream threads.  In this case would the
>auto-scaling feature automatically increase the number of tasks from 2
> to
>4?  Since the application is stateless, say using a filter then a map
> for
>example, the partition for the record doesn't matter, so it seems that
>stateless applications would stand to gain a great deal.
>2. For stateful applications I can see the immediate benefit from
>autoscaling and static partitioning.   But again going with a partition
>expansion for increased throughput example, what would be the mitigation
>strategy for a stateful application that eventually wants to take
> advantage
>of the increased number of partitions? Otherwise keeping all keys on
> their
>original partition means you could end up with "key skew" due to not
>allowing keys to distribute out to the new partitions.
>
> One last comment, the KIP states "only the key, rather than the key and
> value, are passed in to the partitioner", but the interface has it taking a
> key and a value as parameters.  Based on your comments earlier in this
> thread I was thinking that the text needs to be updated.
>
> Thanks,
> Bill
>
> On Fri, Oct 21, 2022 at 12:21 PM Lucas Brutschy
>  wrote:
>
> > Hi all,
> >
> > thanks, Sophie, this makes sense. I suppose then the way to help the user
> > not apply this in the wrong setting is having good documentation and a
> one
> > or two examples of good use cases.
> >
> > I think Colt's time-based partitioning is a good example of how to use
> > this. It actually doesn't have to be time, the same will work with any
> > monotonically increasing identifier. I.e. the new partitions will only
> get
> > records for users with a "large" user ID greater than some user ID
> > threshold hardcoded in the static partitioner. At least in this
> restricted
> > use-case, lookups by user ID would still be possible.
> >
> > Cheers,
> > Lucas
> >
> > On Fri, Oct 21, 2022 at 5:37 PM Colt McNealy 
> wrote:
> >
> > > Sophie,
> > >
> > > Regarding item "3" (my last paragraph from the previous email),
> perhaps I
> > > should give a more general example now that I've had more time to
> clarify
> > > my thoughts:
> > >
> > > In some stateful applications, certain keys have to be findable without
> > any
> > > information about when the relevant data was created. For example, if
> I'm
> > > running a word-count app and I want to use Interactive Queries to find
> > the
> > > count for "foo", I would need to know whether "foo" first arrived
> before
> > or
> > > after time T before I could find the correct partition to look up the
> > data.
> > > In this case, I don't think static partitioning is possible. Is this
> > > use-case a non-goal of the KIP, or am I missing something?
> > >
> > > Colt McNealy
> > > *Founder, LittleHorse.io*
> > >
> > >
> > > On Thu, Oct 20, 2022 at 6:37 PM Sophie Blee-Goldman
> > >  wrote:
> > >
> > > > Thanks for the responses guys! I'll get the easy stuff out of the way
> > > > first:
> > > >
> > > > 1) Fixed the KIP so that StaticStreamPartitioner extends
> > > StreamPartitioner
> > > > 2) I totally agree with you Colt, the record value might have
> valuable
> > > (no
> > > > pun) information
> > > > in it that is needed to compute the partition without breaking the
> > static
> > > > constraint. As in my
> > > > own example ea

Re: [VOTE] KIP-869: Improve Streams State Restoration Visibility

2022-10-25 Thread Walker Carlson

+1 non binding

Thanks for the kip!

On Thu, Oct 20, 2022 at 10:25 PM John Roesler  wrote:

> Thanks for the KIP, Guozhang!
>
> I'm +1 (binding)
>
> -John
>
> On Wed, Oct 12, 2022, at 16:36, Nick Telford wrote:
> > Can't wait!
> > +1 (non-binding)
> >
> > On Wed, 12 Oct 2022, 18:02 Guozhang Wang, 
> > wrote:
> >
> >> Hello all,
> >>
> >> I'd like to start a vote for the following KIP, aiming to improve Kafka
> >> Stream's restoration visibility via new metrics and callback methods:
> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility
> >>
> >>
> >> Thanks!
> >> -- Guozhang
> >>
>

Re: [VOTE] KIP-837 Allow MultiCasting a Result Record.

2022-08-09 Thread Walker Carlson

+1 (non binding)

Walker

On Tue, May 31, 2022 at 4:44 AM Sagar  wrote:

> Hi All,
>
> I would like to start a voting thread on
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883356
> .
>
> I am just starting this as the discussion thread has been open for 10+
> days. In case there are some comments, we can always discuss them over
> there.
>
> Thanks!
> Sagar.
>

Re: [DISCUSS] KIP-837 Allow MultiCasting a Result Record.

2022-08-09 Thread Walker Carlson

Thanks for updating! I looked over the changes and I think it's good.

Walker

On Tue, Aug 9, 2022 at 5:44 AM Sagar  wrote:

> Hello All,
>
> Bumping this one again to see if folks have any other
> comments/observations.
>
> Thanks!
> Sagar.
>
> On Wed, Jul 27, 2022 at 4:03 PM Sagar  wrote:
>
> > Thanks Walker for the comments
> >
> > I have updated the KIP with all the suggestions.
> >
> > Thanks!
> >
> > On Tue, Jul 12, 2022 at 10:59 PM Walker Carlson
> >  wrote:
> >
> >> Hi Sagar,
> >>
> >> I just finished reading the KIP and this seems to be a great addition.
> >>
> >> I agree with Matthias that the interface with a default implementation
> and
> >> deprecating partition() does seem cleaner. It has been a pattern that we
> >> have followed in the past. How I would handle a custom streams
> partitioner
> >> is just always call partitions(). If it is implemented then we ignore
> the
> >> partition() and if not the default implementation should just wrap the
> >> deprecated method in a list.
> >>
> >> Despite that I think your concerns are valid about this causing some
> >> confusion. To avoid that in the past we have just made sure we updated
> the
> >> depreciation message very cleanly and also include that implementing the
> >> new method will override the old one in the description. All those docs
> >> plus good logging has worked well. We had a very similar situation when
> >> adding a new exception handler for streams back for 2.8 and these
> >> precautions seemed to be enough.
> >>
> >> thanks for the kip!
> >> Walker
> >>
> >> On Sun, Jul 10, 2022 at 1:22 PM Sagar 
> wrote:
> >>
> >> > Hi Matthias,
> >> >
> >> > I agree that working with interfaces is cleaner. As such, there's not
> >> much
> >> > value in keeping both the methods. So, we can go down the route of
> >> > deprecating partition(). The only question I have is till deprecation
> >> if we
> >> > get both partition() and partitions() implemented, we may need to give
> >> > precedence to partitions() method, right?
> >> >
> >> > Also, in IQ and FK-join the runtime check you proposed seems good to
> me
> >> and
> >> > your suggestion on broadcast makes sense as well.
> >> >
> >> > Lastly, I am leaning towards the interface approach now. Let's see if
> >> other
> >> > have any questions/comments.
> >> >
> >> > Thanks!
> >> > Sagar.
> >> >
> >> >
> >> > On Fri, Jul 8, 2022 at 4:31 AM Matthias J. Sax 
> >> wrote:
> >> >
> >> > > Thanks for explaining you reasoning.
> >> > >
> >> > > I agree that it might not be ideal to have both methods implemented,
> >> but
> >> > > if we deprecate the exiting one, it would only be an issue until we
> >> > > remove the old one? Or do we see value to keep both methods?
> >> > >
> >> > > In general, working with interfaces is cleaner than with abstract
> >> > > classed, that is why I proposed it.
> >> > >
> >> > > In the end, I don't have too strong of an opinion what the better
> >> option
> >> > > would be. Maybe others can chime in and share their thoughts?
> >> > >
> >> > > -Matthias
> >> > >
> >> > > On 7/1/22 10:54 PM, Sagar wrote:
> >> > > > Hi Matthias,
> >> > > >
> >> > > > Thanks for your review. The reason I chose to introduce a new
> >> abstract
> >> > > > class is that, while it doesn't entail any changes in the
> >> > > StreamPartitioner
> >> > > > interface, I also disabled the partition() method in that class.
> >> Reason
> >> > > to
> >> > > > do that is that I didn't want a scenario where a user implements
> >> both
> >> > > > partition and partitions methods which could lead to confusion.
> With
> >> > the
> >> > > > approach you suggested, while the interface still remains
> >> functional,
> >> > > users
> >> > > > get the option to implement either methods which is what I wanted
> to
> >> > > avoid.
> >> > > > Let me know if

Re: [DISCUSS] KIP-837 Allow MultiCasting a Result Record.

2022-07-12 Thread Walker Carlson

Hi Sagar,

I just finished reading the KIP and this seems to be a great addition.

I agree with Matthias that the interface with a default implementation and
deprecating partition() does seem cleaner. It has been a pattern that we
have followed in the past. How I would handle a custom streams partitioner
is just always call partitions(). If it is implemented then we ignore the
partition() and if not the default implementation should just wrap the
deprecated method in a list.

Despite that I think your concerns are valid about this causing some
confusion. To avoid that in the past we have just made sure we updated the
depreciation message very cleanly and also include that implementing the
new method will override the old one in the description. All those docs
plus good logging has worked well. We had a very similar situation when
adding a new exception handler for streams back for 2.8 and these
precautions seemed to be enough.

thanks for the kip!
Walker

On Sun, Jul 10, 2022 at 1:22 PM Sagar  wrote:

> Hi Matthias,
>
> I agree that working with interfaces is cleaner. As such, there's not much
> value in keeping both the methods. So, we can go down the route of
> deprecating partition(). The only question I have is till deprecation if we
> get both partition() and partitions() implemented, we may need to give
> precedence to partitions() method, right?
>
> Also, in IQ and FK-join the runtime check you proposed seems good to me and
> your suggestion on broadcast makes sense as well.
>
> Lastly, I am leaning towards the interface approach now. Let's see if other
> have any questions/comments.
>
> Thanks!
> Sagar.
>
>
> On Fri, Jul 8, 2022 at 4:31 AM Matthias J. Sax  wrote:
>
> > Thanks for explaining you reasoning.
> >
> > I agree that it might not be ideal to have both methods implemented, but
> > if we deprecate the exiting one, it would only be an issue until we
> > remove the old one? Or do we see value to keep both methods?
> >
> > In general, working with interfaces is cleaner than with abstract
> > classed, that is why I proposed it.
> >
> > In the end, I don't have too strong of an opinion what the better option
> > would be. Maybe others can chime in and share their thoughts?
> >
> > -Matthias
> >
> > On 7/1/22 10:54 PM, Sagar wrote:
> > > Hi Matthias,
> > >
> > > Thanks for your review. The reason I chose to introduce a new abstract
> > > class is that, while it doesn't entail any changes in the
> > StreamPartitioner
> > > interface, I also disabled the partition() method in that class. Reason
> > to
> > > do that is that I didn't want a scenario where a user implements both
> > > partition and partitions methods which could lead to confusion. With
> the
> > > approach you suggested, while the interface still remains functional,
> > users
> > > get the option to implement either methods which is what I wanted to
> > avoid.
> > > Let me know if that makes sense.
> > >
> > > Regarding extending StreamsPartitioner, we could expose  net new to()
> > > methods taking in this new class devoid of any StreamPartitioner. I
> just
> > > thought it's cleaner to keep it this way as StreamPartitioner already
> > > dpes the partitioning. Let me know what you think.
> > >
> > > Thanks!
> > > Sagar.
> > >
> > > On Wed, Jun 29, 2022 at 5:34 AM Matthias J. Sax 
> > wrote:
> > >
> > >> Thanks for the KIP. Overall a good addition.
> > >>
> > >> I am actually not sure if we need to add a new class? From my
> > >> understanding, if there is exactly one abstract method, the interface
> is
> > >> still functional? Thus, we could add a new method to
> > >> `StreamsPartitioner` with a default implementation (that calls the
> > >> existing `partition()` method and wraps the result into a singleton
> > list)?
> > >>
> > >> Not sure what the pros/cons for both approaches would be?
> > >>
> > >> If we really add a new class, I am wondering if it should inherit from
> > >> `StreamsPartitioner` at all? Or course, if it does not, it's more
> stuff
> > >> we need to change, but the proposed overwrite of `partition()` that
> > >> throws also does not seem to be super clean? -- I am even wondering if
> > >> we should aim to deprecate the existing `partition()` and only offer
> > >> `partitions()` in the future?
> > >>
> > >> For the broadcast option, I am wondering if returning `null` (not an
> > >> singleton with `-1`) might be a clear option to encode it? Empty list
> > >> would still be valid as "send to no partition".
> > >>
> > >> Btw: The `StreamPartitioner` interface is also used for IQ. For both
> IQ
> > >> and FK-join, it seem ok to just add a runtime check that the returned
> > >> list is a singleton (in case we don't add a new class)?
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >> On 6/26/22 7:55 AM, Sagar wrote:
> > >>> Hi Florin,
> > >>>
> > >>> Thanks for the comment! You brought up a very good point.. Actually I
> > was
> > >>> focussed too much on the multicast operation and didn't pay attention
> > to
> > >>> th

Re: [VOTE] KIP-846: Processor-level Streams metrics for records/bytes Producedd

2022-05-31 Thread Walker Carlson

+1 non binding

On Tue, May 31, 2022 at 12:19 PM John Roesler  wrote:

> +1 (binding)
>
> Thanks,
> John
>
> On Mon, May 30, 2022, at 13:00, Bill Bejeck wrote:
> > +1 (binding)
> >
> > -Bill
> >
> > On Mon, May 30, 2022 at 4:49 AM Sagar  wrote:
> >
> >> +1 (non-binding).
> >>
> >> Thanks!
> >> Sagar.
> >>
> >> On Mon, May 30, 2022 at 1:11 PM Bruno Cadonna 
> wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > Thanks!
> >> > Bruno
> >> >
> >> > On 30.05.22 09:36, Sophie Blee-Goldman wrote:
> >> > > Hey all,
> >> > >
> >> > >   I'd like to kick off the voting thread for the KIP I proposed to
> add
> >> > > processor-level "bytes/records produced" metrics to Kafka Streams.
> >> > >
> >> > > Thanks!
> >> > >
> >> > > KIP-846: Task-level Streams metrics for bytes/records Produced
> >> > > <
> >> >
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211886093
> >> > >
> >> > >
> >> > > Cheers,
> >> > > Sophie
> >> > >
> >> >
> >>
>

Re: [VOTE] KIP-834: Pause / Resume KafkaStreams Topologies

2022-05-13 Thread Walker Carlson

+1 from me (non-binding)

Walker

On Wed, May 11, 2022 at 12:36 PM Leah Thomas 
wrote:

> Thanks Jim, great discussion. +1 from me (non-binding)
>
> Cheers,
> Leah
>
> On Wed, May 11, 2022 at 10:14 AM Bill Bejeck  wrote:
>
> > Thanks for the KIP!
> >
> > +1 (binding)
> >
> > -Bill
> >
> > On Wed, May 11, 2022 at 9:36 AM Luke Chen  wrote:
> >
> > > Hi Jim,
> > >
> > > I'm +1. (please add some note in KIP about the stream resetting tool
> > can't
> > > be used in paused state)
> > > Thanks for the KIP!
> > >
> > > Luke
> > >
> > > On Wed, May 11, 2022 at 9:09 AM Guozhang Wang 
> > wrote:
> > >
> > > > Thanks Jim. +1 from me.
> > > >
> > > > On Tue, May 10, 2022 at 4:51 PM Matthias J. Sax 
> > > wrote:
> > > >
> > > > > I had one minor question on the discuss thread. It's mainly about
> > > > > clarifying and document the user contract. I am fine either way.
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > >
> > > > > -Matthias
> > > > >
> > > > > On 5/10/22 12:32 PM, Sophie Blee-Goldman wrote:
> > > > > > Thanks for the KIP! +1 (binding)
> > > > > >
> > > > > > On Tue, May 10, 2022, 12:24 PM Bruno Cadonna  >
> > > > wrote:
> > > > > >
> > > > > >> Thanks Jim,
> > > > > >>
> > > > > >> +1 (binding)
> > > > > >>
> > > > > >> Best,
> > > > > >> Bruno
> > > > > >>
> > > > > >> On 10.05.22 21:19, John Roesler wrote:
> > > > > >>> Thanks Jim,
> > > > > >>>
> > > > > >>> I’m +1 (binding)
> > > > > >>>
> > > > > >>> -John
> > > > > >>>
> > > > > >>> On Tue, May 10, 2022, at 14:05, Jim Hughes wrote:
> > > > >  Hi all,
> > > > > 
> > > > >  I'm asking for a vote on KIP-834:
> > > > > 
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832
> > > > > 
> > > > >  Thanks in advance!
> > > > > 
> > > > >  Jim
> > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>

[jira] [Created] (KAFKA-13676) When processing in ALOS we might as well commit progress made other tasks on a task specific exception

2022-02-18 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-13676:
--

 Summary: When processing in ALOS we might as well commit progress 
made other tasks on a task specific exception
 Key: KAFKA-13676
 URL: https://issues.apache.org/jira/browse/KAFKA-13676
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson
Assignee: Walker Carlson


When processing in ALOS we might as well commit progress made other tasks on a 
task specific exception. If one task has an issue and we have already 
successfully completed processing on at least one task it would be good to 
commit those successfully processed tasks. This should prevent limit the 
duplicated records downstream and also be more efficient.

Also if one task is having lots of issues the other tasks can at least make 
progress. When we introduced the thread replacement mechanism this optimization 
became possible. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Re: [VOTE] KIP-591: Add Kafka Streams config to set default state store

2022-01-20 Thread Walker Carlson

+1 non binding

On Thu, Jan 20, 2022 at 2:00 PM Matthias J. Sax  wrote:

> +1 (binding)
>
> On 1/20/22 10:52 AM, Guozhang Wang wrote:
> > Thanks Luke! I'm +1 on the KIP.
> >
> >
> > Guozhang
> >
> > On Wed, Jan 19, 2022 at 5:58 PM Luke Chen  wrote:
> >
> >> Hi devs,
> >>
> >> I'd like to start a vote for the KIP-591: Add Kafka Streams config to
> set
> >> default state store. The goal is to allow users to set a default store
> in
> >> the config, so it can apply to all the streams.
> >>
> >> Detailed description can be found here:
> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-591%3A+Add+Kafka+Streams+config+to+set+default+state+store
> >>
> >>
> >> Thank you.
> >> Luke
> >>
> >
> >
>

[jira] [Created] (KAFKA-13588) We should consolidate `changelogFor` methods to simplify the generation of internal topic names

2022-01-10 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-13588:
--

 Summary: We should consolidate `changelogFor` methods to simplify 
the generation of internal topic names
 Key: KAFKA-13588
 URL: https://issues.apache.org/jira/browse/KAFKA-13588
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson


[https://github.com/apache/kafka/pull/11611#discussion_r772625486]

we should use `ProcessorContextUtils#changelogFor` after we remove `init(final 
ProcessorContext context, final StateStore root)` in 
`CahceingWindowStore#initInternal`

 

Or any other place that we generate an internal topic name.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Re: [VOTE] Add TaskId field to StreamsException

2021-10-18 Thread Walker Carlson

Hey Sophie,

+1 for me, I think that is would only help.

Walker

On Mon, Oct 18, 2021 at 1:45 AM Luke Chen  wrote:

> Hi Sophie,
> Add taskId to make the exception much clear is a good improvement.
> + 1 (non-binding)
>
> Thank you.
> Luke
>
> On Mon, Oct 18, 2021 at 12:10 PM Sophie Blee-Goldman
>  wrote:
>
> > Hey all,
> >
> > I'd like to kick off the vote on this small KIP which adds a TaskId field
> > to the StreamsException class. Please take a look and cast your vote when
> > you have a chance.
> >
> > Links:
> >
> >- KIP-783: Add TaskId field to StreamsException
> >
> >- PR #11405 
> >
> >
> > Thanks!
> > Sophie
> >
>

[jira] [Created] (KAFKA-13246) StoreQueryIntegrationTest#shouldQueryStoresAfterAddingAndRemovingStreamThread does not gate on stream state well

2021-08-27 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-13246:
--

 Summary: 
StoreQueryIntegrationTest#shouldQueryStoresAfterAddingAndRemovingStreamThread 
does not gate on stream state well
 Key: KAFKA-13246
 URL: https://issues.apache.org/jira/browse/KAFKA-13246
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson


StoreQueryIntegrationTest#shouldQueryStoresAfterAddingAndRemovingStreamThread 
should be improved by waiting for the client to go to rebalancing or running 
after adding and removing a thread. It should also wait until running before 
querying the state store 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Resolved] (KAFKA-13215) Flaky test org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectEndOffsetInformation

2021-08-24 Thread Walker Carlson (Jira)



 [ 
https://issues.apache.org/jira/browse/KAFKA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walker Carlson resolved KAFKA-13215.

Resolution: Fixed

> Flaky test 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectEndOffsetInformation
> ---
>
> Key: KAFKA-13215
> URL: https://issues.apache.org/jira/browse/KAFKA-13215
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Konstantine Karantasis
>    Assignee: Walker Carlson
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.1.0
>
>
> Integration test {{test 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectCommittedOffsetInformation()}}
>  sometimes fails with
> {code:java}
> java.lang.AssertionError: only one task
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:26)
>   at 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.getTaskMetadata(TaskMetadataIntegrationTest.java:163)
>   at 
> org.apache.kafka.streams.integration.TaskMetadataIntegrationTest.shouldReportCorrectEndOffsetInformation(TaskMetadataIntegrationTest.java:144)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [VOTE] KIP-740: Use TaskId instead of String for the taskId field in TaskMetadata

2021-08-09 Thread Walker Carlson

> > > > > > take them to the discussion thread.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Sophie
> > > > > > >
> > > > > > > On Fri, May 14, 2021 at 10:12 AM John Roesler <
> > vvcep...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for these updates, Sophie,
> > > > > > > >
> > > > > > > > Unfortunately, I have some minor suggestions:
> > > > > > > >
> > > > > > > > 1. "Topic Group" is a vestigial term from the early days of
> > > > > > > > the codebase. We call a "topic group" a "subtopology" in the
> > > > > > > > public interface now (although "topic group" is still used
> > > > > > > > internally some places). For user-facing consistency, we
> > > > > > > > should also use "subtopologyId" in your proposal.
> > > > > > > >
> > > > > > > > 2. I'm wondering if it's really necessary to introduce this
> > > > > > > > interface at all. I think your motivation is to be able to
> > > > > > > > get the subtopologyId and partition via TaskMetadata, right?
> > > > > > > > Why not just add those methods to TaskMetadata? Stepping
> > > > > > > > back, the concept of metadata about an identifier is a bit
> > > > > > > > elaborate.
> > > > > > > >
> > > > > > > > Sorry for thrashing what you were hoping would be a quick,
> > > > > > > > uncontroversial KIP.
> > > > > > > >
> > > > > > > > Thanks for your consideration,
> > > > > > > > John
> > > > > > > >
> > > > > > > > On Thu, 2021-05-13 at 19:35 -0700, Sophie Blee-Goldman
> > > > > > > > wrote:
> > > > > > > > > One last update: we will not actually remove the existing
> > > > > > > > > o.a.k.streams.processor.TaskId class, but only
> > > > > > > > > deprecate it, along with any methods that returned it (ie
> the
> > > > > > getters on
> > > > > > > > > ProcessorContext and StateStoreContext)
> > > > > > > > >
> > > > > > > > > Internally, everything will still be converted to use the
> new
> > > > > > internal
> > > > > > > > > TaskId class, and public TaskIdMetadata interface,
> > > > > > > > > where appropriate.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, May 13, 2021 at 6:42 PM Sophie Blee-Goldman <
> > > > > > > > sop...@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks all. I updated the KIP slightly since there is
> some
> > > > > > ambiguity
> > > > > > > > > > around whether the existing TaskId class is
> > > > > > > > > > currently part of the public API or not. To settle the
> > matter, I
> > > > > > have
> > > > > > > > > > introduced a new public TaskId interface that
> > > > > > > > > > exposes the metadata, and moved the existing TaskId class
> > to the
> > > > > > > > internals
> > > > > > > > > > package. The KIP <
> > https://cwiki.apache.org/confluence/x/vYTOCg>
> > > > > > has
> > > > > > > > been
> > > > > > > > > > updated
> > > > > > > > > > with the proposed API changes.
> > > > > > > > > >
> > > > > > > > > > @Guozhang Wang  : I decided to
> > leave this
> > > > > > new
> > > > > > > > > > TaskId interface in o.a.k.streams.processor since that's
> > where the
> > > > > > > > > > TaskMetadata class is, along with the other related
> > metadata
> > > > > > classes
> > > > > > > > (eg
> > > > > > > > > > ThreadMetadata). I do agree it makes
> > > > > > > > > > more sense for them to be under o.a.k.streams, but I'd
> > rather leave
> > > > > > > > them
> > > > > > > > > > together for now.
> > > > > > > > > >
> > > > > > > > > > Please let me know if there are any concerns, or you want
> > to redact
> > > > > > > > your
> > > > > > > > > > vote :)
> > > > > > > > > >
> > > > > > > > > > -Sophie
> > > > > > > > > >
> > > > > > > > > > On Thu, May 13, 2021 at 3:11 PM Guozhang Wang <
> > wangg...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1
> > > > > > > > > > >
> > > > > > > > > > > On a hindsight, maybe TaskId should not really be in
> > > > > > > > > > > `org.apache.kafka.streams.processor` but rather just in
> > > > > > > > `o.a.k.streams`,
> > > > > > > > > > > but maybe not worth pulling it up now :)
> > > > > > > > > > >
> > > > > > > > > > > Guozhang
> > > > > > > > > > >
> > > > > > > > > > > On Thu, May 13, 2021 at 1:58 PM Walker Carlson
> > > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1 from me! (non-binding)
> > > > > > > > > > > >
> > > > > > > > > > > > Walker
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, May 13, 2021 at 1:53 PM Sophie Blee-Goldman
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hey all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm just going to take this KIP straight to a vote
> > since it
> > > > > > > > should be
> > > > > > > > > > > a
> > > > > > > > > > > > > trivial and uncontroversial change. Of course
> please
> > raise
> > > > > > any
> > > > > > > > > > > concerns
> > > > > > > > > > > > > should they come up, and I can take things to a
> > DISCUSS
> > > > > > thread.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The KIP is a simple change to move from String to
> > TaskId for
> > > > > > the
> > > > > > > > > > > taskID
> > > > > > > > > > > > > field of TaskMetadata.
> > > > > > > > > > > > >
> > > > > > > > > > > > > KIP-740: Use TaskId instead of String for the
> taskId
> > field in
> > > > > > > > > > > > TaskMetadata
> > > > > > > > > > > > > <
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > >
> > > > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-740%3A+Use+TaskId+instead+of+String+for+the+taskId+field+in+TaskMetadata
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Sophie
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > -- Guozhang
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > > >
> >
> >
> >
>

Re: [VOTE] KIP-761: Add Total Blocked Time Metric to Streams

2021-08-02 Thread Walker Carlson

Thanks for the KIP +1 from me (non binding)!

Walker

On Fri, Jul 30, 2021 at 1:20 PM Sophie Blee-Goldman
 wrote:

> Thanks for updating the KIP, +1 (binding)
>
> -Sophie
>
>
> On Tue, Jul 27, 2021 at 9:57 AM Guozhang Wang  wrote:
>
> > Hello Rohan,
> >
> > Thanks for the KIP. As Bruno mentioned in the other thread could you
> update
> > the "New Metrics" that 1) we have sub-titles for streams, producer,
> > consumer metrics, just for clarification, and 2) update the "producer-id"
> > etc to "client-id" to be consistent with the existing metrics.
> >
> > Otherwise, I'm +1
> >
> >
> > Guozhang
> >
> >
> > On Mon, Jul 26, 2021 at 12:49 PM Leah Thomas
>  > >
> > wrote:
> >
> > > Hey Rohan,
> > >
> > > Thanks for pushing this KIP through. I'm +1, non-binding.
> > >
> > > Leah
> > >
> > > On Wed, Jul 21, 2021 at 7:09 PM Rohan Desai 
> > > wrote:
> > >
> > > > Now that the discussion thread's been open for a few days, I'm
> calling
> > > for
> > > > a vote on
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: [ANNOUNCE] New Kafka PMC Member: Konstantine Karantasis

2021-06-21 Thread Walker Carlson

Congratulations!

On Mon, Jun 21, 2021 at 12:25 PM Dhruvil Shah 
wrote:

> Congratulations Konstantine! Well deserved!
>
> On Mon, Jun 21, 2021 at 10:20 AM Boyang Chen 
> wrote:
>
> > Congratulations Konstantine!
> >
> > On Mon, Jun 21, 2021 at 10:16 AM Matthias J. Sax 
> wrote:
> >
> > > Congrats!
> > >
> > > On 6/21/21 12:57 PM, Raymond Ng wrote:
> > > > Congrats Konstantine!
> > > >
> > > > /Ray
> > > >
> > > > On Mon, Jun 21, 2021 at 9:45 AM Guozhang Wang 
> > > wrote:
> > > >
> > > >> Congratulations Konstantine!
> > > >>
> > > >> On Mon, Jun 21, 2021 at 9:37 AM Tom Bentley 
> > > wrote:
> > > >>
> > > >>> Congratulations Konstantine!
> > > >>>
> > > >>> On Mon, Jun 21, 2021 at 5:33 PM David Jacot
> > >  > > >>>
> > > >>> wrote:
> > > >>>
> > >  Congrats, Konstantine. Well deserved!
> > > 
> > >  Best,
> > >  David
> > > 
> > >  On Mon, Jun 21, 2021 at 6:14 PM Ramesh Krishnan <
> > > >> ramesh.154...@gmail.com
> > > 
> > >  wrote:
> > > 
> > > > Congrats Konstantine
> > > >
> > > > On Mon, 21 Jun 2021 at 8:58 PM, Mickael Maison <
> > mimai...@apache.org>
> > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> It's my pleasure to announce that Konstantine Karantasis is now
> a
> > > >> member of the Kafka PMC.
> > > >>
> > > >> Konstantine has been a Kafka committer since Feb 2020. He has
> > > >>> remained
> > > >> active in the community since becoming a committer.
> > > >>
> > > >> Congratulations Konstantine!
> > > >>
> > > >> Mickael, on behalf of the Apache Kafka PMC
> > > >>
> > > >
> > > 
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> -- Guozhang
> > > >>
> > > >
> > >
> >
>

Re: [VOTE] KIP-741: Change default serde to be null

2021-05-24 Thread Walker Carlson

+1 (non-binding) from me, Leah

Walker

On Mon, May 24, 2021 at 1:51 PM Leah Thomas 
wrote:

> Hi,
>
> I'd like to kick-off voting for KIP-741: Change default serde to be null.
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-741%3A+Change+default+serde+to+be+null
> >
> The
> discussion is linked on the KIP for context.
>
> Cheers,
> Leah
>

[jira] [Created] (KAFKA-12781) Improve the endOffsets accuracy in TaskMetadata

2021-05-13 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12781:
--

 Summary: Improve the endOffsets accuracy in TaskMetadata 
 Key: KAFKA-12781
 URL: https://issues.apache.org/jira/browse/KAFKA-12781
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson


Currently `TaskMetadata#endOffsets()` returns the highest offset seen by the 
consumer so far. It should be possible to get the highest offset in the topic 
via the consumer instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [VOTE] KIP-740: Use TaskId instead of String for the taskId field in TaskMetadata

2021-05-13 Thread Walker Carlson

+1 from me! (non-binding)

Walker

On Thu, May 13, 2021 at 1:53 PM Sophie Blee-Goldman
 wrote:

> Hey all,
>
> I'm just going to take this KIP straight to a vote since it should be a
> trivial and uncontroversial change. Of course please raise any concerns
> should they come up, and I can take things to a DISCUSS thread.
>
> The KIP is a simple change to move from String to TaskId for the taskID
> field of TaskMetadata.
>
> KIP-740: Use TaskId instead of String for the taskId field in TaskMetadata
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-740%3A+Use+TaskId+instead+of+String+for+the+taskId+field+in+TaskMetadata
> >
>
> Cheers,
> Sophie
>

[jira] [Created] (KAFKA-12754) TaskMetadata endOffsets does not update when the offsets are read

2021-05-05 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12754:
--

 Summary: TaskMetadata endOffsets does not update when the offsets 
are read
 Key: KAFKA-12754
 URL: https://issues.apache.org/jira/browse/KAFKA-12754
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Walker Carlson
Assignee: Walker Carlson


The high water mark in StreamTask is not updated optimally. Also it would be 
good to have the metadata offsets have a initial value of -1 instead of an 
empty map that way the set of TopicPartitions won't change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12711) Add a back off option to Replace thread

2021-04-23 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12711:
--

 Summary: Add a back off option to Replace thread
 Key: KAFKA-12711
 URL: https://issues.apache.org/jira/browse/KAFKA-12711
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson


There should be a native option to set a back off when replacing a thread.

 

Either there should be a config and a user chosen strategy or a value you can 
set in the handler that causes a delay in creating the new thread.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12705) Task idling is not sufficiently tested

2021-04-21 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12705:
--

 Summary: Task idling is not sufficiently tested
 Key: KAFKA-12705
 URL: https://issues.apache.org/jira/browse/KAFKA-12705
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson


The test for task idling are a bit sparse. When I changed it so that 
isProcessable always returns true only one test failed. That means the entire 
code path is hinging on one unit test 
(shouldBeProcessableIfAllPartitionsBuffered) that does not cover all branches 
of logic. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12699) Streams no longer overrides the java default uncaught exception handler

2021-04-20 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12699:
--

 Summary: Streams no longer overrides the java default uncaught 
exception handler  
 Key: KAFKA-12699
 URL: https://issues.apache.org/jira/browse/KAFKA-12699
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.8.0
Reporter: Walker Carlson


If a user used `Thread.setUncaughtExceptionHanlder()` to set the handler for 
all threads in the runtime streams would override that with its own handler. 
However since streams does not use the `Thread` handler anymore it will no 
longer do so. This can cause problems if the user does something like 
`System.exit(1)` in the handler. 

 

If using the old handler in streams it will still work as it used to



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12691) TaskMetadata timeSinceIdlingStarted not reporting correctly

2021-04-19 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12691:
--

 Summary: TaskMetadata timeSinceIdlingStarted not reporting 
correctly
 Key: KAFKA-12691
 URL: https://issues.apache.org/jira/browse/KAFKA-12691
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Walker Carlson
Assignee: Walker Carlson


TaskMetadata timeSinceIdlingStarted not reporting correctly. It takes into 
account suspended but not the call to is processable. To fix this we need to 
record when the first time it is not processable. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [VOTE] KIP-633: Drop 24 hour default of grace period in Streams

2021-04-06 Thread Walker Carlson

This makes sense to me +1!

Walker

On Tue, Apr 6, 2021 at 11:08 AM Guozhang Wang  wrote:

> +1. Thanks!
>
> On Tue, Apr 6, 2021 at 7:00 AM Leah Thomas 
> wrote:
>
> > Thanks for picking this up, Sophie. +1 from me, non-binding.
> >
> > Leah
> >
> > On Mon, Apr 5, 2021 at 9:42 PM John Roesler  wrote:
> >
> > > Thanks, Sophie,
> > >
> > > I’m +1 (binding)
> > >
> > > -John
> > >
> > > On Mon, Apr 5, 2021, at 21:34, Sophie Blee-Goldman wrote:
> > > > Hey all,
> > > >
> > > > I'd like to start the voting on KIP-633, to drop the awkward 24 hour
> > > grace
> > > > period and improve the API to raise visibility on an important
> concept
> > in
> > > > Kafka Streams: grace period nad out-of-order data handling.
> > > >
> > > > Here's the KIP:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Drop+24+hour+default+of+grace+period+in+Streams
> > > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Drop+24hr+default+grace+period
> > > >
> > > >
> > > > Cheers,
> > > > Sophie
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>

[jira] [Created] (KAFKA-12565) Global thread only topologies should be able to shutdown applications via the uncaught exception handler

2021-03-26 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12565:
--

 Summary: Global thread only topologies should be able to shutdown 
applications via the uncaught exception handler
 Key: KAFKA-12565
 URL: https://issues.apache.org/jira/browse/KAFKA-12565
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 3.0.0, 2.8.0
Reporter: Walker Carlson


Global thread only topologies should be able to shutdown applications via the 
uncaught exception handler.

Currently because there is no stream thread in this case there is nothing to 
participate in a rebalance to communicate the request. If we add a stream 
thread to do this it will result in an `IllegalStateException` because 
"Consumer is not subscribed to any topics or assigned any partitions".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12538) Global Threads should be able to be replaced like stream threads

2021-03-23 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12538:
--

 Summary: Global Threads should be able to be replaced like stream 
threads
 Key: KAFKA-12538
 URL: https://issues.apache.org/jira/browse/KAFKA-12538
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson


We should be able to replace global threads from the streams uncaught exception 
handler just like we replace stream threads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12537) Single Threaded EOS applications will not work with SHUTDOWN_APPLICATION

2021-03-23 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12537:
--

 Summary: Single Threaded EOS applications will not work with 
SHUTDOWN_APPLICATION
 Key: KAFKA-12537
 URL: https://issues.apache.org/jira/browse/KAFKA-12537
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
Reporter: Walker Carlson


Single Threaded EOS applications will not work with the streams uncaught 
exception handler option SHUTDOWN_APPLICATION. This is because the EOS thread 
needs to close and clean up, but to send the shutdown signal it needs to have 
at least one thread running.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [DISCUSS] Apache Kafka 2.8.0 release

2021-03-19 Thread Walker Carlson

Hello John,

We merged the fix for the blocker I mentioned earlier
https://issues.apache.org/jira/browse/KAFKA-12462
https://github.com/apache/kafka/pull/10311

best
walker

On Fri, Mar 19, 2021 at 7:09 AM Bruno Cadonna 
wrote:

> Hi John,
>
> Correction to my last e-mail: The bug does not break eos, but it breaks
> at-least-once.
>
> Bruno
>
>
> On 19.03.21 14:55, Bruno Cadonna wrote:
> > Hi John,
> >
> > Please have a look at the following bug report:
> >
> > https://issues.apache.org/jira/browse/KAFKA-12508
> >
> > I set its priority to blocker since the bug might break at-least-once
> > and exactly-once processing guarantees.
> >
> > Feel free to set it back to major, if you think that it is not a blocker.
> >
> > Best,
> > Bruno
> >
> >
> > On 13.03.21 00:41, Walker Carlson wrote:
> >> Hello John,
> >>
> >> We found a Blocker for 2.8 in our streams soak environment.
> >> https://issues.apache.org/jira/browse/KAFKA-12462
> >>  - We found a case where a StreamThread can try to transition
> >> to PARTITIONS_REVOKED when it was already in PENDING_SHUTDOWN causing an
> >> IllegalStateException.
> >>
> >> Will have a PR for a fix soon
> >>
> >> Walker
> >>
> >> On Tue, Mar 2, 2021 at 2:41 PM Dhruvil Shah 
> wrote:
> >>
> >>> Thanks, John. The fix for KAFKA-12254 is now merged into 2.8.
> >>>
> >>> On Tue, Mar 2, 2021 at 11:54 AM John Roesler 
> >>> wrote:
> >>>
> >>>> Hi Dhruvil,
> >>>>
> >>>> Thanks for this fix. I agree it would be good to get it in
> >>>> for 2.8.0, so I have added it to the fix versions in KAFKA-
> >>>> 12254.
> >>>>
> >>>> Please go ahead and cherry-pick your fix onto the 2.8
> >>>> branch.
> >>>>
> >>>> Thanks!
> >>>> -John
> >>>>
> >>>> On Mon, 2021-03-01 at 09:36 -0800, Dhruvil Shah wrote:
> >>>>> Hi John,
> >>>>>
> >>>>> I would like to bring up
> >>>> https://issues.apache.org/jira/browse/KAFKA-12254
> >>>>> as a blocker candidate for 2.8.0. While this is not a regression, the
> >>>>> issue could lead to data loss in certain cases. The fix is trivial so
> >>> it
> >>>>> may be worth bringing it into 2.8.0. Let me know what you think.
> >>>>>
> >>>>> - Dhruvil
> >>>>>
> >>>>> On Mon, Feb 22, 2021 at 7:50 AM John Roesler 
> >>>> wrote:
> >>>>>
> >>>>>> Thanks for the heads-up, Chia-Ping,
> >>>>>>
> >>>>>> I agree it would be good to include that fix.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> John
> >>>>>>
> >>>>>> On Mon, 2021-02-22 at 09:48 +, Chia-Ping Tsai wrote:
> >>>>>>> hi John,
> >>>>>>>
> >>>>>>> There is a PR (https://github.com/apache/kafka/pull/10024) fixing
> >>>>>> following test error.
> >>>>>>>
> >>>>>>> 14:00:28 Execution failed for task ':core:test'.
> >>>>>>> 14:00:28 > Process 'Gradle Test Executor 24' finished with non-zero
> >>>> exit
> >>>>>> value 1
> >>>>>>> 14:00:28   This problem might be caused by incorrect test process
> >>>>>> configuration.
> >>>>>>> 14:00:28   Please refer to the test execution section in the User
> >>>> Manual
> >>>>>> at
> >>>>>>>
> >>>>>>> This error obstructs us from running integration tests so I'd like
> >>> to
> >>>>>> push it to 2.8 branch after it gets approved.
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> Chia-Ping
> >>>>>>>
> >>>>>>> On 2021/02/18 16:23:13, "John Roesler" 
> >>> wrote:
> >>>>>>>> Hello again, all.
> >>>>>>>>
> >>>>>>>> This is a notice that we are now in Code Freeze for the 2.8
> >>> branch.
> >>>>>>>>
> >>>>>>

[jira] [Created] (KAFKA-12503) Resizing the thread cache in a non thread safe way can cause records to be redirected throughout the topology

2021-03-18 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12503:
--

 Summary: Resizing the thread cache in a non thread safe way can 
cause records to be redirected throughout the topology
 Key: KAFKA-12503
 URL: https://issues.apache.org/jira/browse/KAFKA-12503
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.8.0
Reporter: Walker Carlson
 Fix For: 2.8.0


When a thread is added, removed or replaced the cache is resized. When the 
thread cache was resized it was being done so from the thread initiating these 
calls. This can cause the record to be redirected to the wrong processor via 
the call to `evict` in the cache. The evict flushes records downstream to the 
next processor after the cache. But if this is on the wrong thread the wrong 
processor receives them. 

This can cause 3 problems.

1) When the owner finishes processing the record it set the current node to 
null in the processor context a this then causes the other processor to throw 
an exception `StreamsException: Current node is unknown.`. 

2) Depending on the type it can cause a class cast exception as the record is a 
different type. Mostly this happened when the value types were different inside 
of the map node from the toStream method

3) A silent issue is it could cause data to be processed by the wrong node and 
cause data corruption. We have not been able to confirm this last one but it is 
the most dangerous in many ways.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [DISCUSS] Apache Kafka 2.8.0 release

2021-03-12 Thread Walker Carlson

Hello John,

We found a Blocker for 2.8 in our streams soak environment.
https://issues.apache.org/jira/browse/KAFKA-12462
- We found a case where a StreamThread can try to transition
to PARTITIONS_REVOKED when it was already in PENDING_SHUTDOWN causing an
IllegalStateException.

Will have a PR for a fix soon

Walker

On Tue, Mar 2, 2021 at 2:41 PM Dhruvil Shah  wrote:

> Thanks, John. The fix for KAFKA-12254 is now merged into 2.8.
>
> On Tue, Mar 2, 2021 at 11:54 AM John Roesler  wrote:
>
> > Hi Dhruvil,
> >
> > Thanks for this fix. I agree it would be good to get it in
> > for 2.8.0, so I have added it to the fix versions in KAFKA-
> > 12254.
> >
> > Please go ahead and cherry-pick your fix onto the 2.8
> > branch.
> >
> > Thanks!
> > -John
> >
> > On Mon, 2021-03-01 at 09:36 -0800, Dhruvil Shah wrote:
> > > Hi John,
> > >
> > > I would like to bring up
> > https://issues.apache.org/jira/browse/KAFKA-12254
> > > as a blocker candidate for 2.8.0. While this is not a regression, the
> > > issue could lead to data loss in certain cases. The fix is trivial so
> it
> > > may be worth bringing it into 2.8.0. Let me know what you think.
> > >
> > > - Dhruvil
> > >
> > > On Mon, Feb 22, 2021 at 7:50 AM John Roesler 
> > wrote:
> > >
> > > > Thanks for the heads-up, Chia-Ping,
> > > >
> > > > I agree it would be good to include that fix.
> > > >
> > > > Thanks,
> > > > John
> > > >
> > > > On Mon, 2021-02-22 at 09:48 +, Chia-Ping Tsai wrote:
> > > > > hi John,
> > > > >
> > > > > There is a PR (https://github.com/apache/kafka/pull/10024) fixing
> > > > following test error.
> > > > >
> > > > > 14:00:28 Execution failed for task ':core:test'.
> > > > > 14:00:28 > Process 'Gradle Test Executor 24' finished with non-zero
> > exit
> > > > value 1
> > > > > 14:00:28   This problem might be caused by incorrect test process
> > > > configuration.
> > > > > 14:00:28   Please refer to the test execution section in the User
> > Manual
> > > > at
> > > > >
> > > > > This error obstructs us from running integration tests so I'd like
> to
> > > > push it to 2.8 branch after it gets approved.
> > > > >
> > > > > Best Regards,
> > > > > Chia-Ping
> > > > >
> > > > > On 2021/02/18 16:23:13, "John Roesler" 
> wrote:
> > > > > > Hello again, all.
> > > > > >
> > > > > > This is a notice that we are now in Code Freeze for the 2.8
> branch.
> > > > > >
> > > > > > From now until the release, only fixes for blockers should be
> > merged
> > > > to the release branch. Fixes for failing tests are allowed and
> > encouraged.
> > > > Documentation-only commits are also ok, in case you have forgotten to
> > > > update the docs for some features in 2.8.0.
> > > > > >
> > > > > > Once we have a green build and passing system tests, I will cut
> the
> > > > first RC.
> > > > > >
> > > > > > Thank you,
> > > > > > John
> > > > > >
> > > > > > On Sun, Feb 7, 2021, at 09:59, John Roesler wrote:
> > > > > > > Hello all,
> > > > > > >
> > > > > > > I have just cut the branch for 2.8 and sent the notification
> > > > > > > email to the dev mailing list.
> > > > > > >
> > > > > > > As a reminder, the next checkpoint toward the 2.8.0 release
> > > > > > > is Code Freeze on Feb 17th.
> > > > > > >
> > > > > > > To ensure a high-quality release, we should now focus our
> > > > > > > efforts on stabilizing the 2.8 branch, including resolving
> > > > > > > failures, writing new tests, and fixing documentation.
> > > > > > >
> > > > > > > Thanks as always for your contributions,
> > > > > > > John
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 2021-02-03 at 14:18 -0600, John Roesler wrote:
> > > > > > > > Hello again, all,
> > > > > > > >
> > > > > > > > This is a reminder that today is the Feature Freeze
> > > > > > > > deadline. To avoid any last-minute crunch or time-zone
> > > > > > > > unfairness, I'll cut the branch toward the end of the week.
> > > > > > > >
> > > > > > > > Please wrap up your features and transition fully into a
> > > > > > > > stabilization mode. The next checkpoint is Code Freeze on
> > > > > > > > Feb 17th.
> > > > > > > >
> > > > > > > > Thanks as always for all of your contributions,
> > > > > > > > John
> > > > > > > >
> > > > > > > > On Wed, 2021-01-27 at 12:17 -0600, John Roesler wrote:
> > > > > > > > > Hello again, all.
> > > > > > > > >
> > > > > > > > > This is a reminder that *today* is the KIP freeze for
> Apache
> > > > > > > > > Kafka 2.8.0.
> > > > > > > > >
> > > > > > > > > The next checkpoint is the Feature Freeze on Feb 3rd.
> > > > > > > > >
> > > > > > > > > When considering any last-minute KIPs today, please be
> > > > > > > > > mindful of the scope, since we have only one week to merge
> a
> > > > > > > > > stable implementation of the KIP.
> > > > > > > > >
> > > > > > > > > For those whose KIPs have been accepted already, please
> work
> > > > > > > > > closely with your reviewers so that your features can be
> > > > > > > > > merged in a stable form in before the Feb 3rd cutoff. Also,
> > >

[jira] [Created] (KAFKA-12462) Threads in PENDING_SHUTDOWN entering a rebalance can cause an illegal state exception

2021-03-12 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12462:
--

 Summary: Threads in PENDING_SHUTDOWN entering a rebalance can 
cause an illegal state exception 
 Key: KAFKA-12462
 URL: https://issues.apache.org/jira/browse/KAFKA-12462
 Project: Kafka
  Issue Type: Bug
Reporter: Walker Carlson


A thread was removed, sending it to the PENDING_SHUTDOWN state, but went 
through a rebalance before completing the shutdown.
{code:java}
// [2021-03-07 04:33:39,385] DEBUG [i-07430efc31ad166b7-StreamThread-6] 
stream-thread [i-07430efc31ad166b7-StreamThread-6] Ignoring request to transit 
from PENDING_SHUTDOWN to PARTITIONS_REVOKED: only DEAD state is a valid next 
state (org.apache.kafka.streams.processor.internals.StreamThread)
{code}
Inside StreamsRebalanceListener#onPartitionsRevoked, we have
{code:java}
// 
if (streamThread.setState(State.PARTITIONS_REVOKED) != null && 
!partitions.isEmpty())
taskManager.handleRevocation(partitions);
{code}
Since PENDING_SHUTDOWN → PARTITIONS_REVOKED is a disallowed transition, we 
never invoke TaskManager#handleRevocation. Currently handleRevocation is 
responsible for preparing any active tasks for close, including committing 
offsets and writing the checkpoint as well as suspending the task. We can’t 
close the task in handleRevocation since we still support EAGER rebalancing, 
which invokes handleRevocation at the beginning of a rebalance on all tasks.

The tasks that are actually revoked will be closed during 
TaskManager#handleAssignment . The IllegalStateException is specifically 
because we don’t suspend the task before attempting to close it, and the direct 
transition from RUNNING → CLOSED is forbidden.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Reopened] (KAFKA-12347) Improve Kafka Streams ability to track progress

2021-03-05 Thread Walker Carlson (Jira)



 [ 
https://issues.apache.org/jira/browse/KAFKA-12347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walker Carlson reopened KAFKA-12347:


> Improve Kafka Streams ability to track progress
> ---
>
> Key: KAFKA-12347
> URL: https://issues.apache.org/jira/browse/KAFKA-12347
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>        Reporter: Walker Carlson
>    Assignee: Walker Carlson
>Priority: Major
>  Labels: kip
> Fix For: 3.0.0
>
>
> Add methods to track records being consumed fully and to tell if tasks are 
> idling. This will allow users of streams to build uptime metrics around 
> streams with less difficulty.
> KIP-715: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-715%3A+Expose+Committed+offset+in+streams]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [VOTE] KIP-715: Expose Committed offset in streams

2021-03-02 Thread Walker Carlson

Thank you everyone for voting. This KIP has passed with:

+4 binding votes (Boyang, Sophie, Guozhang and Matthias)
+2 non-binding (Leah and myself)

best,
Walker

On Tue, Mar 2, 2021 at 11:18 AM Matthias J. Sax  wrote:

> +1 (binding)
>
> On 3/1/21 4:23 PM, Guozhang Wang wrote:
> > Thanks Walker for the updated KIP, +1 (binding)
> >
> >
> > Guozhang
> >
> > On Mon, Mar 1, 2021 at 3:47 PM Sophie Blee-Goldman 
> > wrote:
> >
> >> Thanks for the KIP! +1 (binding)
> >>
> >> Sophie
> >>
> >> On Mon, Mar 1, 2021 at 10:04 AM Leah Thomas 
> wrote:
> >>
> >>> Hey Walker,
> >>>
> >>> Thanks for leading this discussion. +1 from me, non-binding
> >>>
> >>> Leah
> >>>
> >>> On Mon, Mar 1, 2021 at 12:37 AM Boyang Chen <
> reluctanthero...@gmail.com>
> >>> wrote:
> >>>
> >>>> Thanks Walker for the proposal, +1 (binding) from me.
> >>>>
> >>>> On Fri, Feb 26, 2021 at 12:42 PM Walker Carlson <
> wcarl...@confluent.io
> >>>
> >>>> wrote:
> >>>>
> >>>>> Hello all,
> >>>>>
> >>>>> I would like to bring KIP-715 to a vote. Here is the KIP:
> >>>>> https://cwiki.apache.org/confluence/x/aRRRCg.
> >>>>>
> >>>>> Walker
> >>>>>
> >>>>
> >>>
> >>
> >
> >
>

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-03-02 Thread Walker Carlson

Okay we can document that if the state is rebalancing that a Task could be
between instances and so no show up for one localThreadMetadata call. but
this should not cause a problem for repeated calls

Bruno, to your questions. The endOffset is like the consumer's
highWatermark and does not require a remote call. It seems his name is
confusing and I should change the name from endOffset to HighWatermark to
match the consumer.

walker

On Tue, Mar 2, 2021 at 4:43 AM Bruno Cadonna  wrote:

> Hi Walker,
>
> Thank you for the KIP!
>
> I somehow agree that we should document that some tasks may be missing.
>
> I have one question/comment. As far as I understand, your KIP adds two
> methods that return data that is actually hosted on the brokers, namely
> committedOffsets() and endOffsets(). Thus, we need a remote call to
> fetch the data and consequently the cost of calling
> localThreadMetaData() might increase substantially. I understand, that
> for committedOffsets(), we could avoid the remote call by maintaining
> the committedOffsets() locally, but can we also avoid the remote call
> for endOffsets()? Should we allow users to pass a parameter to
> localThreadMetaData() that skips the metadata that needs remote calls to
> keep the costs for use cases that do not need the end offsets low?
>
> Best,
> Bruno
>
> On 02.03.21 02:18, Matthias J. Sax wrote:
> >> but the user should
> >> not rely on all tasks being returned at any given time to begin with
> since
> >> it's possible we are in between revoking and re-assigning a partition.
> >
> > Exactly. That is what I meant: the "hand off" phase of partitions during
> > a rebalance. During this phase, some tasks are "missing" if you
> > aggregate the information globally. My point was (even if it might be
> > obvious to us) that it seems to be worth pointing out in the KIPs and in
> > the docs.
> >
> > I meant "partial information" from a global POV (not partial for a
> > single local instance).
> >
> >> Also I mention that they return the highest value they had seen
> >> so far for any tasks they have assigned to them.
> >
> > For the shutdown case maybe, but after a task is closed its metadata
> > should not be returned any longer IMHO.
> >
> >
> > -Matthias
> >
> > On 3/1/21 4:46 PM, Walker Carlson wrote:
> >> I updated to use Optional, good idea Mathias.
> >>
> >> For the localThreadMetadata, it could already be called running a
> >> rebalance. Also I mention that they return the highest value they had
> seen
> >> so far for any tasks they have assigned to them. I thought it would be
> >> useful to see the TaskMetadata while the Threads were shutting down. I
> >> think that there shouldn't really be partial information. If you think
> this
> >> should be clarified better let me know.
> >>
> >> walker
> >>
> >> On Mon, Mar 1, 2021 at 3:45 PM Sophie Blee-Goldman  >
> >> wrote:
> >>
> >>> Can you clarify your second question Matthias? If this is queried
> during
> >>> a cooperative rebalance, it should return the tasks as usual. If the
> user
> >>> is
> >>> using eager rebalancing then this will not return any tasks, but the
> user
> >>> should
> >>> not rely on all tasks being returned at any given time to begin with
> since
> >>> it's
> >>> possible we are in between revoking and re-assigning a partition.
> >>>
> >>> What does "partial information" mean?
> >>>
> >>> (btw I agree that an Optional makes sense for
> timeCurrentIdlingStarted())
> >>>
> >>> On Mon, Mar 1, 2021 at 11:46 AM Matthias J. Sax 
> wrote:
> >>>
> >>>> Thanks the updating the KIP Walker.
> >>>>
> >>>> About, `timeCurrentIdlingStarted()`: should we return an `Optional`
> >>>> instead of `-1` if a task is not idling.
> >>>>
> >>>>
> >>>> As we allow to call `localThreadMetadata()` any time, could it be that
> >>>> we report partial information during a rebalance? If yes, this should
> be
> >>>> pointed out, because if one want to implement a health check this
> needs
> >>>> to be taken into account.
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 2/27/21 11:32 AM, Walker Carlson wrote:
> >>>>> Sure thing Boyang,
> >>>>>
> >&g

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-03-01 Thread Walker Carlson

I updated to use Optional, good idea Mathias.

For the localThreadMetadata, it could already be called running a
rebalance. Also I mention that they return the highest value they had seen
so far for any tasks they have assigned to them. I thought it would be
useful to see the TaskMetadata while the Threads were shutting down. I
think that there shouldn't really be partial information. If you think this
should be clarified better let me know.

walker

On Mon, Mar 1, 2021 at 3:45 PM Sophie Blee-Goldman 
wrote:

> Can you clarify your second question Matthias? If this is queried during
> a cooperative rebalance, it should return the tasks as usual. If the user
> is
> using eager rebalancing then this will not return any tasks, but the user
> should
> not rely on all tasks being returned at any given time to begin with since
> it's
> possible we are in between revoking and re-assigning a partition.
>
> What does "partial information" mean?
>
> (btw I agree that an Optional makes sense for timeCurrentIdlingStarted())
>
> On Mon, Mar 1, 2021 at 11:46 AM Matthias J. Sax  wrote:
>
> > Thanks the updating the KIP Walker.
> >
> > About, `timeCurrentIdlingStarted()`: should we return an `Optional`
> > instead of `-1` if a task is not idling.
> >
> >
> > As we allow to call `localThreadMetadata()` any time, could it be that
> > we report partial information during a rebalance? If yes, this should be
> > pointed out, because if one want to implement a health check this needs
> > to be taken into account.
> >
> > -Matthias
> >
> >
> > On 2/27/21 11:32 AM, Walker Carlson wrote:
> > > Sure thing Boyang,
> > >
> > > 1) it is in proposed changes. I expanded on it a bit more now.
> > > 2) done
> > > 3) and done :)
> > >
> > > thanks for the suggestions,
> > > walker
> > >
> > > On Fri, Feb 26, 2021 at 3:10 PM Boyang Chen <
> reluctanthero...@gmail.com>
> > > wrote:
> > >
> > >> Thanks Walker. Some minor comments:
> > >>
> > >> 1. Could you add a reference to localThreadMetadata method in the KIP?
> > >> 2. Could you make the code block as a java template, such that
> > >> TaskMetadata.java could be as the template title? Also it would be
> good
> > to
> > >> add some meta comments about the newly added functions.
> > >> 3. Could you write more details about rejected alternatives? Just as
> > why we
> > >> don't choose to expose as metrics, and how a new method on KStream is
> > not
> > >> favorable. These would be valuable when we look back on our design
> > >> decisions.
> > >>
> > >> On Fri, Feb 26, 2021 at 11:23 AM Walker Carlson <
> wcarl...@confluent.io>
> > >> wrote:
> > >>
> > >>> I understand now. I think that is a valid concern but I think it is
> > best
> > >>> solved but having an external service verify through streams. As this
> > KIP
> > >>> is now just adding fields to TaskMetadata to be returned in the
> > >>> threadMetadata I am going to say that is out of scope.
> > >>>
> > >>> That seems to be the last concern. If there are no others I will put
> > this
> > >>> up for a vote soon.
> > >>>
> > >>> walker
> > >>>
> > >>> On Thu, Feb 25, 2021 at 12:35 PM Boyang Chen <
> > reluctanthero...@gmail.com
> > >>>
> > >>> wrote:
> > >>>
> > >>>> For the 3rd point, yes, what I'm proposing is an edge case. For
> > >> example,
> > >>>> when we have 4 tasks [0_0, 0_1, 1_0, 1_1], and a bug in rebalancing
> > >> logic
> > >>>> causing no one gets 1_1 assigned. Then the health check service will
> > >> only
> > >>>> see 3 tasks [0_0, 0_1, 1_0] reporting progress normally while not
> > >> paying
> > >>>> attention to 1_1. What I want to expose is a "logical global" view
> of
> > >> all
> > >>>> the tasks through the stream instance, since each instance gets the
> > >>>> assigned topology and should be able to infer all the exact tasks to
> > be
> > >>> up
> > >>>> and running when the service is healthy.
> > >>>>
> > >>>> On Thu, Feb 25, 2021 at 11:25 AM Walker Carlson <
> > wcarl...@confluent.io
> > >>>
> > >>>

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-27 Thread Walker Carlson

Sure thing Boyang,

1) it is in proposed changes. I expanded on it a bit more now.
2) done
3) and done :)

thanks for the suggestions,
walker

On Fri, Feb 26, 2021 at 3:10 PM Boyang Chen 
wrote:

> Thanks Walker. Some minor comments:
>
> 1. Could you add a reference to localThreadMetadata method in the KIP?
> 2. Could you make the code block as a java template, such that
> TaskMetadata.java could be as the template title? Also it would be good to
> add some meta comments about the newly added functions.
> 3. Could you write more details about rejected alternatives? Just as why we
> don't choose to expose as metrics, and how a new method on KStream is not
> favorable. These would be valuable when we look back on our design
> decisions.
>
> On Fri, Feb 26, 2021 at 11:23 AM Walker Carlson 
> wrote:
>
> > I understand now. I think that is a valid concern but I think it is best
> > solved but having an external service verify through streams. As this KIP
> > is now just adding fields to TaskMetadata to be returned in the
> > threadMetadata I am going to say that is out of scope.
> >
> > That seems to be the last concern. If there are no others I will put this
> > up for a vote soon.
> >
> > walker
> >
> > On Thu, Feb 25, 2021 at 12:35 PM Boyang Chen  >
> > wrote:
> >
> > > For the 3rd point, yes, what I'm proposing is an edge case. For
> example,
> > > when we have 4 tasks [0_0, 0_1, 1_0, 1_1], and a bug in rebalancing
> logic
> > > causing no one gets 1_1 assigned. Then the health check service will
> only
> > > see 3 tasks [0_0, 0_1, 1_0] reporting progress normally while not
> paying
> > > attention to 1_1. What I want to expose is a "logical global" view of
> all
> > > the tasks through the stream instance, since each instance gets the
> > > assigned topology and should be able to infer all the exact tasks to be
> > up
> > > and running when the service is healthy.
> > >
> > > On Thu, Feb 25, 2021 at 11:25 AM Walker Carlson  >
> > > wrote:
> > >
> > > > Thanks for the follow up Boyang and Guozhang,
> > > >
> > > > I have updated the kip to include these ideas.
> > > >
> > > > Guozhang, that is a good idea about using the TaskMetadata. We can
> get
> > it
> > > > through the ThreadMetadata with a minor change to
> `localThreadMetadata`
> > > in
> > > > kafkaStreams. This means that we will only need to update
> TaskMetadata
> > > and
> > > > add no other APIs
> > > >
> > > > Boyang, since each TaskMetadata contains the TaskId and
> > TopicPartitions I
> > > > don't believe mapping either way will be a problem. Also I think we
> can
> > > do
> > > > something like record the time the task started idling and when it
> > stops
> > > > idling we can override it to -1. I think that should clear up the
> first
> > > two
> > > > points.
> > > >
> > > > As for your third point I am not sure I 100% understand. The
> > > ThreadMetadata
> > > > will contain a set of all task assigned to that thread. Any health
> > check
> > > > service will just need to query all clients and aggregate their
> > responses
> > > > to get a complete picture of all tasks correct?
> > > >
> > > > walker
> > > >
> > > > On Thu, Feb 25, 2021 at 9:57 AM Guozhang Wang 
> > > wrote:
> > > >
> > > > > Regarding the second API and the `TaskStatus` class: I'd suggest we
> > > > > consolidate on the existing `TaskMetadata` since we have already
> > > > > accumulated a bunch of such classes, and its better to keep them
> > small
> > > as
> > > > > public APIs. You can see
> > > > https://issues.apache.org/jira/browse/KAFKA-12370
> > > > > for a reference and a proposal.
> > > > >
> > > > > On Thu, Feb 25, 2021 at 9:40 AM Boyang Chen <
> > > reluctanthero...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the updates Walker. Some replies and follow-up
> > questions:
> > > > > >
> > > > > > 1. I agree one task could have multiple partitions, but when we
> > hit a
> > > > > delay
> > > > > > in terms of offset progress, do we have a convenient way to
> reverse
> > > > > mapping
&

[VOTE] KIP-715: Expose Committed offset in streams

2021-02-26 Thread Walker Carlson

Hello all,

I would like to bring KIP-715 to a vote. Here is the KIP:
https://cwiki.apache.org/confluence/x/aRRRCg.

Walker

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-26 Thread Walker Carlson

I understand now. I think that is a valid concern but I think it is best
solved but having an external service verify through streams. As this KIP
is now just adding fields to TaskMetadata to be returned in the
threadMetadata I am going to say that is out of scope.

That seems to be the last concern. If there are no others I will put this
up for a vote soon.

walker

On Thu, Feb 25, 2021 at 12:35 PM Boyang Chen 
wrote:

> For the 3rd point, yes, what I'm proposing is an edge case. For example,
> when we have 4 tasks [0_0, 0_1, 1_0, 1_1], and a bug in rebalancing logic
> causing no one gets 1_1 assigned. Then the health check service will only
> see 3 tasks [0_0, 0_1, 1_0] reporting progress normally while not paying
> attention to 1_1. What I want to expose is a "logical global" view of all
> the tasks through the stream instance, since each instance gets the
> assigned topology and should be able to infer all the exact tasks to be up
> and running when the service is healthy.
>
> On Thu, Feb 25, 2021 at 11:25 AM Walker Carlson 
> wrote:
>
> > Thanks for the follow up Boyang and Guozhang,
> >
> > I have updated the kip to include these ideas.
> >
> > Guozhang, that is a good idea about using the TaskMetadata. We can get it
> > through the ThreadMetadata with a minor change to `localThreadMetadata`
> in
> > kafkaStreams. This means that we will only need to update TaskMetadata
> and
> > add no other APIs
> >
> > Boyang, since each TaskMetadata contains the TaskId and TopicPartitions I
> > don't believe mapping either way will be a problem. Also I think we can
> do
> > something like record the time the task started idling and when it stops
> > idling we can override it to -1. I think that should clear up the first
> two
> > points.
> >
> > As for your third point I am not sure I 100% understand. The
> ThreadMetadata
> > will contain a set of all task assigned to that thread. Any health check
> > service will just need to query all clients and aggregate their responses
> > to get a complete picture of all tasks correct?
> >
> > walker
> >
> > On Thu, Feb 25, 2021 at 9:57 AM Guozhang Wang 
> wrote:
> >
> > > Regarding the second API and the `TaskStatus` class: I'd suggest we
> > > consolidate on the existing `TaskMetadata` since we have already
> > > accumulated a bunch of such classes, and its better to keep them small
> as
> > > public APIs. You can see
> > https://issues.apache.org/jira/browse/KAFKA-12370
> > > for a reference and a proposal.
> > >
> > > On Thu, Feb 25, 2021 at 9:40 AM Boyang Chen <
> reluctanthero...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for the updates Walker. Some replies and follow-up questions:
> > > >
> > > > 1. I agree one task could have multiple partitions, but when we hit a
> > > delay
> > > > in terms of offset progress, do we have a convenient way to reverse
> > > mapping
> > > > TopicPartition to the problematic task? In production, I believe it
> > would
> > > > be much quicker to identify the problem using task.id instead of
> topic
> > > > partition, especially when it points to an internal topic. I think
> > having
> > > > the task id as part of the entry value seems useful, which means
> > getting
> > > > something like Map where TaskProgress
> > > > contains both committed offsets & task id.
> > > >
> > > > 2. The task idling API was still confusing. I don't think we care
> about
> > > the
> > > > exact state when making tasksIdling()query, instead we care more
> about
> > > how
> > > > long one task has been in idle state since when you called, which
> > > reflects
> > > > whether it is a normal idling period. So I feel it might be helpful
> to
> > > > track that time difference and report it in the TaskStatus struct.
> > > >
> > > > 3. What I want to achieve to have some global mapping of either
> > > > TopicPartition or TaskId was that it is not possible for a health
> check
> > > > service to report a task failure that doesn't emit any metrics. So as
> > > long
> > > > as we have a global topic partition API, health check could always be
> > > aware
> > > > of any task/partition not reporting its progress, does that make
> sense?
> > > If
> > > > you feel we have a better way to achieve this, such as querying all
> the
> > > > input/in

[jira] [Resolved] (KAFKA-12362) Determine if a Task is idling

2021-02-26 Thread Walker Carlson (Jira)



 [ 
https://issues.apache.org/jira/browse/KAFKA-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walker Carlson resolved KAFKA-12362.

Resolution: Abandoned

> Determine if a Task is idling
> -
>
> Key: KAFKA-12362
> URL: https://issues.apache.org/jira/browse/KAFKA-12362
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>        Reporter: Walker Carlson
>Priority: Major
> Fix For: 3.0.0
>
>
> determine if a task is idling given the task Id.
>  
> https://github.com/apache/kafka/pull/10180



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-25 Thread Walker Carlson

Thanks for the follow up Boyang and Guozhang,

I have updated the kip to include these ideas.

Guozhang, that is a good idea about using the TaskMetadata. We can get it
through the ThreadMetadata with a minor change to `localThreadMetadata` in
kafkaStreams. This means that we will only need to update TaskMetadata and
add no other APIs

Boyang, since each TaskMetadata contains the TaskId and TopicPartitions I
don't believe mapping either way will be a problem. Also I think we can do
something like record the time the task started idling and when it stops
idling we can override it to -1. I think that should clear up the first two
points.

As for your third point I am not sure I 100% understand. The ThreadMetadata
will contain a set of all task assigned to that thread. Any health check
service will just need to query all clients and aggregate their responses
to get a complete picture of all tasks correct?

walker

On Thu, Feb 25, 2021 at 9:57 AM Guozhang Wang  wrote:

> Regarding the second API and the `TaskStatus` class: I'd suggest we
> consolidate on the existing `TaskMetadata` since we have already
> accumulated a bunch of such classes, and its better to keep them small as
> public APIs. You can see https://issues.apache.org/jira/browse/KAFKA-12370
> for a reference and a proposal.
>
> On Thu, Feb 25, 2021 at 9:40 AM Boyang Chen 
> wrote:
>
> > Thanks for the updates Walker. Some replies and follow-up questions:
> >
> > 1. I agree one task could have multiple partitions, but when we hit a
> delay
> > in terms of offset progress, do we have a convenient way to reverse
> mapping
> > TopicPartition to the problematic task? In production, I believe it would
> > be much quicker to identify the problem using task.id instead of topic
> > partition, especially when it points to an internal topic. I think having
> > the task id as part of the entry value seems useful, which means getting
> > something like Map where TaskProgress
> > contains both committed offsets & task id.
> >
> > 2. The task idling API was still confusing. I don't think we care about
> the
> > exact state when making tasksIdling()query, instead we care more about
> how
> > long one task has been in idle state since when you called, which
> reflects
> > whether it is a normal idling period. So I feel it might be helpful to
> > track that time difference and report it in the TaskStatus struct.
> >
> > 3. What I want to achieve to have some global mapping of either
> > TopicPartition or TaskId was that it is not possible for a health check
> > service to report a task failure that doesn't emit any metrics. So as
> long
> > as we have a global topic partition API, health check could always be
> aware
> > of any task/partition not reporting its progress, does that make sense?
> If
> > you feel we have a better way to achieve this, such as querying all the
> > input/intermediate topic metadata directly from Kafka for the baseline, I
> > think that should be good as well and worth mentioning it in the KIP.
> >
> > Also it seems that the KIP hasn't reflected what you proposed for the
> task
> > idling status.
> >
> > Best,
> > Boyang
> >
> >
> > On Wed, Feb 24, 2021 at 9:11 AM Walker Carlson 
> > wrote:
> >
> > > Thank you for the comments everyone!
> > >
> > > I think there are a few things I can clear up in general then I will
> > > specifically respond to each question.
> > >
> > > First, when I say "idling" I refer to task idling. Where the stream is
> > > intentionally not making progress. (
> > > https://issues.apache.org/jira/browse/KAFKA-10091 is an example). This
> > > becomes relevant if a task is waiting on one partition with no data but
> > > that is holding up a partition with data. That would cause one just
> > looking
> > > at the committed offset changes to believe the task has a problem when
> it
> > > is working as intended.
> > >
> > > In light of this confusion. I plan to change tasksIdling() to
> > `Map > > TaskStatus> getTasksStatus()` this should hopefully make it more clear
> > what
> > > is being exposed.
> > >
> > > TaskStatus would include: TopicPartions, TaskId, ProcessorTopology,
> > Idling,
> > > and State.
> > >
> > > Boyang:
> > >
> > > 2) I think that each task should report on whatever TopicPartitions
> they
> > > hold, this means a Topic Partition might get reported twice but the
> user
> > > can roll those up and use the larger one when looking at the whole

Re: [DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-24 Thread Walker Carlson

y brainstorming was whether it makes sense for the leader
> instance
> > to report the task progress as -1 for all “supposed to be running” tasks,
> > so that on the metrics collector side it could catch any missing tasks.
> >
> > 5. It seems not clear how users should use `isTaskIdling`. Why not
> report a
> > map/set for idling tasks just as what we did for committed offsets?
> >
> > 6. Why do we use TopicPartition instead of TaskId as the key in the
> > returned map?
> > 7. Could we include some details in where we got the commit offsets for
> > each task? Is it through consumer offset fetch, or the stream processing
> > progress based on the records fetched?
> >
> >
> > On Mon, Feb 22, 2021 at 3:00 PM Walker Carlson 
> > wrote:
> >
> > > Hello all,
> > >
> > > I would like to start discussion on KIP-715. This kip aims to make it
> > > easier to monitor Kafka Streams progress by exposing the committed
> offset
> > > in a similar way as the consumer client does.
> > >
> > > Here is the KIP: https://cwiki.apache.org/confluence/x/aRRRCg
> > >
> > > Best,
> > > Walker
> > >
> >
>
>
> --
> -- Guozhang
>

[DISCUSS] KIP-715: Expose Committed offset in streams

2021-02-22 Thread Walker Carlson

Hello all,

I would like to start discussion on KIP-715. This kip aims to make it
easier to monitor Kafka Streams progress by exposing the committed offset
in a similar way as the consumer client does.

Here is the KIP: https://cwiki.apache.org/confluence/x/aRRRCg

Best,
Walker

[jira] [Created] (KAFKA-12362) Determine if a Task is idling

2021-02-22 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12362:
--

 Summary: Determine if a Task is idling
 Key: KAFKA-12362
 URL: https://issues.apache.org/jira/browse/KAFKA-12362
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson
 Fix For: 3.0.0


determine if a task is idling given the task Id.

 

https://github.com/apache/kafka/pull/10180



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12347) Improve Kafka Streams ability to track progress

2021-02-19 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12347:
--

 Summary: Improve Kafka Streams ability to track progress
 Key: KAFKA-12347
 URL: https://issues.apache.org/jira/browse/KAFKA-12347
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson
Assignee: Walker Carlson
 Fix For: 3.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Resolved] (KAFKA-4640) Improve Streams unit test coverage

2021-02-03 Thread Walker Carlson (Jira)



 [ 
https://issues.apache.org/jira/browse/KAFKA-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walker Carlson resolved KAFKA-4640.
---
Resolution: Fixed

> Improve Streams unit test coverage
> --
>
> Key: KAFKA-4640
> URL: https://issues.apache.org/jira/browse/KAFKA-4640
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0
>Reporter: Damian Guy
>Assignee: Damian Guy
>Priority: Minor
> Attachments: streams-coverage.zip
>
>
> There are some important methods in streams that are lacking good unit-test 
> coverage. Whilst we shouldn't strive to get 100% coverage, we should do our 
> best to ensure sure that all important code paths are covered by unit-tests.
> For contributors: you can first run {{./gradlew streams:reportCoverage}} to 
> get the report, which will then accessible in 
> {{streams/build/reports/jacoco/test/html/index.html}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Resolved] (KAFKA-10015) React Smartly to Unexpected Errors on Stream Threads

2021-01-29 Thread Walker Carlson (Jira)



 [ 
https://issues.apache.org/jira/browse/KAFKA-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walker Carlson resolved KAFKA-10015.

Resolution: Fixed

> React Smartly to Unexpected Errors on Stream Threads
> 
>
> Key: KAFKA-10015
> URL: https://issues.apache.org/jira/browse/KAFKA-10015
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.5.0
>Reporter: Bruno Cadonna
>    Assignee: Walker Carlson
>Priority: Major
>  Labels: kip
>
> Currently, if an unexpected error occurs on a stream thread, the stream 
> thread dies, a rebalance is triggered, and the Streams' client continues to 
> run with less stream threads.
> Some errors trigger a cascading of stream thread death, i.e., after the 
> rebalance that resulted from the death of the first thread the next thread 
> dies, then a rebalance is triggered, the next thread dies, and so forth until 
> all stream threads are dead and the instance shuts down. Such a chain of 
> rebalances could be avoided if an error could be recognized as the cause of 
> cascading stream deaths and as a consequence the Streams' client could be 
> shut down after the first stream thread death.
> On the other hand, some unexpected errors are transient and the stream thread 
> could safely be restarted without causing further errors and without the need 
> to restart the Streams' client.
> The goal of this ticket is to classify errors and to automatically react to 
> the errors in a way to avoid cascading deaths and to recover stream threads 
> if possible.
> KIP-663: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12247) Make removeStreamThread work better with static membership

2021-01-28 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12247:
--

 Summary: Make removeStreamThread work better with static membership
 Key: KAFKA-12247
 URL: https://issues.apache.org/jira/browse/KAFKA-12247
 Project: Kafka
  Issue Type: Improvement
Reporter: Walker Carlson
Assignee: Walker Carlson
 Fix For: 2.8.0


Ensure that calling removeStreamThread make the thread leave the group right 
away instead of waiting for the timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12184) Once the Error Classification is better update the default streams uncaught exception handler to replace threads

2021-01-12 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-12184:
--

 Summary: Once the Error Classification is better update the 
default streams uncaught exception handler to replace threads
 Key: KAFKA-12184
 URL: https://issues.apache.org/jira/browse/KAFKA-12184
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Walker Carlson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [DISCUSS] KIP-696: Update Streams FSM to clarify ERROR state meaning

2020-12-20 Thread Walker Carlson

Thanks for the comets Matthias.

I respond inline below.

Thanks,
walker


On Sat, Dec 19, 2020 at 11:35 AM Matthias J. Sax  wrote:

> Overall LGTM.
>
> A few minor comments:
>
> > The SHUTDOWN_CLIENT option in the Streams Uncaught Exception Handler
> should leave the client state in ERROR instead of NOT_RUNNING
>
> and
>
> > In order to be consistent, SHUTDOWN_CLIENT will leave the client state
> in ERROR instead of NOT_RUNNING
>
> Both should also apply to SHUTDOWN_APPLICATION? If not, why?
>

I totally agree. This is already the behavior of SHUTDOWN_APPLICATION we
are just bringing SHUTDOWN_CLIENT to match.


>
>
> > Close() called on ERROR or PENDING_ERROR will be idempotent
>
> Should we replae `idempotent` with `a no-op`, because the difference is
> that `close()` would normally transit to NOT_RUNNING?
>

That is fine with me.


>
> The Jira links to 3 tickets, but uses different markup. That is
> confusing. Also, I actually believe that 6520 is unrelated?
>

I'll fix the mark up, I think that this kip changes how that 6520 is
handled but it probably doesn't need to be on the list. I have added a
comment on the ticket to notify anyone that related behavior has changed


>
>
> -Matthias
>
>
> On 12/10/20 1:52 AM, Bruno Cadonna wrote:
> > Thanks for the KIP, Walker!
> >
> > The KIP looks good to me. I have just a minor comment about the KIP
> > document.
> >
> > You talk about SHUTDOWN_CLIENT in the KIP, but never explain that it is
> > a possible action that can be taken in the Streams uncaught exception
> > handler. Could you please clarify that?
> >
> > Best,
> > Bruno
> >
> > On 09.12.20 19:04, Walker Carlson wrote:
> >> Thanks for the comments. If there are no further concerns I would like
> to
> >> call for a vote on KIP-696 to clarify and clean up the Streams State
> >> Machine.
> >>
> >> walker
> >>
> >> On Wed, Dec 9, 2020 at 8:50 AM John Roesler 
> wrote:
> >>
> >>> Thanks, Walker!
> >>>
> >>> Your proposal looks good to me.
> >>>
> >>> -John
> >>>
> >>> On Tue, 2020-12-08 at 18:29 -0800, Walker Carlson wrote:
> >>>> Thanks for the feedback Guozhang!
> >>>>
> >>>> I clarified some of the points in the Proposed Changes section so
> >>> hopefully
> >>>> it will be more clear what is going on now. I also agree with your
> >>>> suggestion about the possible call to close() on ERROR so I added this
> >>>> line.
> >>>> "Close() called on ERROR will be idempotent and not throw an
> exception,
> >>> but
> >>>> we will log a warning."
> >>>>
> >>>> I have linked those tickets and I will leave a comment trying to
> >>>> explain
> >>>> how these changes will affect their issue.
> >>>>
> >>>> walker
> >>>>
> >>>> On Tue, Dec 8, 2020 at 4:57 PM Guozhang Wang 
> >>>> wrote:
> >>>>
> >>>>> Hello Walker,
> >>>>>
> >>>>> Thanks for the KIP! Overall it looks reasonable to me. Just a few
> >>>>> minor
> >>>>> comments for the wiki page itself:
> >>>>>
> >>>>> 1) Could you clarify the conditions when RUNNING / REBALANCING ->
> >>>>> PENDING_ERROR will happen; and when PENDING_ERROR -> ERROR will
> >>>>> happen.
> >>>>> E.g. when I read "Streams will only reach ERROR state in the event of
> >>> an
> >>>>> exceptional failure in which the `StreamsUncaughtExceptionHandler`
> >>> chose to
> >>>>> either shutdown the application or the client." I thought the first
> >>>>> transition would happen before the handler, and the second transition
> >>> would
> >>>>> happen immediately after the handler returns "shutdown client" or
> >>> "shutdown
> >>>>> application", until I read the last statement regarding
> >>> "SHUTDOWN_CLIENT".
> >>>>>
> >>>>> 2) A compatibility issue: today it is possible that users would call
> >>>>> Streams APIs like shutdown in the global state transition listener.
> >>>>> And
> >>>>> it's common to try shutting down the application automatically when
> >>>>> transiting to ERROR (assuming it was not a terminating state). I
> think
> >>> we
> >>>>> could consider making this call a no-op and log a warning.
> >>>>>
> >>>>> 3) Could you link the following JIRAs in the "JIRA" field?
> >>>>>
> >>>>> https://issues.apache.org/jira/browse/KAFKA-10555
> >>>>> https://issues.apache.org/jira/browse/KAFKA-9638
> >>>>> https://issues.apache.org/jira/browse/KAFKA-6520
> >>>>>
> >>>>> And maybe we can also left a comment on those tickets explaining what
> >>> would
> >>>>> happen to tackle the issues after this KIP.
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Tue, Dec 8, 2020 at 12:16 PM Walker Carlson <
> wcarl...@confluent.io>
> >>>>> wrote:
> >>>>>
> >>>>>> Hello all,
> >>>>>>
> >>>>>> I'd like to propose KIP-696 to clarify the meaning of ERROR state in
> >>> the
> >>>>>> KafkaStreams Client State Machine. This will update the States to be
> >>>>>> consistent with changes in KIP-671 and KIP-663.
> >>>>>>
> >>>>>> Here are the details: https://cwiki.apache.org/confluence/x/lCvZCQ
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Walker
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>>>
> >>>
> >>>
> >>>
> >>
>

Re: [VOTE] KIP-696: Update Streams FSM to clarify ERROR state meaning

2020-12-14 Thread Walker Carlson

Thank you everyone, KIP-696 has passed with 3 binding votes (Guozhang, John
and Sophie) and 2 non-binding votes (Leah and Bruno)

walker

On Thu, Dec 10, 2020 at 11:00 AM Sophie Blee-Goldman 
wrote:

> KIP looks good to me, thanks Walker!
>
> +1 (binding)
>
> -Sophie
>
> On Thu, Dec 10, 2020 at 1:53 AM Bruno Cadonna  wrote:
>
> > Thanks, Walker!
> >
> > +1 (non-binding)
> >
> > Best,
> > Bruno
> >
> > On 09.12.20 20:07, Leah Thomas wrote:
> > > Looks good, thanks Walker! +1 (non-binding)
> > >
> > > Leah
> > >
> > > On Wed, Dec 9, 2020 at 1:04 PM John Roesler 
> wrote:
> > >
> > >> Thanks, Walker!
> > >>
> > >> I'm also +1 (binding)
> > >>
> > >> -John
> > >>
> > >> On Wed, 2020-12-09 at 11:03 -0800, Guozhang Wang wrote:
> > >>> +1. Thanks Walker.
> > >>>
> > >>> On Wed, Dec 9, 2020 at 10:58 AM Walker Carlson <
> wcarl...@confluent.io>
> > >>> wrote:
> > >>>
> > >>>> Sorry I forgot to change the subject line to vote.
> > >>>>
> > >>>> Thanks for the comments. If there are no further concerns I would
> like
> > >> to
> > >>>> call for a vote on KIP-696 to clarify and clean up the Streams State
> > >>>> Machine.
> > >>>>
> > >>>> On Wed, Dec 9, 2020 at 10:04 AM Walker Carlson <
> wcarl...@confluent.io
> > >
> > >>>> wrote:
> > >>>>
> > >>>>> Thanks for the comments. If there are no further concerns I would
> > >> like to
> > >>>>> call for a vote on KIP-696 to clarify and clean up the Streams
> State
> > >>>>> Machine.
> > >>>>>
> > >>>>> walker
> > >>>>>
> > >>>>> On Wed, Dec 9, 2020 at 8:50 AM John Roesler 
> > >> wrote:
> > >>>>>
> > >>>>>> Thanks, Walker!
> > >>>>>>
> > >>>>>> Your proposal looks good to me.
> > >>>>>>
> > >>>>>> -John
> > >>>>>>
> > >>>>>> On Tue, 2020-12-08 at 18:29 -0800, Walker Carlson wrote:
> > >>>>>>> Thanks for the feedback Guozhang!
> > >>>>>>>
> > >>>>>>> I clarified some of the points in the Proposed Changes section so
> > >>>>>> hopefully
> > >>>>>>> it will be more clear what is going on now. I also agree with
> > >> your
> > >>>>>>> suggestion about the possible call to close() on ERROR so I
> > >> added this
> > >>>>>>> line.
> > >>>>>>> "Close() called on ERROR will be idempotent and not throw an
> > >>>> exception,
> > >>>>>> but
> > >>>>>>> we will log a warning."
> > >>>>>>>
> > >>>>>>> I have linked those tickets and I will leave a comment trying to
> > >>>> explain
> > >>>>>>> how these changes will affect their issue.
> > >>>>>>>
> > >>>>>>> walker
> > >>>>>>>
> > >>>>>>> On Tue, Dec 8, 2020 at 4:57 PM Guozhang Wang  > >>>
> > >>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hello Walker,
> > >>>>>>>>
> > >>>>>>>> Thanks for the KIP! Overall it looks reasonable to me. Just a
> > >> few
> > >>>>>> minor
> > >>>>>>>> comments for the wiki page itself:
> > >>>>>>>>
> > >>>>>>>> 1) Could you clarify the conditions when RUNNING / REBALANCING
> > >> ->
> > >>>>>>>> PENDING_ERROR will happen; and when PENDING_ERROR -> ERROR will
> > >>>>>> happen.
> > >>>>>>>> E.g. when I read "Streams will only reach ERROR state in the
> > >> event
> > >>>> of
> > >>>>>> an
> > >>>>>>>> exceptional failure in which the
> > >> `StreamsUncaughtExceptionHandler`
> >

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-11 Thread Walker Carlson

Thanks for the KIP!

+1 (non-binding)

walker

On Wed, Dec 9, 2020 at 11:40 AM Bruno Cadonna  wrote:

> Thanks for the KIP, John!
>
> +1 (non-binding)
>
> Best,
> Bruno
>
> On 08.12.20 18:03, John Roesler wrote:
> > Hello all,
> >
> > There hasn't been much discussion on KIP-695 so far, so I'd
> > like to go ahead and call for a vote.
> >
> > As a reminder, the purpose of KIP-695 to improve on the
> > "task idling" feature we introduced in KIP-353. This KIP
> > will allow Streams to offer deterministic time semantics in
> > join-type topologies. For example, it makes sure that
> > when you join two topics, that we collate the topics by
> > timestamp. That was always the intent with task idling (KIP-
> > 353), but it turns out the previous mechanism couldn't
> > provide the desired semantics.
> >
> > The details are here:
> > https://cwiki.apache.org/confluence/x/JSXZCQ
> >
> > Thanks,
> > -John
> >
>

Re: [VOTE] KIP-696: Update Streams FSM to clarify ERROR state meaning

2020-12-09 Thread Walker Carlson

Sorry I forgot to change the subject line to vote.

Thanks for the comments. If there are no further concerns I would like to
call for a vote on KIP-696 to clarify and clean up the Streams State
Machine.

On Wed, Dec 9, 2020 at 10:04 AM Walker Carlson 
wrote:

> Thanks for the comments. If there are no further concerns I would like to
> call for a vote on KIP-696 to clarify and clean up the Streams State
> Machine.
>
> walker
>
> On Wed, Dec 9, 2020 at 8:50 AM John Roesler  wrote:
>
>> Thanks, Walker!
>>
>> Your proposal looks good to me.
>>
>> -John
>>
>> On Tue, 2020-12-08 at 18:29 -0800, Walker Carlson wrote:
>> > Thanks for the feedback Guozhang!
>> >
>> > I clarified some of the points in the Proposed Changes section so
>> hopefully
>> > it will be more clear what is going on now. I also agree with your
>> > suggestion about the possible call to close() on ERROR so I added this
>> > line.
>> > "Close() called on ERROR will be idempotent and not throw an exception,
>> but
>> > we will log a warning."
>> >
>> > I have linked those tickets and I will leave a comment trying to explain
>> > how these changes will affect their issue.
>> >
>> > walker
>> >
>> > On Tue, Dec 8, 2020 at 4:57 PM Guozhang Wang 
>> wrote:
>> >
>> > > Hello Walker,
>> > >
>> > > Thanks for the KIP! Overall it looks reasonable to me. Just a few
>> minor
>> > > comments for the wiki page itself:
>> > >
>> > > 1) Could you clarify the conditions when RUNNING / REBALANCING ->
>> > > PENDING_ERROR will happen; and when PENDING_ERROR -> ERROR will
>> happen.
>> > > E.g. when I read "Streams will only reach ERROR state in the event of
>> an
>> > > exceptional failure in which the `StreamsUncaughtExceptionHandler`
>> chose to
>> > > either shutdown the application or the client." I thought the first
>> > > transition would happen before the handler, and the second transition
>> would
>> > > happen immediately after the handler returns "shutdown client" or
>> "shutdown
>> > > application", until I read the last statement regarding
>> "SHUTDOWN_CLIENT".
>> > >
>> > > 2) A compatibility issue: today it is possible that users would call
>> > > Streams APIs like shutdown in the global state transition listener.
>> And
>> > > it's common to try shutting down the application automatically when
>> > > transiting to ERROR (assuming it was not a terminating state). I
>> think we
>> > > could consider making this call a no-op and log a warning.
>> > >
>> > > 3) Could you link the following JIRAs in the "JIRA" field?
>> > >
>> > > https://issues.apache.org/jira/browse/KAFKA-10555
>> > > https://issues.apache.org/jira/browse/KAFKA-9638
>> > > https://issues.apache.org/jira/browse/KAFKA-6520
>> > >
>> > > And maybe we can also left a comment on those tickets explaining what
>> would
>> > > happen to tackle the issues after this KIP.
>> > >
>> > >
>> > > Guozhang
>> > >
>> > >
>> > > On Tue, Dec 8, 2020 at 12:16 PM Walker Carlson > >
>> > > wrote:
>> > >
>> > > > Hello all,
>> > > >
>> > > > I'd like to propose KIP-696 to clarify the meaning of ERROR state
>> in the
>> > > > KafkaStreams Client State Machine. This will update the States to be
>> > > > consistent with changes in KIP-671 and KIP-663.
>> > > >
>> > > > Here are the details: https://cwiki.apache.org/confluence/x/lCvZCQ
>> > > >
>> > > > Thanks,
>> > > > Walker
>> > > >
>> > >
>> > >
>> > > --
>> > > -- Guozhang
>> > >
>>
>>
>>

Re: [DISCUSS] KIP-696: Update Streams FSM to clarify ERROR state meaning

2020-12-09 Thread Walker Carlson

Thanks for the comments. If there are no further concerns I would like to
call for a vote on KIP-696 to clarify and clean up the Streams State
Machine.

walker

On Wed, Dec 9, 2020 at 8:50 AM John Roesler  wrote:

> Thanks, Walker!
>
> Your proposal looks good to me.
>
> -John
>
> On Tue, 2020-12-08 at 18:29 -0800, Walker Carlson wrote:
> > Thanks for the feedback Guozhang!
> >
> > I clarified some of the points in the Proposed Changes section so
> hopefully
> > it will be more clear what is going on now. I also agree with your
> > suggestion about the possible call to close() on ERROR so I added this
> > line.
> > "Close() called on ERROR will be idempotent and not throw an exception,
> but
> > we will log a warning."
> >
> > I have linked those tickets and I will leave a comment trying to explain
> > how these changes will affect their issue.
> >
> > walker
> >
> > On Tue, Dec 8, 2020 at 4:57 PM Guozhang Wang  wrote:
> >
> > > Hello Walker,
> > >
> > > Thanks for the KIP! Overall it looks reasonable to me. Just a few minor
> > > comments for the wiki page itself:
> > >
> > > 1) Could you clarify the conditions when RUNNING / REBALANCING ->
> > > PENDING_ERROR will happen; and when PENDING_ERROR -> ERROR will happen.
> > > E.g. when I read "Streams will only reach ERROR state in the event of
> an
> > > exceptional failure in which the `StreamsUncaughtExceptionHandler`
> chose to
> > > either shutdown the application or the client." I thought the first
> > > transition would happen before the handler, and the second transition
> would
> > > happen immediately after the handler returns "shutdown client" or
> "shutdown
> > > application", until I read the last statement regarding
> "SHUTDOWN_CLIENT".
> > >
> > > 2) A compatibility issue: today it is possible that users would call
> > > Streams APIs like shutdown in the global state transition listener. And
> > > it's common to try shutting down the application automatically when
> > > transiting to ERROR (assuming it was not a terminating state). I think
> we
> > > could consider making this call a no-op and log a warning.
> > >
> > > 3) Could you link the following JIRAs in the "JIRA" field?
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-10555
> > > https://issues.apache.org/jira/browse/KAFKA-9638
> > > https://issues.apache.org/jira/browse/KAFKA-6520
> > >
> > > And maybe we can also left a comment on those tickets explaining what
> would
> > > happen to tackle the issues after this KIP.
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Tue, Dec 8, 2020 at 12:16 PM Walker Carlson 
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > I'd like to propose KIP-696 to clarify the meaning of ERROR state in
> the
> > > > KafkaStreams Client State Machine. This will update the States to be
> > > > consistent with changes in KIP-671 and KIP-663.
> > > >
> > > > Here are the details: https://cwiki.apache.org/confluence/x/lCvZCQ
> > > >
> > > > Thanks,
> > > > Walker
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
>
>
>

Re: [DISCUSS] KIP-696: Update Streams FSM to clarify ERROR state meaning

2020-12-08 Thread Walker Carlson

Thanks for the feedback Guozhang!

I clarified some of the points in the Proposed Changes section so hopefully
it will be more clear what is going on now. I also agree with your
suggestion about the possible call to close() on ERROR so I added this
line.
"Close() called on ERROR will be idempotent and not throw an exception, but
we will log a warning."

I have linked those tickets and I will leave a comment trying to explain
how these changes will affect their issue.

walker

On Tue, Dec 8, 2020 at 4:57 PM Guozhang Wang  wrote:

> Hello Walker,
>
> Thanks for the KIP! Overall it looks reasonable to me. Just a few minor
> comments for the wiki page itself:
>
> 1) Could you clarify the conditions when RUNNING / REBALANCING ->
> PENDING_ERROR will happen; and when PENDING_ERROR -> ERROR will happen.
> E.g. when I read "Streams will only reach ERROR state in the event of an
> exceptional failure in which the `StreamsUncaughtExceptionHandler` chose to
> either shutdown the application or the client." I thought the first
> transition would happen before the handler, and the second transition would
> happen immediately after the handler returns "shutdown client" or "shutdown
> application", until I read the last statement regarding "SHUTDOWN_CLIENT".
>
> 2) A compatibility issue: today it is possible that users would call
> Streams APIs like shutdown in the global state transition listener. And
> it's common to try shutting down the application automatically when
> transiting to ERROR (assuming it was not a terminating state). I think we
> could consider making this call a no-op and log a warning.
>
> 3) Could you link the following JIRAs in the "JIRA" field?
>
> https://issues.apache.org/jira/browse/KAFKA-10555
> https://issues.apache.org/jira/browse/KAFKA-9638
> https://issues.apache.org/jira/browse/KAFKA-6520
>
> And maybe we can also left a comment on those tickets explaining what would
> happen to tackle the issues after this KIP.
>
>
> Guozhang
>
>
> On Tue, Dec 8, 2020 at 12:16 PM Walker Carlson 
> wrote:
>
> > Hello all,
> >
> > I'd like to propose KIP-696 to clarify the meaning of ERROR state in the
> > KafkaStreams Client State Machine. This will update the States to be
> > consistent with changes in KIP-671 and KIP-663.
> >
> > Here are the details: https://cwiki.apache.org/confluence/x/lCvZCQ
> >
> > Thanks,
> > Walker
> >
>
>
> --
> -- Guozhang
>

[DISCUSS] KIP-696: Update Streams FSM to clarify ERROR state meaning

2020-12-08 Thread Walker Carlson

Hello all,

I'd like to propose KIP-696 to clarify the meaning of ERROR state in the
KafkaStreams Client State Machine. This will update the States to be
consistent with changes in KIP-671 and KIP-663.

Here are the details: https://cwiki.apache.org/confluence/x/lCvZCQ

Thanks,
Walker

[jira] [Created] (KAFKA-10810) Add a replace thread option to the streams uncaught exception handler

2020-12-04 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-10810:
--

 Summary: Add a replace thread option to the streams uncaught 
exception handler  
 Key: KAFKA-10810
 URL: https://issues.apache.org/jira/browse/KAFKA-10810
 Project: Kafka
  Issue Type: Improvement
Reporter: Walker Carlson
Assignee: Walker Carlson


Add an option to replace threads that have died.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [VOTE] KIP-689: Extend `StreamJoined` to allow more store configs

2020-12-02 Thread Walker Carlson

+1 (non-binding)

Thank you,
walker

On Wed, Dec 2, 2020 at 8:15 AM Bruno Cadonna  wrote:

> +1 (non-binding)
>
> Thanks Leah!
>
> Best,
> Bruno
>
> On 02.12.20 16:55, Leah Thomas wrote:
> > Hi all,
> >
> > I'd like to start the vote for KIP-689 for enabling/disabling logging for
> > `StreamJoined`.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-689%3A+Extend+%60StreamJoined%60+to+allow+more+store+configs
> >
> > Thanks,
> > Leah
> >
>

Re: [DISCUSS] KIP-689: Extend `StreamJoined` to allow more store configs

2020-12-01 Thread Walker Carlson

Thanks for making these changes. It makes more sense now to me. Overall LGTM

walker

On Tue, Dec 1, 2020 at 3:39 PM Sophie Blee-Goldman 
wrote:

> Thanks for the KIP! I'm happy with the state of things after your latest
> update,
> LGTM
>
> Sophie
>
> On Tue, Dec 1, 2020 at 2:26 PM Leah Thomas  wrote:
>
> > Hi Matthias,
> >
> > Yeah I think it should, good catch. That should also answer Walker's
> > question about why we have an option for `withLoggingEnabled()` even
> though
> > that's the default. Passing in a new map of configs could allow the user
> to
> > configure the log differently than the default. I've updated the KIP to
> > reflected the added parameter and an added variable, `topicConfig` to
> store
> > the map of configs.
> >
> > Best,
> > Leah
> >
> > On Mon, Nov 30, 2020 at 5:35 PM Matthias J. Sax 
> wrote:
> >
> > > Thanks for the KIP Leah.
> > >
> > > Should `withLoggingEnabled()` take a `Map config`
> > > similar to the one from `Materialized`?
> > >
> > >
> > > -Matthias
> > >
> > > On 11/30/20 12:22 PM, Walker Carlson wrote:
> > > > Ah. That makes sense. Thank you for fixing that.
> > > >
> > > > One minor question. If the default is enabled is there any case
> where a
> > > > user would turn logging off then back on again? I can see having the
> > > > enableLoging for completeness so it's not that important probably.
> > > >
> > > > Anyways other than that it looks good!
> > > >
> > > > Walker
> > > >
> > > > On Mon, Nov 30, 2020 at 12:06 PM Leah Thomas 
> > > wrote:
> > > >
> > > >> Hey Walker,
> > > >>
> > > >> Thanks for your response.
> > > >>
> > > >> 1. Ah yeah thanks for the catch, that was held over from copy/paste.
> > > Both
> > > >> functions should take no parameters, as they just `loggingEnabled`
> to
> > > true
> > > >> or false. I've removed the `WindowBytesStoreSupplier
> > otherStoreSupplier`
> > > >> from the methods in the KIP
> > > >> 2. I think the fix to 1 answers this question, otherwise, I'm not
> > quite
> > > >> sure what you're asking. With the updated method calls, there
> > shouldn't
> > > be
> > > >> any duplication.
> > > >>
> > > >> Cheers,
> > > >> Leah
> > > >>
> > > >> On Mon, Nov 30, 2020 at 12:14 PM Walker Carlson <
> > wcarl...@confluent.io>
> > > >> wrote:
> > > >>
> > > >>> Hello Leah,
> > > >>>
> > > >>> Thank you for the KIP.
> > > >>>
> > > >>> I had a couple questions that maybe you can expand on from what is
> on
> > > the
> > > >>> KIP.
> > > >>>
> > > >>> 1) Why are we enabling/disabling the logging by passing in a
> > > >>> `WindowBytesStoreSupplier`?
> > > >>> It seems to me that these two things should be separate.
> > > >>>
> > > >>> 2) There is already `withThisStoreSupplier(final
> > > WindowBytesStoreSupplier
> > > >>> otherStoreSupplier)` and `withOtherStoreSupplier(final
> > > >>> WindowBytesStoreSupplier otherStoreSupplier)`. Why do we need to
> > > >> duplicate
> > > >>> them when the `retentionPeriod` can be set through them?
> > > >>>
> > > >>> Thanks,
> > > >>> Walker
> > > >>>
> > > >>> On Mon, Nov 30, 2020 at 8:53 AM Leah Thomas 
> > > >> wrote:
> > > >>>
> > > >>>> After reading through
> > > https://issues.apache.org/jira/browse/KAFKA-9921
> > > >> I
> > > >>>> removed the option to enable/disable caching for `StreamJoined`,
> as
> > > >>> caching
> > > >>>> will always be disabled because we retain duplicates.
> > > >>>>
> > > >>>> I updated the KIP accordingly, it now adds only `enableLogging`
> as a
> > > >>>> config.
> > > >>>>
> > > >>>> On Mon, Nov 30, 2020 at 9:54 AM Leah Thomas  >
> > > >>> wrote:
> > > >>>>
> > > >>>>> Hi all,
> > > >>>>>
> > > >>>>> I'd like to kick-off the discussion for KIP-689: Extend
> > > >> `StreamJoined`
> > > >>> to
> > > >>>>> allow more store configs. This builds off the work of KIP-479
> > > >>>>> <
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+StreamJoined+config+object+to+Join
> > > >>>>
> > > >>>> to
> > > >>>>> add options to enable/disable both logging and caching for stream
> > > >> join
> > > >>>>> stores.
> > > >>>>>
> > > >>>>> KIP is here:
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-689%3A+Extend+%60StreamJoined%60+to+allow+more+store+configs
> > > >>>>>
> > > >>>>>
> > > >>>>> Looking forward to hearing your thoughts,
> > > >>>>> Leah
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-689: Extend `StreamJoined` to allow more store configs

2020-11-30 Thread Walker Carlson

Ah. That makes sense. Thank you for fixing that.

One minor question. If the default is enabled is there any case where a
user would turn logging off then back on again? I can see having the
enableLoging for completeness so it's not that important probably.

Anyways other than that it looks good!

Walker

On Mon, Nov 30, 2020 at 12:06 PM Leah Thomas  wrote:

> Hey Walker,
>
> Thanks for your response.
>
> 1. Ah yeah thanks for the catch, that was held over from copy/paste. Both
> functions should take no parameters, as they just `loggingEnabled` to true
> or false. I've removed the `WindowBytesStoreSupplier otherStoreSupplier`
> from the methods in the KIP
> 2. I think the fix to 1 answers this question, otherwise, I'm not quite
> sure what you're asking. With the updated method calls, there shouldn't be
> any duplication.
>
> Cheers,
> Leah
>
> On Mon, Nov 30, 2020 at 12:14 PM Walker Carlson 
> wrote:
>
> > Hello Leah,
> >
> > Thank you for the KIP.
> >
> > I had a couple questions that maybe you can expand on from what is on the
> > KIP.
> >
> > 1) Why are we enabling/disabling the logging by passing in a
> > `WindowBytesStoreSupplier`?
> > It seems to me that these two things should be separate.
> >
> > 2) There is already `withThisStoreSupplier(final WindowBytesStoreSupplier
> > otherStoreSupplier)` and `withOtherStoreSupplier(final
> > WindowBytesStoreSupplier otherStoreSupplier)`. Why do we need to
> duplicate
> > them when the `retentionPeriod` can be set through them?
> >
> > Thanks,
> > Walker
> >
> > On Mon, Nov 30, 2020 at 8:53 AM Leah Thomas 
> wrote:
> >
> > > After reading through https://issues.apache.org/jira/browse/KAFKA-9921
> I
> > > removed the option to enable/disable caching for `StreamJoined`, as
> > caching
> > > will always be disabled because we retain duplicates.
> > >
> > > I updated the KIP accordingly, it now adds only `enableLogging` as a
> > > config.
> > >
> > > On Mon, Nov 30, 2020 at 9:54 AM Leah Thomas 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to kick-off the discussion for KIP-689: Extend
> `StreamJoined`
> > to
> > > > allow more store configs. This builds off the work of KIP-479
> > > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+StreamJoined+config+object+to+Join
> > >
> > > to
> > > > add options to enable/disable both logging and caching for stream
> join
> > > > stores.
> > > >
> > > > KIP is here:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-689%3A+Extend+%60StreamJoined%60+to+allow+more+store+configs
> > > >
> > > >
> > > > Looking forward to hearing your thoughts,
> > > > Leah
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-689: Extend `StreamJoined` to allow more store configs

2020-11-30 Thread Walker Carlson

Hello Leah,

Thank you for the KIP.

I had a couple questions that maybe you can expand on from what is on the
KIP.

1) Why are we enabling/disabling the logging by passing in a
`WindowBytesStoreSupplier`?
It seems to me that these two things should be separate.

2) There is already `withThisStoreSupplier(final WindowBytesStoreSupplier
otherStoreSupplier)` and `withOtherStoreSupplier(final
WindowBytesStoreSupplier otherStoreSupplier)`. Why do we need to duplicate
them when the `retentionPeriod` can be set through them?

Thanks,
Walker

On Mon, Nov 30, 2020 at 8:53 AM Leah Thomas  wrote:

> After reading through https://issues.apache.org/jira/browse/KAFKA-9921 I
> removed the option to enable/disable caching for `StreamJoined`, as caching
> will always be disabled because we retain duplicates.
>
> I updated the KIP accordingly, it now adds only `enableLogging` as a
> config.
>
> On Mon, Nov 30, 2020 at 9:54 AM Leah Thomas  wrote:
>
> > Hi all,
> >
> > I'd like to kick-off the discussion for KIP-689: Extend `StreamJoined` to
> > allow more store configs. This builds off the work of KIP-479
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+StreamJoined+config+object+to+Join>
> to
> > add options to enable/disable both logging and caching for stream join
> > stores.
> >
> > KIP is here:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-689%3A+Extend+%60StreamJoined%60+to+allow+more+store+configs
> >
> >
> > Looking forward to hearing your thoughts,
> > Leah
> >
>

[jira] [Created] (KAFKA-10705) Avoid World Readable RocksDB

2020-11-10 Thread Walker Carlson (Jira)

Walker Carlson created KAFKA-10705:
--

 Summary: Avoid World Readable RocksDB
 Key: KAFKA-10705
 URL: https://issues.apache.org/jira/browse/KAFKA-10705
 Project: Kafka
  Issue Type: Bug
Reporter: Walker Carlson


The state directory could be protected more restrictive by preventing access to 
state directory for group and others. At least other should have no readable 
access



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [VOTE] KIP-671: Add method to Shutdown entire Streams Application

2020-10-19 Thread Walker Carlson

Hello all,

Taking into account the feedback about that last change I have removed some
of the changes and no longer will we have a separate handler for the global
thread. To make it so we can align the handlers there will also be no
option to just remove a stream thread.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-671%3A+Introduce+Kafka+Streams+Specific+Uncaught+Exception+Handler

If you have any concerns please let me know,
Walker

On Wed, Oct 14, 2020 at 12:51 PM John Roesler  wrote:

> Thanks, Sophie,
>
> That makes sense. Should we add a whole new interface and a
> separate kind of listener just because global threads don't
> support restarts _yet_, though?
>
> It seems like that will just widen the API surface area, and
> in a few more months, there will be no more difference
> between the two handlers. But people will forever afterward
> have to register two different handlers to do the same
> thing.
>
> Two alternatives we could consider:
> 1. Just don't add the "restart" option until it's possible
> for all threads. This KIP is already accepted with a
> "restart" option that we know won't be added until KIP-663
> is done. Maybe we just wait for KIP-406 as well. But the
> _rest_ of this KIP can be implemented in the mean time.
> 2. Just log an error and kill the thread anyway if the
> handler for the global thread opts to "retry".
>
> In general, it seems like the problem at hand is better
> solved by allowing/disallowing that one option, versus
> adding a whole new interface.
>
> Thanks,
> -John
>
> On Wed, 2020-10-14 at 11:48 -0700, Sophie Blee-Goldman
> wrote:
> > I don't think the proposal was to *never* add the "replace" functionality
> > for
> > the global thread, but we didn't want to tie up this KIP with anything
> more.
> > As I understand it, the goal of Walker's proposal was to set us up for
> > success if/when we want to add new functionality for the global thread,
> > without necessarily committing to it at this time.
> >
> > Restarting the global thread will take a bit more work since you need to
> > pause any further work that relies on global state until it's back up.
> > That's
> > starting to sound more in the purview of KIP-406 whose current goal
> > is to effectively restart the global thread on a specific type of
> exception
> > (OffsetOutOfRange). If we want to consider expanding that to allow users
> > to choose to restart the thread, then KIP-406 seems like the more
> > appropriate place to engage in that discussion.
> >
> > On Wed, Oct 14, 2020 at 7:13 AM John Roesler 
> wrote:
> >
> > > Hello Walker,
> > >
> > > Sorry for the late reply, but I didn’t follow the reasoning for the
> > > separate handler. You said that the global thread doesn’t have
> “replace”,
> > > but as of today, none of the threads have “replace”. Why not add that
> > > ability when we add it for the other threads?
> > >
> > > The nature of an uncaught exception handler is that there is an
> exception
> > > that will kill the thread. In that case, it seems like replacement is a
> > > desirable option.
> > >
> > > What have I missed?
> > >
> > > Thanks,
> > > John
> > >
> > > On Tue, Oct 13, 2020, at 15:49, Walker Carlson wrote:
> > > > Those are good points Sophie and Matthias. I sepificed the defaults
> in
> > > the
> > > > kip and standardized the names fo the handler to make them a bit more
> > > > readable.
> > > >
> > > > Thanks for the suggestions,
> > > > Walker
> > > >
> > > > On Tue, Oct 13, 2020 at 12:24 PM Sophie Blee-Goldman <
> > > sop...@confluent.io>
> > > > wrote:
> > > >
> > > > > Super nit: can we standardize the method & enum names?
> > > > >
> > > > > Right now we have these enums:
> > > > > StreamsUncaughtExceptionHandlerResponse
> > > > > StreamsUncaughtExceptionHandlerResponseGlobalThread
> > > > >
> > > > > and these callbacks:
> > > > > handleUncaughtException()
> > > > > handleExceptionInGlobalThread()
> > > > >
> > > > > The method names have different syntax, which is a bit clunky. I
> don't
> > > have
> > > > > any
> > > > > strong opinions on what grammar they should follow, just that it
> > > should be
> > > > > the
> > > > &g

Re: [ANNOUNCE] New committer: A. Sophie Blee-Goldman

2020-10-19 Thread Walker Carlson

Congratulations Sophie!

On Mon, Oct 19, 2020 at 9:43 AM Navinder Brar
 wrote:

> That's great news. Congrats Sophie! Well deserved.
>
> Regards,
> Navinder
> On Monday, 19 October, 2020, 10:12:16 pm IST, Bruno Cadonna <
> br...@confluent.io> wrote:
>
>  Congrats Sophie! Very well deserved!
>
> Bruno
>
> On 19.10.20 18:40, Matthias J. Sax wrote:
> > Hi all,
> >
> > I am excited to announce that A. Sophie Blee-Goldman has accepted her
> > invitation to become an Apache Kafka committer.
> >
> > Sophie is actively contributing to Kafka since Feb 2019 and has
> > accumulated 140 commits. She authored 4 KIPs in the lead
> >
> >  - KIP-453: Add close() method to RocksDBConfigSetter
> >  - KIP-445: In-memory Session Store
> >  - KIP-428: Add in-memory window store
> >  - KIP-613: Add end-to-end latency metrics to Streams
> >
> > and helped to implement two critical KIPs, 429 (incremental rebalancing)
> > and 441 (smooth auto-scaling; not just implementation but also design).
> >
> > In addition, she participates in basically every Kafka Streams related
> > KIP discussion, reviewed 142 PRs, and is active on the user mailing list.
> >
> > Thanks for all the contributions, Sophie!
> >
> >
> > Please join me to congratulate her!
> >  -Matthias
> >
>

Re: [VOTE] KIP-671: Add method to Shutdown entire Streams Application

2020-10-13 Thread Walker Carlson

Those are good points Sophie and Matthias. I sepificed the defaults in the
kip and standardized the names fo the handler to make them a bit more
readable.

Thanks for the suggestions,
Walker

On Tue, Oct 13, 2020 at 12:24 PM Sophie Blee-Goldman 
wrote:

> Super nit: can we standardize the method & enum names?
>
> Right now we have these enums:
> StreamsUncaughtExceptionHandlerResponse
> StreamsUncaughtExceptionHandlerResponseGlobalThread
>
> and these callbacks:
> handleUncaughtException()
> handleExceptionInGlobalThread()
>
> The method names have different syntax, which is a bit clunky. I don't have
> any
> strong opinions on what grammar they should follow, just that it should be
> the
> same for each. I also think that we should specify "StreamThread" somewhere
> in the name of the StreadThread-specific callback, now that we have a
> second
> callback that specifies it's for the GlobalThread. Something like
> "*handleStreamThreadException()*" and "*handleGlobalThreadException*"
>
> The enums are ok, although I think we should include "StreamThread"
> somewhere
> like with the callbacks. And we can probably shorten them a bit. For
> example
> "*StreamThreadExceptionResponse*" and "*GlobalThreadExceptionResponse*"
>
>
>
> On Tue, Oct 13, 2020 at 11:48 AM Matthias J. Sax  wrote:
>
> > Thanks Walker.
> >
> > Overall, LGTM. However, I am wondering if we should have default
> > implementations for both handler methods? Before the latest change,
> > there was only one method and having a default was not necessary.
> > However, forcing people to implement both methods might not be the best
> > user experience: for example, if there is no global thread, one should
> > not need to implement the global handler method (and the other way
> around).
> >
> > Thus, it might be good to add default for both methods. If we add
> > defaults, we should explain the default behavior to the KIP.
> >
> > -Matthias
> >
> > On 10/12/20 2:32 PM, Walker Carlson wrote:
> > > Hello all,
> > >
> > > I just wanted to let you know that I had to make 2 minor updates to the
> > KIP.
> > >
> > > 1) I changed the behavior of the shutdown client to not leave the
> client
> > in
> > > Error but instead close directly because this aligns better with our
> > > state machine.
> > >
> > > 2) I added a separate call back for the global thread as it does not
> have
> > > all the options as a streamThread does. i.e. replace. The default will
> be
> > > to close the client. that will also be the only option as that is the
> > > current behavior for the global thread.
> > >
> > > you can find the diff here:
> > >
> >
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158876566&originalVersion=21&revisedVersion=23
> > >
> > > If you have any problems with these changes let me know and we can
> > discuss
> > > them further
> > >
> > > Thank you,
> > > Walker
> > >
> > > On Wed, Sep 30, 2020 at 7:33 AM Walker Carlson 
> > > wrote:
> > >
> > >>
> > >> Bruno Cadonna 
> > >> 4:51 AM (2 hours ago)
> > >> to dev
> > >> Thank you all for voting!
> > >>
> > >> This KIP is accepted with +3 binding (Guozhang, Bill, Matthias) and +2
> > >> non-binding (Bruno, Leah).
> > >>
> > >> Matthias, we will take care of  the global threads, and for the
> > >> replacement that will depend on Kip-663.
> > >>
> > >> Best,
> > >>
> > >> On Wed, Sep 30, 2020 at 4:59 AM Bruno Cadonna 
> > wrote:
> > >>
> > >>> Thanks a lot Walker!
> > >>>
> > >>> +1 (non-binding)
> > >>>
> > >>> Best,
> > >>> Bruno
> > >>>
> > >>> On 30.09.20 03:10, Matthias J. Sax wrote:
> > >>>> Thanks Walker. The proposed API changes LGTM.
> > >>>>
> > >>>> +1 (binding)
> > >>>>
> > >>>> One minor nit: you should also mention the global-thread that also
> > needs
> > >>>> to be shutdown if requested by the user.
> > >>>>
> > >>>> Minor side question: should we actually terminate a thread and
> create
> > a
> > >>>> new one, or instead revive the existing thr

Re: [VOTE] KIP-671: Add method to Shutdown entire Streams Application

2020-10-12 Thread Walker Carlson

Hello all,

I just wanted to let you know that I had to make 2 minor updates to the KIP.

1) I changed the behavior of the shutdown client to not leave the client in
Error but instead close directly because this aligns better with our
state machine.

2) I added a separate call back for the global thread as it does not have
all the options as a streamThread does. i.e. replace. The default will be
to close the client. that will also be the only option as that is the
current behavior for the global thread.

you can find the diff here:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158876566&originalVersion=21&revisedVersion=23

If you have any problems with these changes let me know and we can discuss
them further

Thank you,
Walker

On Wed, Sep 30, 2020 at 7:33 AM Walker Carlson 
wrote:

>
> Bruno Cadonna 
> 4:51 AM (2 hours ago)
> to dev
> Thank you all for voting!
>
> This KIP is accepted with +3 binding (Guozhang, Bill, Matthias) and +2
> non-binding (Bruno, Leah).
>
> Matthias, we will take care of  the global threads, and for the
> replacement that will depend on Kip-663.
>
> Best,
>
> On Wed, Sep 30, 2020 at 4:59 AM Bruno Cadonna  wrote:
>
>> Thanks a lot Walker!
>>
>> +1 (non-binding)
>>
>> Best,
>> Bruno
>>
>> On 30.09.20 03:10, Matthias J. Sax wrote:
>> > Thanks Walker. The proposed API changes LGTM.
>> >
>> > +1 (binding)
>> >
>> > One minor nit: you should also mention the global-thread that also needs
>> > to be shutdown if requested by the user.
>> >
>> > Minor side question: should we actually terminate a thread and create a
>> > new one, or instead revive the existing thread (reusing its existing
>> ID)?
>> >
>> >
>> > -Matthias
>> >
>> > On 9/29/20 2:39 PM, Bill Bejeck wrote:
>> >> Thanks for the KIP Walker.
>> >>
>> >> +1 (binding)
>> >>
>> >> -Bill
>> >>
>> >> On Tue, Sep 29, 2020 at 4:59 PM Guozhang Wang 
>> wrote:
>> >>
>> >>> +1 again on the KIP.
>> >>>
>> >>> On Tue, Sep 29, 2020 at 1:51 PM Leah Thomas 
>> wrote:
>> >>>
>> >>>> Hey Walker,
>> >>>>
>> >>>> Thanks for the KIP! I'm +1, non-binding.
>> >>>>
>> >>>> Cheers,
>> >>>> Leah
>> >>>>
>> >>>> On Tue, Sep 29, 2020 at 1:56 PM Walker Carlson <
>> wcarl...@confluent.io>
>> >>>> wrote:
>> >>>>
>> >>>>> Hello all,
>> >>>>>
>> >>>>> I made some changes to the KIP the descriptions are on the
>> discussion
>> >>>>> thread. If you have already voted I would ask you to confirm your
>> vote.
>> >>>>>
>> >>>>> Otherwise please vote so we can get this feature in.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Walker
>> >>>>>
>> >>>>> On Thu, Sep 24, 2020 at 4:36 PM John Roesler 
>> >>>> wrote:
>> >>>>>
>> >>>>>> Thanks for the KIP, Walker!
>> >>>>>>
>> >>>>>> I’m +1 (binding)
>> >>>>>>
>> >>>>>> -John
>> >>>>>>
>> >>>>>> On Mon, Sep 21, 2020, at 17:04, Guozhang Wang wrote:
>> >>>>>>> Thanks for finalizing the KIP. +1 (binding)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Guozhang
>> >>>>>>>
>> >>>>>>> On Mon, Sep 21, 2020 at 1:38 PM Walker Carlson <
>> >>>> wcarl...@confluent.io>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> Hello all,
>> >>>>>>>>
>> >>>>>>>> I would like to start a thread to vote for KIP-671 to add a
>> >>> method
>> >>>> to
>> >>>>>> close
>> >>>>>>>> all clients in a kafka streams application.
>> >>>>>>>>
>> >>>>>>>> KIP:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-671%3A+Shutdown+Streams+Application+when+appropriate+exception+is+thrown
>> >>>>>>>>
>> >>>>>>>> Discussion thread: *here
>> >>>>>>>> <
>> >>>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> https://mail-archives.apache.org/mod_mbox/kafka-dev/202009.mbox/%3CCAC55fuh3HAGCxz-PzxTJraczy6T-os2oiCV328PBeuJQSVYASg%40mail.gmail.com%3E
>> >>>>>>>>> *
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> -Walker
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> -- Guozhang
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> -- Guozhang
>> >>>
>> >>
>>
>

1 2 >

1 - 100 of 138 matches

Mail list logo