Hello Levani, Thanks for the reply.
1. Interesting; why did you change your mind? I have a gut feeling that we can achieve pretty much any rack awareness need that people have by using purely config, which is obviously much easier to use. But if you had a case in mind where this wouldn’t work, it would be good to know. In fact, if that is true, then perhaps you could just defer the whole idea of a pluggable interface (point 2) to a separate KIP. I do think a pluggable assignor would be extremely valuable, but it might be nice to cut the scope of KIP-708 if just a config will suffice. What do you think? Thanks, John On Mon, Feb 1, 2021, at 06:07, Levani Kokhreidze wrote: > Hi John, > > Thanks a lot for thorough feedback, it’s really valuable. > > 1. Agree with this. Had the same idea initially. > We can set some upper limit in terms of what’s > the max number of tags users can set to make > sure it’s not overused. By default, we can create > standby tasks where tags are different from active task (full match). > This should mimic default rack awareness behaviour. > > 2. I like the idea and I’d be happy to work on > refactoring TaskAssignor to accommodate rack awareness use-case. > When I was going through the code, it felt way more natural > to use pluggable TaskAssignor for achieving rack awareness > instead of introducing new interface and contract. > But I thought approach mentioned in the KIP is simpler so > decided to move forward with it as an initial proposal :). > But I agree with you, it will be much better if we can have > TaskAssignor as pluggable interface users can use. > One potential challenge I see with this is that, if we just let > users implement TaskAssignor in its current form, we will be forcing > users to implement functionality for active task assignment, as well as > standby task assignment. This feels like not very clear contract, > because with > just TaskAssignor interface it’s not really clear they one needs to > allocate > standby tasks as well. We can enforce it on some level with the return > object > You’ve mentioned TaskAssignor#assign has to return, but still feels > error prone. > In addition, I suspect in most of the cases users would want > to control standby task assignment and leave active task assignment as > is. > To make implementation of standby task assignment easier for users, > what if > we decouple active and standby task assignment from the `TaskAssignor`? > Idea I have in mind is to split TaskAssignor into ActiveTaskAssignor > and StandbyTaskAssignor > and let users add their own implementation for them separately if they > like via config. > > If this approach sounds reasonable, I’ll work on updating KIP this week. > > Thanks, > Levani > > > On 28. Jan 2021, at 19:20, John Roesler <vvcep...@apache.org> wrote: > > > > Thanks, Levani! > > > > I was reflecting more on your KIP last night. > > > > One thing I should mention is that I have previously used > > the rack awareness feature of Elasticsearch, and found it to > > be pretty intuitive and also capable of what we needed in > > our AWS clusters. As you look at related work, you might > > take ES into consideration. > > > > I was also had some thoughts about your proposal. > > > > 1. I'm wondering if we instead allow people to add arbitrary > > tags to each host, and then have a configuration to specify > > a combination of tags to use for rack awareness. This seems > > easier to manage than for the use case you anticipate where > > people would concatenate rackId = (clusterId + AZ), and then > > have to parse the rackId back out to compute the assignment. > > > > 2. About the proposed RackAwareStandbyTaskAssignor, I'm > > wondering if we can change the level of abstraction a little > > bit and capture even more value here. One thing we wanted to > > do in KIP-441, but decided to cut from the scope, was to > > define a public TaskAssignor interface so that people can > > plug in the whole task assignment algorithm. > > > > In fact, there is already an internal config and interface > > for this (`internal.task.assignor.class`: > > `org.apache.kafka.streams.processor.internals.assignment.Tas > > kAssignor`). > > > > We kept that interface and config internal because the > > current TaskAssignor interface has a number of flaws, but if > > we correct those flaws, we can offer a nice public interface > > that people can use to control the standby allocation, but > > also active task allocation, based on the tags I suggested > > in (1). > > > > I don't think we need too much work to refactor > > TaskAssignor, the main problems are that the assign method > > mutates its input and that it doesn't expose the full > > metadata from the cluster members. Therefore, if you like > > this idea, we should propose to refactor TaskAssignor with: > > * input: an immutable description of the cluster, including > > current lags of all stateful tasks and current stateless > > task assignments, as well as metadata for each host. > > * output: an object describing the new assignment as well as > > a flag on whether to schedule a followup probing rebalance. > > > > An even more stretchy stretch goal would be to include > > metadata of the brokers, which could be used to achieve > > higher levels of rack awareness. For example, we could co- > > locate tasks in the same "rack" (AZ) as the partition leader > > for their input or output topics, to minimize cross-AZ > > traffic. I'm not sure to what extent clients can learn the > > relevant broker metadata, so this stretch might not be > > currently feasible, but as long as we design the > > TaskAssignor for extensibility, we can do something like > > this in the future. > > > > Thanks again for this proposal, I hope the above is more > > inspiring than annoying :) > > > > I really think your KIP is super high value in whatever form > > you ultimately land on. > > > > > > Thanks, > > John > > > > On Thu, 2021-01-28 at 13:08 +0200, Levani Kokhreidze wrote: > >> Hi John > >> > >> Thanks for the feedback (and for the great work on KIP441 :) ). > >> Makes sense, will add a section in the KIP explaining rack awarenesses on > >> high level and how it’s implemented in the different distributed systems. > >> > >> Thanks, > >> Levani > >> > >>> On 27. Jan 2021, at 16:07, John Roesler <vvcep...@apache.org> wrote: > >>> > >>> Hi Levani, > >>> > >>> Thanks for this KIP! I think this is really high value; it was something > >>> I was disappointed I didn’t get to do as part of KIP-441. > >>> > >>> Rack awareness is a feature provided by other distributed systems as > >>> well. I wonder if your KIP could devote a section to summarizing what > >>> rack awareness looks like in other distributed systems, to help us put > >>> this design in context. > >>> > >>> Thanks! > >>> John > >>> > >>> > >>> On Tue, Jan 26, 2021, at 16:46, Levani Kokhreidze wrote: > >>>> Hello all, > >>>> > >>>> I’d like to start discussion on KIP-708 [1] that aims to introduce rack > >>>> aware standby task distribution in Kafka Streams. > >>>> In addition to changes mentioned in the KIP, I’d like to get some ideas > >>>> on additional change I have in mind. > >>>> Assuming KIP moves forward, I was wondering if it makes sense to > >>>> configure Kafka Streams consumer instances with the rack ID passed with > >>>> the new StreamsConfig#RACK_ID_CONFIG property. > >>>> In practice, that would mean that when “rack.id <http://rack.id/>” is > >>>> configured in Kafka Streams, it will automatically translate into > >>>> ConsumerConfig#CLIENT_RACK_ID config for all the KafkaConsumer clients > >>>> that is used by Kafka Streams internally. > >>>> > >>>> [1] > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-708%3A+Rack+aware+Kafka+Streams+with+pluggable+StandbyTask+assignor > >>>> > >>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-708:+Rack+aware+Kafka+Streams+with+pluggable+StandbyTask+assignor> > >>>> > >>>> P.S > >>>> I have draft PR ready, if it helps the discussion moving forward, I can > >>>> provide the draft PR link in this thread. > >>>> > >>>> Regards, > >>>> Levani > >> > > > > > >