Mickael, have you had some time to review this by any chance?

On Tue, Jun 20, 2023 at 5:23 PM Viktor Somogyi-Vass
<viktor.somo...@cloudera.com> wrote:
> Hey all,
>
> I'd like to revive this discussion. I've created
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-879%3A+Multi-level+Rack+Awareness
> last November, and it seems that there is a nice overlap between the
> two, so it would be good to merge them. Should we revive KIP-660 and
> merge the two KIPs?
> If you don't have time for this currently, Mickael, I'm happy to take it
> over from you and merge the two interfaces; they seem somewhat similar
> (and also close to the current internal interface).
>
> Best,
> Viktor
>
> On Tue, May 31, 2022 at 3:57 PM Mickael Maison <mickael.mai...@gmail.com>
> wrote:
>
>> Hi Vikas,
>>
>> You make some very good points, and most importantly I agree that being
>> able to prevent putting new partitions on a broker should be part of
>> Kafka itself and not require a plugin.
>>
>> This feature would address 2 out of the 3 scenarios mentioned in the
>> motivation section. The last one, "When adding brokers to a cluster,
>> Kafka currently does not necessarily place new partitions on new
>> brokers", is clearly less important.
>>
>> So I think I'll retire this KIP and follow up with a new KIP to
>> focus on that feature.
>>
>> Thanks,
>> Mickael
>>
>>
>> On Mon, May 9, 2022 at 8:11 PM Vikas Singh <vi...@confluent.io.invalid>
>> wrote:
>> >
>> > Hi Mickael,
>> >
>> > It's a nice proposal. It's appealing to have a pluggable way to
>> > override default Kafka placement decisions, and the motivation section
>> > lists some of them. Here are a few comments:
>> >
>> > * The motivation section has "When adding brokers to a cluster, Kafka
>> > currently does not necessarily place new partitions on new brokers". I
>> > am not sure how valuable doing this will be. A newly created Kafka
>> > topic takes time to reach the same usage level as existing topics, say
>> > because the topic was created by a new workload that is being
>> > onboarded, or the expansion was done to relieve disk pressure on
>> > existing nodes, etc.
>> > While new topics catch up to the existing workload, the new brokers
>> > are not sharing an equal load in the cluster, which probably defeats
>> > the purpose of adding new brokers. In addition to that, clustering new
>> > topics like this on new brokers has implications from a fault-domain
>> > perspective. A reasonable way to approach it is to indeed use
>> > CruiseControl to move things around so that the newly added nodes
>> > become immediately involved and share the cluster load.
>> > * Regarding "When administrators want to remove brokers from a
>> > cluster, there is no way to prevent Kafka from placing partitions on
>> > them", this is indeed an issue. I would argue that this is needed by
>> > everyone and should be part of Kafka, instead of being implemented as
>> > part of a plugin interface by multiple teams.
>> > * For "When some brokers are near their storage/throughput limit,
>> > Kafka could avoid putting new partitions on them", while this can help
>> > relieve short-term overload, I think again the correct solution here
>> > is something like CruiseControl, where the system is monitored and
>> > things are moved around to maintain a balanced cluster. A new topic
>> > will not take any disk space, so placing it anywhere normally isn't
>> > going to add to the storage overload. Similar to the previous case,
>> > maybe a mechanism in Kafka to put nodes in a quarantine state is a
>> > better way to approach this.
>> >
>> > In terms of the proposed API, I have a couple of comments:
>> >
>> > * It is not clear if the proposal applies to partitions of new topics
>> > or to the addition of partitions to an existing topic. Explicitly
>> > stating that will be helpful.
>> > * Regarding the part "To address the use cases identified in the
>> > motivation section, some knowledge about the current state of the
>> > cluster is necessary. Details whether a new broker has just been added
>> > or is being decommissioned are not part of the cluster metadata.
>> > Therefore such knowledge has to be provided via an external means to
>> > the ReplicaPlacer, for example via the configuration". It's not clear
>> > how this will be done. If I have to implement this interface, it will
>> > be helpful to have clear guidance/examples here which hopefully tie to
>> > the use cases in the motivation section. It would also allow us to
>> > figure out if the proposed interface is complete, and would help
>> > future implementers of the interface.
>> >
>> > A couple of minor comments:
>> > * The KIP is not listed in the main KIP page (
>> > https://cwiki-test.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>> > ). Can you please add it there?
>> > * The page has "This is especially true for the 4 scenarios listed in
>> > the Motivation section", but there are only 3 scenarios listed.
>> >
>> > Regards,
>> > Vikas
>> >
>> >
>> > On Tue, May 3, 2022 at 5:51 PM Colin McCabe <cmcc...@apache.org> wrote:
>> >
>> > > Hi Mickael,
>> > >
>> > > We did discuss this earlier, and I remember not being too
>> > > enthusiastic about a pluggable policy here :)
>> > >
>> > > There have been several changes to the placement code in the last
>> > > few weeks. (These are examples of the kind of changes that are
>> > > impossible to do once an API is established, by the way.) Can you
>> > > please revise the KIP to take these into account?
>> > >
>> > > I'd also like to understand a little bit better why we need this API
>> > > when we have the explicit placement API for createTopics and
>> > > createPartitions. Can you give me a few scenarios where the manual
>> > > placement API would be insufficient?
>> > >
>> > > best,
>> > > Colin
>> > >
>> > >
>> > > On Mon, May 2, 2022, at 09:28, Mickael Maison wrote:
>> > > > Hi,
>> > > >
>> > > > If there are no further comments, I'll start a vote in the next
>> > > > few days.
>> > > >
>> > > > Thanks,
>> > > > Mickael
>> > > >
>> > > > On Wed, Mar 30, 2022 at 3:51 AM Luke Chen <show...@gmail.com>
>> > > > wrote:
>> > > >>
>> > > >> Hi Mickael,
>> > > >>
>> > > >> Thanks for the update.
>> > > >> It answered my questions!
>> > > >>
>> > > >> Thank you.
>> > > >> Luke
>> > > >>
>> > > >> On Wed, Mar 30, 2022 at 12:09 AM Mickael Maison <
>> > > >> mickael.mai...@gmail.com> wrote:
>> > > >>
>> > > >> > Hi Luke,
>> > > >> >
>> > > >> > Thanks for the feedback.
>> > > >> >
>> > > >> > 1. Thanks, fixed!
>> > > >> > 2. Yes, that's right. It's the same behavior as for topic
>> > > >> > policies.
>> > > >> > 3. I've added details about how the mentioned scenarios could
>> > > >> > be addressed. The information required to make such decisions
>> > > >> > is not part of the Kafka cluster metadata, so an external input
>> > > >> > is necessary. This KIP does not propose a specific mechanism
>> > > >> > for doing it.
>> > > >> >
>> > > >> > I hope this answers your questions.
>> > > >> >
>> > > >> > Thanks,
>> > > >> > Mickael
>> > > >> >
>> > > >> >
>> > > >> > On Tue, Mar 29, 2022 at 5:42 PM Mickael Maison <
>> > > >> > mickael.mai...@gmail.com> wrote:
>> > > >> > >
>> > > >> > > Hi Ryanne,
>> > > >> > >
>> > > >> > > That's a good point!
>> > > >> > >
>> > > >> > > There's no value in having all implementations perform the
>> > > >> > > same sanity checks. If the replication factor is < 1 or
>> > > >> > > larger than the current number of registered brokers, the
>> > > >> > > controller should directly throw
>> > > >> > > InvalidReplicationFactorException and not call the
>> > > >> > > ReplicaPlacer. I've updated the KIP so the place() method
>> > > >> > > now only throws ReplicaPlacementException.
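[Editor's note: the division of responsibility described above, where the controller rejects an invalid replication factor up front so the pluggable placer never sees bad arguments, could be sketched roughly as follows. The class and exception names are illustrative assumptions, not Kafka's actual controller internals.]

```java
import java.util.List;

// Hypothetical sketch: the controller validates the replication factor
// itself and only calls the pluggable ReplicaPlacer with arguments it
// can trust, so implementations need no duplicate sanity checks.
public class PlacementValidationSketch {

    static class InvalidReplicationFactorException extends RuntimeException {
        InvalidReplicationFactorException(String msg) { super(msg); }
    }

    // Throws before the placer would ever be invoked with these arguments.
    static void validate(short replicationFactor, List<Integer> registeredBrokers) {
        if (replicationFactor < 1) {
            throw new InvalidReplicationFactorException(
                "Replication factor must be >= 1, got " + replicationFactor);
        }
        if (replicationFactor > registeredBrokers.size()) {
            throw new InvalidReplicationFactorException(
                "Replication factor " + replicationFactor
                    + " exceeds registered broker count " + registeredBrokers.size());
        }
        // Only after these checks would the controller call placer.place(...),
        // which in turn would only throw a placement-specific exception.
    }

    public static void main(String[] args) {
        validate((short) 3, List.of(1, 2, 3)); // valid: passes silently
        try {
            validate((short) 4, List.of(1, 2, 3));
        } catch (InvalidReplicationFactorException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```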
>> > > >> > >
>> > > >> > > Thanks,
>> > > >> > > Mickael
>> > > >> > >
>> > > >> > >
>> > > >> > >
>> > > >> > > On Mon, Mar 28, 2022 at 6:20 PM Ryanne Dolan <
>> > > >> > > ryannedo...@gmail.com> wrote:
>> > > >> > > >
>> > > >> > > > Wondering about InvalidReplicationFactorException. Why
>> > > >> > > > would an implementation throw this? Given the information
>> > > >> > > > passed to the method, it seems like this could only be
>> > > >> > > > thrown if there were obviously invalid arguments, like a
>> > > >> > > > negative number or zero. Can we just guarantee such
>> > > >> > > > invalid arguments aren't passed in?
>> > > >> > > >
>> > > >> > > > Ryanne
>> > > >> > > >
>> > > >> > > > On Sat, Mar 26, 2022, 8:51 AM Luke Chen <show...@gmail.com>
>> > > >> > > > wrote:
>> > > >> > > > >
>> > > >> > > > > Hi Mickael,
>> > > >> > > > >
>> > > >> > > > > Thanks for the KIP!
>> > > >> > > > > It's indeed a pain point for Kafka admins.
>> > > >> > > > >
>> > > >> > > > > I have some comments:
>> > > >> > > > > 1. Typo in the motivation section: "When administrators
>> > > >> > > > > [when to] remove brokers from a cluster, ..."
>> > > >> > > > > 2. If different `replica.placer.class.name` configs are
>> > > >> > > > > set in all controllers, I think only the config for the
>> > > >> > > > > "active controller" will take effect, right?
>> > > >> > > > > 3. Could you explain more about how the proposal fixes
>> > > >> > > > > some of the scenarios you listed, e.g. the newly added
>> > > >> > > > > broker case? How could we know the broker is newly
>> > > >> > > > > added? I guess it's by checking the broker load via some
>> > > >> > > > > metrics dynamically, right?
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > Thank you.
>> > > >> > > > > Luke
>> > > >> > > > >
>> > > >> > > > > On Fri, Mar 18, 2022 at 10:30 AM Ryanne Dolan <
>> > > >> > > > > ryannedo...@gmail.com> wrote:
>> > > >> > > > >
>> > > >> > > > > > Thanks Mickael, this makes sense to me! I've been
>> > > >> > > > > > wanting something like this in order to decommission a
>> > > >> > > > > > broker without new partitions getting accidentally
>> > > >> > > > > > assigned to it.
>> > > >> > > > > >
>> > > >> > > > > > Ryanne
>> > > >> > > > > >
>> > > >> > > > > > On Thu, Mar 17, 2022, 5:56 AM Mickael Maison <
>> > > >> > > > > > mickael.mai...@gmail.com> wrote:
>> > > >> > > > > >
>> > > >> > > > > > > Hi,
>> > > >> > > > > > >
>> > > >> > > > > > > I'd like to start a new discussion on KIP-660. I
>> > > >> > > > > > > originally wrote this KIP in 2020 and the initial
>> > > >> > > > > > > discussion (
>> > > >> > > > > > > https://lists.apache.org/thread/xn7xyb74nyt281brto4x28r9rzxm4lp9
>> > > >> > > > > > > ) raised some concerns, especially around KRaft
>> > > >> > > > > > > (which did not exist at that time) and scalability.
>> > > >> > > > > > >
>> > > >> > > > > > > Since then, we got a new KRaft controller, so I've
>> > > >> > > > > > > been able to revisit this KIP. I kept the KIP number
>> > > >> > > > > > > as it's essentially the same idea, but the proposal
>> > > >> > > > > > > is significantly different:
>> > > >> > > > > > >
>> > > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaPlacer
>> > > >> > > > > > >
>> > > >> > > > > > > Please take a look and let me know if you have any
>> > > >> > > > > > > feedback.
>> > > >> > > > > > >
>> > > >> > > > > > > Thanks,
>> > > >> > > > > > > Mickael
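[Editor's note: the decommissioning use case that recurs throughout the thread, keeping new partitions off a broker that is being removed, with the "broker is excluded" knowledge supplied externally via configuration, could be sketched as below. The interface, config key, and all names here are assumptions for illustration only; the actual KIP-660 interface is defined in the KIP itself.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of a pluggable placer that skips brokers listed in
// its configuration, e.g. brokers an operator is decommissioning.
public class ExcludingPlacerSketch {

    interface ReplicaPlacer {
        void configure(Map<String, String> configs);
        // Returns, for each new partition, the list of broker IDs chosen.
        List<List<Integer>> place(List<Integer> liveBrokers,
                                  int numPartitions,
                                  short replicationFactor);
    }

    static class ExcludingRoundRobinPlacer implements ReplicaPlacer {
        private Set<Integer> excluded = Set.of();

        @Override
        public void configure(Map<String, String> configs) {
            // "replica.placer.excluded.brokers" is an assumed config key,
            // e.g. "3,4" names brokers being decommissioned.
            String raw = configs.getOrDefault("replica.placer.excluded.brokers", "");
            excluded = raw.isEmpty() ? Set.of()
                : Arrays.stream(raw.split(","))
                        .map(String::trim)
                        .map(Integer::valueOf)
                        .collect(Collectors.toSet());
        }

        @Override
        public List<List<Integer>> place(List<Integer> liveBrokers,
                                         int numPartitions,
                                         short replicationFactor) {
            List<Integer> candidates = liveBrokers.stream()
                    .filter(b -> !excluded.contains(b))
                    .collect(Collectors.toList());
            if (candidates.size() < replicationFactor) {
                throw new IllegalStateException("Not enough eligible brokers");
            }
            // Simple round-robin over the eligible brokers.
            List<List<Integer>> assignment = new ArrayList<>();
            for (int p = 0; p < numPartitions; p++) {
                List<Integer> replicas = new ArrayList<>();
                for (int r = 0; r < replicationFactor; r++) {
                    replicas.add(candidates.get((p + r) % candidates.size()));
                }
                assignment.add(replicas);
            }
            return assignment;
        }
    }

    public static void main(String[] args) {
        ReplicaPlacer placer = new ExcludingRoundRobinPlacer();
        placer.configure(Map.of("replica.placer.excluded.brokers", "3"));
        // Broker 3 never appears in the resulting assignment.
        System.out.println(placer.place(List.of(1, 2, 3, 4), 2, (short) 2));
    }
}
```

This also shows why the thread's unanswered question matters: the placer itself cannot tell a decommissioning broker from a healthy one, so that knowledge has to arrive from outside, here via a config entry.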