Aditya, when I think about the motivation of "not having to restart brokers to change a config", I think about all of the configurations I have seen get changed in brokers followed by a restart (which is just about all of them). What I mean by "stop the world" is when producers and/or consumers cannot use the broker(s) for a period of time, or something within the broker holds/blocks everything while the changes take effect, and leader election or an ISR change occurs.
Let's say someone wanted to change replicaFetchMaxBytes or replicaFetchBackoffMs dynamically: you would have to stop the ReplicaFetcherManager. If you use a watcher, then all brokers would have to stop and (hopefully) restart the ReplicaFetcherManager at the same time. Or let's say someone wanted to change NumNetworkThreads: the entire SocketServer on every broker would have to stop and (hopefully) restart at the same time. I believe most of the configurations fall into this category, and using a watcher notification to every broker without some control is going to be a problem.

If the notification goes just to the controller, and the controller is able to manage the processing for every broker, that might work, but it doesn't solve all the problems to be worked on. We would also have to think about what to do for the controller broker itself (unless we make the controller not a broker, if that is possible), as well as how to deal with changes that could take brokers in and out of the ISR or cause leader election. Ideally we can make these changes without "stopping the world" (not just a matter of having the controller manage a broker-by-broker restart), so that brokers that are leaders stay leaders (perhaps the connections for producing/consuming get buffered or something) when (if) they come back online.

The thing is that lots of folks want all (or as many as possible) of the configurations to be dynamic, and I am concerned that if we don't code for the harder cases then only one or two configurations will end up dynamic. If the motivation for this KIP is making quotas work, that is OK. The more I think about it, though, I am not sure just labeling certain configs as dynamic is going to be helpful for folks, because they would still have to manage the updates for all the configurations, restart brokers, and now carry the new burden of understanding dynamic properties.
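To make the contrast concrete, here is a minimal, purely illustrative sketch (all class and function names are hypothetical, not Kafka's actual API) of the availability difference between a watcher firing on every broker at once and a controller applying the same change one broker at a time:

```python
# Illustrative only: simulates the availability difference between a
# "ZK watcher fires everywhere at once" rollout and a controller-driven
# one-broker-at-a-time rollout. All names here are hypothetical.

class Broker:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.config = {}
        self.serving = True  # True while producers/consumers can use it


def watcher_rollout(brokers, new_config):
    # Every broker sees the watcher fire and restarts the affected
    # component simultaneously: at that instant, zero brokers are
    # serving -- a "stop the world" change.
    for b in brokers:
        b.serving = False
    min_available = sum(b.serving for b in brokers)  # 0 at the worst point
    for b in brokers:
        b.config = dict(new_config)
        b.serving = True
    return min_available


def controller_rollout(brokers, new_config):
    # The controller sequences the change, so at most one broker is
    # ever unavailable at a time.
    min_available = len(brokers)
    for b in brokers:
        b.serving = False  # brief, single-broker disruption
        min_available = min(min_available, sum(x.serving for x in brokers))
        b.config = dict(new_config)
        b.serving = True
    return min_available


brokers = [Broker(i) for i in range(5)]
print(watcher_rollout(brokers, {"NumNetworkThreads": 8}))     # 0 brokers serving at worst
print(controller_rollout(brokers, {"NumNetworkThreads": 8}))  # 4 of 5 serving at worst
```

The simulation is trivially simple, but it captures the point: a broadcast notification couples every broker's disruption window together, while a controller-mediated protocol can bound cluster-wide unavailability.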
I think we need to add solutions for folks where we can, to make things easier without adding new items for them to contend with.

Thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - -
http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sun, May 3, 2015 at 8:23 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:

> Hey Joe,
>
> Can you elaborate what you mean by a stop the world change? In this
> protocol, we can target notifications to a subset of brokers in the
> cluster (controller if we need to). Is the AdminChangeNotification a ZK
> notification or a request type exposed by each broker?
>
> Thanks,
> Aditya
>
> ________________________________________
> From: Joe Stein [joe.st...@stealth.ly]
> Sent: Friday, May 01, 2015 5:25 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-21 Configuration Management
>
> Hi Aditya, thanks for the write up and for focusing on this piece.
>
> Agreed, we need something that lets us make broker changes dynamically
> without rolling restarts.
>
> I think, though, that if every broker is getting changes via
> notifications, it is going to limit which configs can be dynamic.
>
> We could never deliver a "stop the world" configuration change, because
> then that would happen on every broker in the entire cluster at the same
> time.
>
> Can maybe just the controller get the notification?
>
> And we provide a layer for brokers to work with the controller to do the
> config change operations at its discretion (so it can stop things if it
> needs to).
>
> The controller gets the notification, sends an AdminChangeNotification to
> brokers [X .. N], then the brokers can do their thing, even send a
> response for heartbeating while the change takes the few milliseconds it
> needs, or crashes. We need to go through both scenarios.
>
> I am worried that we put this change in like this, and it works for
> quotas and maybe a few other things, but nothing else gets dynamic, and
> we don't get far enough toward almost no more rolling restarts.
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - -
>
> http://www.stealth.ly
> - - - - - - - - - - - - - - - - -
>
> On Thu, Apr 30, 2015 at 8:14 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> > > 1. I have deep concerns about managing configuration in ZooKeeper.
> > > First, Producers and Consumers shouldn't depend on ZK at all; this
> > > seems to add back a dependency we are trying to get away from.
> >
> > The KIP probably needs to be clarified here - I don't think Aditya was
> > referring to client (producer/consumer) configs. These are global
> > client-id-specific configs that need to be managed centrally.
> > (Specifically, quota overrides on a per-client basis).
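For context on the kind of data Joel is describing, a per-client quota override stored centrally in ZooKeeper might look something like the fragment below. The znode path and key names are illustrative of the idea, not a committed design from the KIP:

```
# Hypothetical ZK layout for a centrally managed, per-client-id override:
/config/clients/<client-id>   (znode with a JSON payload)

{
  "version": 1,
  "config": {
    "producer_byte_rate": "1048576",
    "consumer_byte_rate": "2097152"
  }
}
```

Because the override is keyed by client-id rather than by broker, any broker serving that client can read it; the producer and consumer themselves never need a ZK connection, which is the distinction Joel is drawing.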