Hi Jun, Sorry to bother you again. Can you please take a look at the wiki again when you have time?
Thanks a lot! Lucas On Wed, Sep 19, 2018 at 3:57 PM Lucas Wang <lucasatu...@gmail.com> wrote: > Hi Jun, > > Thanks a lot for the detailed explanation. > I've restored the wiki to a previous version that does not require config > changes, > and keeps the current behavior with the proposed changes turned off by > default. > I'd appreciate it if you can review it again. > > Thanks! > Lucas > > On Tue, Sep 18, 2018 at 1:48 PM Jun Rao <j...@confluent.io> wrote: > >> Hi, Lucas, >> >> When upgrading to a minor release, I think the expectation is that a user >> wouldn't need to make any config changes, other than the usual >> inter.broker.protocol. If we require other config changes during an >> upgrade, then it's probably better to do that in a major release. >> >> Regarding your proposal, I think removing host/advertised_host in favor of >> listeners:advertised_listeners seems useful regardless of this KIP. >> However, that can probably wait until a major release. >> >> As for the controller listener, I am not sure if one has to set it. To >> make >> a cluster healthy, one sort of have to make sure that the request queue is >> never full and no request will be sitting in the request queue for long. >> If >> one does that, setting the controller listener may not be necessary. On >> the >> flip side, even if one sets the controller listener, but the request queue >> and the request time for the data part are still high, the cluster may >> still not be healthy. Given that we have already started the 2.1 release >> planning, perhaps we can start with not requiring the controller listener. >> If this is indeed something that everyone wants to set, we can make it a >> required config in a major release. >> >> Thanks, >> >> Jun >> >> On Tue, Sep 11, 2018 at 3:46 PM, Lucas Wang <lucasatu...@gmail.com> >> wrote: >> >> > @Jun Rao <j...@confluent.io> >> > >> > I made the recent config changes after thinking about the default >> behavior >> > for adopting this KIP. >> > I think there are basically two options: >> > 1. By default, the behavior proposed in this KIP is turned off, and >> > operators can turn it >> > on by adding the "controller.listener.name" config and entries in the >> > "listeners" and "advertised.listeners" list. >> > If no "controller.listener.name" is added, it'll be the *same as* the " >> > inter.broker.listener.name", >> > and the proposed feature is effectively turned off. >> > This has been the assumption in the KIP writeup before. >> > >> > 2. By default, the behavior proposed in this KIP is turned on, and >> > operators are forced to >> > recognize the proposed change if their "listeners" config is set (this >> is >> > most likely in production environments), >> > by allocating a new port for controller connections, and adding a new >> > endpoint to the "listeners" config. >> > For cases where "listeners" is not set explicitly, >> > there needs to be a default value for it that includes the controller >> > listener name, >> > e.g. "PLAINTEXT://:9092,CONTROLLER://:9091" >> > >> > I chose to go with option 2 since as author of this KIP, >> > I naturally think in the long run, it's worth the effort to adopt this >> > feature, >> > in order to prevent issues under circumstances listed in the motivation >> > section. >> > >> > 100, following the argument above, I want to enforce the separation >> > between controller >> > and data plane requests. Hence the "controller.listener.name" should >> > never be the same >> > as the "inter.broker.listener.name", which is intended for data plane >> > requests. >> > >> > 101, the default value for "listeners" will be >> > "PLAINTEXT://:9092,CONTROLLER://:9091", >> > making values of "host", and "port" not being used under any config >> > settings. >> > And the default value for "advertised.listeners" will be derived from >> > "listeners", >> > making the values of "advertised.host", and "advertised.port" not being >> > used any more. >> > >> > 102, for upgrading, a single broker that has "listeners" and/or >> > "advertised.listeners" set, >> > must add a new endpoint for the CONTROLLER listener name, or end up >> using >> > the default listeners "PLAINTEXT://:9092,CONTROLLER://:9091". >> > During rolling upgrade, in cases of <old controller> + <new broker> or >> > <new controller> + <old broker> >> > we still need to fall back to the current behavior. However after the >> > rolling upgrade is done, >> > it is guaranteed that the controller plane and data plane are separated, >> > given >> > the "controller.listener.name" must be different from " >> > inter.broker.listener.name". >> > >> > @Ismael Juma <ism...@juma.me.uk> >> > Thanks for pointing that out. I did not know that. >> > However my question is if the argument above makes sense, and my code >> > change >> > causes the configs "host", "port", "advertised.host", "advertised.port" >> to >> > be not used under any circumstance, >> > then it's no different from removing them. >> > Anyway if there is still a concern about removing them, is there a new >> > major new version >> > now or in the future where I can remove them? >> > >> > Thanks! >> > Lucas >> > >> > On Mon, Sep 10, 2018 at 1:30 PM Ismael Juma <ism...@juma.me.uk> wrote: >> > >> >> To be clear, we can only remove configs in major new versions. >> Otherwise, >> >> we can only deprecate them. >> >> >> >> Ismael >> >> >> >> On Mon, Sep 10, 2018 at 10:47 AM Jun Rao <j...@confluent.io> wrote: >> >> >> >> > Hi, Lucas, >> >> > >> >> > For the network idlePct, your understanding is correct. Currently, >> >> > networkIdlePct metric is calculated as the average of (1 - io-ratio) >> in >> >> the >> >> > selector of all network threads. >> >> > >> >> > The metrics part looks good to me in the updated KIP. >> >> > >> >> > I am not still not quite sure about the configs. >> >> > >> >> > 100. "Whenever the "controller.listener.name" is set, upon broker >> >> startup, >> >> > we will validate its value and make sure it's different from the >> >> > *inter-broker-listener-name *value." Does that mean that >> >> > controller.listener.name has to be different from >> >> > inter.broker.listener.name? >> >> > That seems limiting. >> >> > >> >> > 101. The KIP says that advertised.listeners and listeners will now >> have >> >> a >> >> > different default value including controller. Could you document what >> >> the >> >> > default value looks like? >> >> > >> >> > 102. About removing the the following configs. How does that affect >> the >> >> > upgrade path? Do we now expect a user to add a new config when >> upgrading >> >> > from an old version to a new one? >> >> > host >> >> > port >> >> > advertised.host >> >> > advertised.port >> >> > >> >> > Thanks, >> >> > >> >> > Jun >> >> > >> >> > >> >> > On Thu, Sep 6, 2018 at 5:14 PM, Lucas Wang <lucasatu...@gmail.com> >> >> wrote: >> >> > >> >> > > @Jun Rao <j...@confluent.io> >> >> > > >> >> > > One clarification, currently on the selector level, we have the >> >> > > io-wait-ratio metric. >> >> > > For the new controller *network* thread, we can use it directly for >> >> > > IdlePct, instead of using 1- io-ratio, >> >> > > so that the logic is similar to the current average IdlePct for >> >> network >> >> > > threads. Is that correct? >> >> > > >> >> > > I've revised the KIP by adding two new metrics for measuring the >> >> IdlePct >> >> > > for the two additional threads. >> >> > > Please take a look again. Thanks! >> >> > > >> >> > > Lucas >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > On Wed, Sep 5, 2018 at 5:01 PM Jun Rao <j...@confluent.io> wrote: >> >> > > >> >> > > > Hi, Lucas, >> >> > > > >> >> > > > Thanks for the updated KIP. >> >> > > > >> >> > > > For monitoring the network thread utilization for the control >> >> plane, we >> >> > > > already have the metric io-ratio at the selector level (idlePct >> is >> >> 1 - >> >> > > > io-ratio). So, we just need to give that selector a meaningful >> name. >> >> > > > >> >> > > > For monitoring the io thread utilization for the control plane, >> it's >> >> > > > probably useful to have a separate metric for that. The >> controller >> >> > > request >> >> > > > queue size may not reflect the history in a window. >> >> > > > >> >> > > > Jun >> >> > > > >> >> > > > On Wed, Sep 5, 2018 at 3:38 PM, Lucas Wang < >> lucasatu...@gmail.com> >> >> > > wrote: >> >> > > > >> >> > > > > Thanks Jun for your quick response. It looks like I forgot to >> >> click >> >> > the >> >> > > > > "Update" button, :) >> >> > > > > It's updated now. >> >> > > > > >> >> > > > > Regarding the idle ratio metrics for the additional threads, I >> >> > > discussed >> >> > > > > with Joel, >> >> > > > > and think they are not as useful, and I added our reasoning in >> the >> >> > last >> >> > > > > paragraph of the >> >> > > > > "How are controller requests handled over the dedicated >> >> connections?" >> >> > > > > section. >> >> > > > > On the other hand, we don't strongly oppose adding them if you >> >> think >> >> > > they >> >> > > > > are necessary. >> >> > > > > >> >> > > > > Thanks, >> >> > > > > Lucas >> >> > > > > >> >> > > > > >> >> > > > > On Wed, Sep 5, 2018 at 3:12 PM Jun Rao <j...@confluent.io> >> wrote: >> >> > > > > >> >> > > > > > Hi, Lucas, >> >> > > > > > >> >> > > > > > Thanks for the reply. Have you actually updated the KIP? The >> >> wiki >> >> > > says >> >> > > > > that >> >> > > > > > it's last updated on Aug. 22. and some of the changes that >> you >> >> > > > mentioned >> >> > > > > > (#1 and #3) are not there. >> >> > > > > > >> >> > > > > > Also, regarding Joel's comment on network/request idle ratio >> >> > metrics, >> >> > > > > could >> >> > > > > > you comment on whether they include the new controller >> >> listener? If >> >> > > > not, >> >> > > > > do >> >> > > > > > we need additional metrics to measure the utilization of the >> io >> >> > > thread >> >> > > > > for >> >> > > > > > the control plane? >> >> > > > > > >> >> > > > > > Jun >> >> > > > > > >> >> > > > > > On Mon, Aug 27, 2018 at 6:26 PM, Lucas Wang < >> >> lucasatu...@gmail.com >> >> > > >> >> > > > > wrote: >> >> > > > > > >> >> > > > > > > Thanks for the comments, Jun. >> >> > > > > > > >> >> > > > > > > 1. I think the answer should be no, since the " >> >> > > > > > inter.broker.listener.name" >> >> > > > > > > are also used >> >> > > > > > > for replication traffic, and merging these two types of >> >> request >> >> > to >> >> > > > the >> >> > > > > > > single threaded tunnel >> >> > > > > > > would defeat the purpose of this KIP and also hurt >> replication >> >> > > > > > throughput. >> >> > > > > > > So I think that means >> >> > > > > > > we should validate to make sure when the new config is set, >> >> it's >> >> > > > > > different >> >> > > > > > > from "inter.broker.listener.name" >> >> > > > > > > or "security.inter.broker.protocol", whichever is set. >> >> > > > > > > >> >> > > > > > > 2. Normally all broker configs in a given cluster are >> changed >> >> at >> >> > > the >> >> > > > > same >> >> > > > > > > time. If there is a typo in the >> >> > > > > > > controller.listener.name and it's not available in the >> >> endpoints >> >> > > > list, >> >> > > > > > we >> >> > > > > > > could catch it, give an error >> >> > > > > > > and block restart of the first broker in the cluster. With >> >> that, >> >> > we >> >> > > > > could >> >> > > > > > > keep the current behavior >> >> > > > > > > in the KIP write up that falls back to >> >> > "inter.broker.listener.nam" >> >> > > > when >> >> > > > > > the >> >> > > > > > > "controller.listener.name" >> >> > > > > > > is not found during the migration phase of this KIP. >> Thoughts? >> >> > > > > > > >> >> > > > > > > 3. That makes sense, and I've changed it. >> >> > > > > > > >> >> > > > > > > Thanks, >> >> > > > > > > Lucas >> >> > > > > > > >> >> > > > > > > On Thu, Aug 23, 2018 at 3:46 PM Jun Rao <j...@confluent.io> >> >> > wrote: >> >> > > > > > > >> >> > > > > > > > Hi, Lucas, >> >> > > > > > > > >> >> > > > > > > > Sorry for the delay. The new proposal looks good to me >> >> > overall. A >> >> > > > few >> >> > > > > > > minor >> >> > > > > > > > comments below. >> >> > > > > > > > >> >> > > > > > > > 1. It's possible that listener.name.for.controller is >> set, >> >> but >> >> > > set >> >> > > > to >> >> > > > > > the >> >> > > > > > > > same value as inter.broker.listener.name. In that case, >> >> should >> >> > > we >> >> > > > > > have a >> >> > > > > > > > single network thread and the request handling thread for >> >> that >> >> > > > > > listener? >> >> > > > > > > > >> >> > > > > > > > 2. Currently, the controller always picks the listener >> >> > specified >> >> > > by >> >> > > > > > > > inter.broker.listener.name even if the listener name is >> not >> >> > > > present >> >> > > > > in >> >> > > > > > > the >> >> > > > > > > > receiving broker. This KIP proposes a slightly different >> >> > approach >> >> > > > for >> >> > > > > > > > picking listener.name.for.controller only when the >> receiving >> >> > end >> >> > > > has >> >> > > > > > the >> >> > > > > > > > listener and switches listener.name.for.controller >> >> otherwise. >> >> > > There >> >> > > > > are >> >> > > > > > > > some tradeoffs between the two approaches. To change the >> >> inter >> >> > > > broker >> >> > > > > > > > listener, the former requires 2 steps: (1) adding the new >> >> > > listener >> >> > > > to >> >> > > > > > > > listener list in every broker and (2) changing >> >> > > > > > > > listener.name.for.controller. >> >> > > > > > > > The latter can do both changes in 1 step. On the hand, if >> >> > > > > > > > listener.name.for.controller >> >> > > > > > > > is mis-configured, the former will report an error and >> the >> >> > latter >> >> > > > > will >> >> > > > > > > hide >> >> > > > > > > > it (so the user may not know the misconfiguration). It >> seems >> >> > that >> >> > > > we >> >> > > > > > > should >> >> > > > > > > > pick one approach to handle both >> >> listener.name.for.controller >> >> > and >> >> > > > > > > > inter.broker.listener.name consistently. To me, the >> former >> >> > seems >> >> > > > > > > slightly >> >> > > > > > > > better. >> >> > > > > > > > >> >> > > > > > > > 3. To be consistent with the existing naming, should >> >> > > > > > > > listener.name.for.controller >> >> > > > > > > > be controller.listener.name? >> >> > > > > > > > >> >> > > > > > > > Thanks, >> >> > > > > > > > >> >> > > > > > > > Jun >> >> > > > > > > > >> >> > > > > > > > >> >> > > > > > > > On Thu, Aug 9, 2018 at 3:21 PM, Lucas Wang < >> >> > > lucasatu...@gmail.com> >> >> > > > > > > wrote: >> >> > > > > > > > >> >> > > > > > > > > Hi Jun and Joel, >> >> > > > > > > > > >> >> > > > > > > > > The KIP writeup has changed by quite a bit since your >> +1. >> >> > > > > > > > > Can you please take another review? Thanks a lot for >> your >> >> > time! >> >> > > > > > > > > >> >> > > > > > > > > Lucas >> >> > > > > > > > > >> >> > > > > > > > > On Tue, Jul 17, 2018 at 10:33 AM, Joel Koshy < >> >> > > > jjkosh...@gmail.com> >> >> > > > > > > > wrote: >> >> > > > > > > > > >> >> > > > > > > > > > +1 on the KIP. >> >> > > > > > > > > > >> >> > > > > > > > > > (I'm not sure we actually necessary to introduce the >> >> > > condition >> >> > > > > > > > variables >> >> > > > > > > > > > for the concern that Jun raised, but it's an >> >> implementation >> >> > > > > detail >> >> > > > > > > that >> >> > > > > > > > > we >> >> > > > > > > > > > can defer to a discussion in the PR.) >> >> > > > > > > > > > >> >> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM, Lucas Wang < >> >> > > > > > lucasatu...@gmail.com> >> >> > > > > > > > > > wrote: >> >> > > > > > > > > > >> >> > > > > > > > > > > Hi Jun, >> >> > > > > > > > > > > >> >> > > > > > > > > > > I agree by using the conditional variables, there >> is >> >> no >> >> > > need >> >> > > > to >> >> > > > > > add >> >> > > > > > > > > such >> >> > > > > > > > > > a >> >> > > > > > > > > > > new config. >> >> > > > > > > > > > > Also thanks for approving this KIP. >> >> > > > > > > > > > > >> >> > > > > > > > > > > Lucas >> >> > > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > >> >> > > > > > > >> >> > > > > > >> >> > > > > >> >> > > > >> >> > > >> >> > >> >> >> > >> >