On Mon, Jan 31, 2022, at 16:20, Jason Gustafson wrote:
> Hi Colin,
>
> The updates look good to me. I wanted to clarify a few details about the AUTHORIZER_NOT_READY error code:
>
> 1. Only the Authorizer implementation will know when it has finished initialization. I had imagined that any request received in this state which cannot be authorized based on static configuration (i.e. super.users) would raise an AuthorizerNotReadyException (or something like that). Is that what you had in mind?

Right. Currently only super.users will be allowed.
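For concreteness, the pattern being discussed might look roughly like the sketch below inside an Authorizer implementation. This is illustrative only: the superUsers and aclsLoaded fields and the doAuthorize helper are hypothetical names, not part of the KIP.

    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.stream.Collectors;
    import org.apache.kafka.common.errors.AuthorizerNotReadyException;
    import org.apache.kafka.server.authorizer.Action;
    import org.apache.kafka.server.authorizer.AuthorizableRequestContext;
    import org.apache.kafka.server.authorizer.AuthorizationResult;

    // Hypothetical fragment of an Authorizer implementation: until the ACL
    // records from __cluster_metadata have been applied, only principals in
    // super.users can be authorized from static configuration.
    private Set<String> superUsers = Set.of();  // parsed from super.users at configure() time
    private final AtomicBoolean aclsLoaded = new AtomicBoolean(false);

    public List<AuthorizationResult> authorize(AuthorizableRequestContext context,
                                               List<Action> actions) {
        if (superUsers.contains(context.principal().toString())) {
            // super.users are always allowed, even before initialization.
            return actions.stream()
                    .map(action -> AuthorizationResult.ALLOWED)
                    .collect(Collectors.toList());
        }
        if (!aclsLoaded.get()) {
            // No correct ALLOWED/DENIED answer is possible yet.
            throw new AuthorizerNotReadyException();
        }
        return doAuthorize(context, actions);  // normal ACL lookup
    }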
> 2. Do we need any version bumps of protocols to indicate compatibility with this error code? I think the main situation where we might see it is `Envelope` requests.
>
> 3. In the context of forwarding, I believe the plan was to not return AUTHORIZER_NOT_READY to clients. Instead, if the broker sees this error, it will back off and retry the request. Is that right?

I restructured this part of the KIP a bit so that it discusses EnvelopeRequest specifically. I think this is the only request we want to add the new error code to. This also means that we don't need a new IBP, since the KRaft controllers have always relied on ApiVersions to determine which RPC formats to use.

I revised the KIP to make it more explicit that the broker should retry on getting AUTHORIZER_NOT_READY, that we don't need a new IBP, and that the new error code applies only to EnvelopeRequest.

regards,
Colin

> Thanks,
> Jason
>
> On Mon, Jan 31, 2022 at 3:02 PM Colin McCabe <[email protected]> wrote:
>
>> Hi all,
>>
>> Thanks, everyone. I've updated the KIP.
>>
>> I reorganized the design section to explain bootstrapping a bit better, and put it first. (It was always weird that the design section came after the interface changes section :) ). I also added early.start.listeners and a section on the other configs (which are the same as in AclAuthorizer).
>>
>> best,
>> Colin
>>
>> On Thu, Jan 27, 2022, at 16:38, Jason Gustafson wrote:
>> >> My intention here was to avoid exposing Kafka clients to AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal components (such as the broker) then things are a lot simpler. Keep in mind there are external management systems that assume that if the broker ports accept traffic, the broker is "up". For example, Kubernetes operators, external dashboards and health checks, etc.
>> >
>> > What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be returned in Envelope responses (and only if the client principal is not a super user). The broker would retry the operation until the request timeout after seeing this error, which is similar to how we handle controller unavailability in general. So the error is not seen by client requests. It sounds like what you are suggesting instead is having a dedicated listener for inter-cluster traffic? Can you clarify the rules that we use to decide which listener to send forwarded requests to?
>> >
>> >> We've discussed this issue in the past in the context of the broker. Should we fence if the broker can't apply metadata for a certain amount of time? It's controversial because fencing may add more load to an already loaded cluster. However, without new metadata, we won't get new ACL changes, learn about new topics, etc. etc.
>> >
>> > Yep, I appreciate the concern. It is a bit different in the context of the controller though. Without having access to the quorum, there is little a controller can do anyway. That is why we filed this issue: https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree it would be desirable if the same approach could be used by both the broker and controller.
>> >
>> > -Jason
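The retry behavior described above might look something like this on the broker side. A hypothetical sketch, not code from the KIP: the ForwardedRequest type, the deadlineMs/scheduleRetry/retryBackoffMs helpers, and the AUTHORIZER_NOT_READY member of the Errors enum are all assumptions here.

    // Sketch of broker-side handling of AUTHORIZER_NOT_READY in an Envelope
    // response: the error is retried internally, never sent to the client.
    void handleEnvelopeResponse(ForwardedRequest request, Errors error) {
        if (error == Errors.AUTHORIZER_NOT_READY) {
            if (time.milliseconds() >= request.deadlineMs()) {
                // The client's request timeout has elapsed; fail the request.
                request.completeWithTimeoutError();
            } else {
                // The controller's Authorizer has not loaded its ACLs yet.
                // Back off and re-send the Envelope to the controller.
                scheduleRetry(request, retryBackoffMs);
            }
        } else {
            request.complete(error);
        }
    }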
>> > On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <[email protected]> wrote:
>> >
>> >> On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
>> >> > Hey Colin,
>> >> >
>> >> > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`, then would we need `early.start.listeners`? This would give the `Authorizer` a way to support partial startup, but it would be up to the implementation to decide whether to use it. For an `Authorizer` which depends only on an external service, there is no dependence on the metadata log, so there is no reason to have partial startup. For the `StandardAuthorizer`, we could let it return a completed future from `Authorizer.start` for all of the controller listeners. Then we can let it authorize only super.user operations until the metadata log has been loaded (and raise AuthorizerNotReadyException for everything else until that time).
>> >>
>> >> Hi Jason,
>> >>
>> >> My intention here was to avoid exposing Kafka clients to AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal components (such as the broker) then things are a lot simpler. Keep in mind there are external management systems that assume that if the broker ports accept traffic, the broker is "up". For example, Kubernetes operators, external dashboards and health checks, etc.
>> >>
>> >> We can continue to uphold the guarantee that the broker is up if the port is responsive, if we annotate only internal ports (like the controller port, or possibly the inter-broker port) as "early start". This also helps with the case where people have some internal topic they want to be able to read from or write to before turning on client traffic. For example, some people send Kafka metric data to an internal Kafka topic.
>> >>
>> >> >> Leaving aside the preceding discussion, do you agree with starting up all endpoints (including non-early-start ones) once we load a metadata snapshot? How feasible would it be for us to get a callback from the Raft layer the first time we caught up to the last stable offset? (We only want the callback the first time, not any other time.) (I think the metadata shell would also want something like this, at least as an option.)
>> >> >
>> >> > It is certainly possible to get a callback when we've reached the high watermark, which seems like what we'd want. This does make me wonder how we would handle the case when a controller gets partitioned from the rest of the quorum for some time. Should there be some point at which the controller no longer accepts authorization requests because of the potential staleness of the ACLs? Perhaps we could even reuse the `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other words, maybe the readiness condition is more of a dynamic assertion that we have caught up to near the current end of the log.
>> >>
>> >> We've discussed this issue in the past in the context of the broker. Should we fence if the broker can't apply metadata for a certain amount of time? It's controversial because fencing may add more load to an already loaded cluster. However, without new metadata, we won't get new ACL changes, learn about new topics, etc. etc.
>> >>
>> >> This is probably something we should discuss in a separate KIP, since it's a bigger issue about how we handle nodes with stale metadata. And of course this issue does exist in the current ZK implementation -- you can talk to a broker that has been fenced from ZK for days, and there you are relying on whatever ACLs were in place when it could last talk to ZK. If we implement bounded staleness for KRaft, we may want to consider doing the same for ZK, for consistency.
>> >>
>> >> Another approach, which I suspect is more realistic in practice, is to have a metric tracking staleness and fire off an alert when it grows too high. Making the stale node inaccessible is sort of the nuclear option, and may tend to make a bad situation worse in practice.
>> >>
>> >> best,
>> >> Colin
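Jason's suggestion above, returning a completed future from `Authorizer.start` for the controller listeners, maps naturally onto the existing interface, which returns one CompletionStage per endpoint. A minimal sketch, assuming a hypothetical isEarlyStart() predicate and an aclLoadFuture that some metadata listener completes once the log has been loaded:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.CompletionStage;
    import org.apache.kafka.common.Endpoint;
    import org.apache.kafka.server.authorizer.AuthorizerServerInfo;

    // Completed by the metadata listener once the ACLs have been loaded.
    private final CompletableFuture<Void> aclLoadFuture = new CompletableFuture<>();

    // Early start listeners get an already-completed future, so they can
    // accept connections immediately (authorizing only super.users); the
    // remaining listeners are held back until the ACLs have been loaded.
    @Override
    public Map<Endpoint, ? extends CompletionStage<Void>> start(AuthorizerServerInfo serverInfo) {
        Map<Endpoint, CompletableFuture<Void>> futures = new HashMap<>();
        for (Endpoint endpoint : serverInfo.endpoints()) {
            if (isEarlyStart(endpoint)) {
                futures.put(endpoint, CompletableFuture.completedFuture(null));
            } else {
                futures.put(endpoint, aclLoadFuture);
            }
        }
        return futures;
    }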
>> >> >
>> >> > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <[email protected]> wrote:
>> >> >
>> >> >> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
>> >> >> > Hi Colin,
>> >> >> >
>> >> >> > Thanks for the KIP. A couple of questions:
>> >> >> >
>> >> >> > 1) The KIP says:
>> >> >> > *However, when the controller is active, its Authorizer state may be slightly ahead of the broker's Authorizer state. This will happen in the time between when a new ACL is created (or deleted) and the time that this change is persisted durably by a majority of controller quorum peers.*
>> >> >> >
>> >> >> > Once the ACL takes effect on the controller, do we guarantee that it will be applied everywhere, or can there be scenarios where the controller has applied the change but others do not, due to a subsequent failure in persisting the ACL changes?
>> >> >>
>> >> >> Hi Rajini,
>> >> >>
>> >> >> Thanks for taking a look.
>> >> >>
>> >> >> Good question. In general, for any new metadata record (not just ACL records), there are two fates:
>> >> >> 1. the record will be persisted to the Raft quorum.
>> >> >> 2. the Raft leader will fail and a new leader will be elected that doesn't have the new record.
>> >> >>
>> >> >> In the case of #2, the standby controllers and the brokers will never apply the record.
>> >> >>
>> >> >> In general the active controller always rolls back its state to the last committed offset if there is a failure and it loses the leadership. So this isn't really unique to the KRaft Authorizer; it just seemed worth pointing out, since there will be a brief period when the active controller is "ahead."
>> >> >>
>> >> >> We do try pretty hard to apply KIP-801 ACL records in order, so that the states you will see on brokers and standby controllers will always be valid states from some point in the past timeline. The consistency here should be at least as good as in the current system.
>> >> >>
>> >> >> > 2) Have we considered using a different config with limited privileges for bootstrapping instead of the all-powerful *super.users*?
>> >> >>
>> >> >> That's an interesting idea, but I'm not sure how much we could limit it. The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in order to do most of what they do. This might be something we could explore in a future KIP, since it's pretty cross-cutting (if we had such a limited bootstrapping user, all the authorizers could implement it, not just the KIP-801 one...)
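To make "CLUSTER_ACTION on CLUSTER" concrete, the minimal grant Colin describes could be expressed with Kafka's public ACL classes roughly as follows. The "User:broker" principal is a placeholder; this is an illustration, not something the KIP specifies.

    import org.apache.kafka.common.acl.AccessControlEntry;
    import org.apache.kafka.common.acl.AclBinding;
    import org.apache.kafka.common.acl.AclOperation;
    import org.apache.kafka.common.acl.AclPermissionType;
    import org.apache.kafka.common.resource.PatternType;
    import org.apache.kafka.common.resource.ResourcePattern;
    import org.apache.kafka.common.resource.ResourceType;

    // "kafka-cluster" is the fixed name of the singleton CLUSTER resource.
    AclBinding brokerBootstrapAcl = new AclBinding(
        new ResourcePattern(ResourceType.CLUSTER, "kafka-cluster", PatternType.LITERAL),
        new AccessControlEntry("User:broker", "*",
            AclOperation.CLUSTER_ACTION, AclPermissionType.ALLOW));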
>> >> >>
>> >> >> best,
>> >> >> Colin
>> >> >>
>> >> >> >
>> >> >> > Regards,
>> >> >> >
>> >> >> > Rajini
>> >> >> >
>> >> >> >
>> >> >> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <[email protected]> wrote:
>> >> >> >
>> >> >> >> How about this:
>> >> >> >>
>> >> >> >> We create a configuration key called early.start.listeners which contains a list of listener names. If this is not specified, its value defaults to just the controller listeners. Optionally, other listeners can be added too.
>> >> >> >>
>> >> >> >> If super.users contains any user names, early start listeners will start immediately. In the beginning they only authorize users that are in super.users. All other listeners receive a new error code, AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user names, then early start listeners will not be treated differently from other listeners.
>> >> >> >>
>> >> >> >> This will allow the controller listeners to get started immediately if the broker user is in super.users, which will speed up startup. It will also be useful for breaking chicken-and-egg cycles, like needing to pull the SCRAM metadata in order to authorize pulling the SCRAM metadata.
>> >> >> >>
>> >> >> >> There are still a few use cases where super.users won't be required, but it may be useful in many cases to have this early start functionality.
>> >> >> >>
>> >> >> >> Leaving aside the preceding discussion, do you agree with starting up all endpoints (including non-early-start ones) once we load a metadata snapshot? How feasible would it be for us to get a callback from the Raft layer the first time we caught up to the last stable offset? (We only want the callback the first time, not any other time.) (I think the metadata shell would also want something like this, at least as an option.)
>> >> >> >>
>> >> >> >> best,
>> >> >> >> Colin
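For illustration, a configuration using the proposal above might look like the fragment below. The listener names are examples only; per the proposal, early.start.listeners defaults to just the controller listeners when unset.

    # Hypothetical server.properties fragment illustrating the proposal.
    listeners=CONTROLLER://:9093,INTERNAL://:9094,CLIENT://:9092
    # These listeners start before the metadata log is loaded; until the
    # Authorizer is ready, they authorize only principals in super.users.
    early.start.listeners=CONTROLLER,INTERNAL
    super.users=User:kafka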
>> >> >> >>
>> >> >> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
>> >> >> >> > Hi Colin,
>> >> >> >> >
>> >> >> >> > Thanks for the writeup. I had one question about bootstrapping. For the brokers, I understand that listener startup is delayed until the Authorizer is ready. However, I was not very clear on how this would work for the controller listeners. We may need them to start up before the metadata log is ready so that a quorum can be established (as noted in the KIP). This works fine if we assume that the controller principals are among `super.users`. For requests forwarded from brokers, on the other hand, we need to ensure the ACLs have been loaded properly before we begin authorizing. The problem is that we currently use the same listener for quorum requests and for forwarded requests. So my question is: how does the Authorizer communicate to the controller when it is safe to begin authorizing different request types?
>> >> >> >> >
>> >> >> >> > There are a couple of ways I can see this working. First, we could allow the user to configure the listener used for forwarded requests separately. That would work with the existing `Authorizer.start` API. Alternatively, perhaps we could modify `Authorizer.start` to work with something more granular than `EndPoint`. This would allow the controller to begin accepting requests from the other quorum members before it is ready to authorize forwarded requests from clients. Then we would need some way to let brokers know when the controller is ready to accept these forwarded requests (e.g. through an error code in the `Envelope` response).
>> >> >> >> >
>> >> >> >> > What do you think?
>> >> >> >> >
>> >> >> >> > Thanks,
>> >> >> >> > Jason
>> >> >> >> >
>> >> >> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur <[email protected]> wrote:
>> >> >> >> >
>> >> >> >> >> +1 binding, thanks Colin!
>> >> >> >> >>
>> >> >> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <[email protected]> wrote:
>> >> >> >> >>
>> >> >> >> >> > Hi all,
>> >> >> >> >> >
>> >> >> >> >> > I'd like to start the vote on KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata.
>> >> >> >> >> >
>> >> >> >> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>> >> >> >> >> >
>> >> >> >> >> > The original DISCUSS thread is here:
>> >> >> >> >> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >> >> >> >> >
>> >> >> >> >> > Please take a look and vote if you can.
>> >> >> >> >> >
>> >> >> >> >> > best,
>> >> >> >> >> > Colin
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> -David
