On Mon, Jan 31, 2022, at 16:20, Jason Gustafson wrote:
> Hi Colin,
>
> The updates look good to me. I wanted to clarify a few details about the
> AUTHORIZER_NOT_READY error code:
>
> 1. Only the Authorizer implementation will know when it has finished
> initialization. I had imagined that any request received in this state
> which cannot be authorized based on static configuration (i.e., super.users)
> would raise an AuthorizerNotReadyException (or something like that). Is
> that what you had in mind?

Right. While the authorizer is in that state, only super.users will be allowed.
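To make the intended behavior concrete, here is a rough sketch of what such an authorizer could look like. This is illustrative only, not the KIP-801 implementation: the class, method, and principal names are made up, and the real Authorizer interface takes richer request context than a bare principal string.

```java
// Hypothetical sketch: before the metadata log has been loaded, the
// authorizer can only consult static configuration (super.users), so any
// other principal gets an AuthorizerNotReadyException rather than a denial.
import java.util.Set;

public class NotReadySketch {
    // Stand-in for the AuthorizerNotReadyException discussed above.
    static class AuthorizerNotReadyException extends RuntimeException {}

    private final Set<String> superUsers;
    private volatile boolean metadataLoaded = false;

    NotReadySketch(Set<String> superUsers) {
        this.superUsers = superUsers;
    }

    // Called once the metadata log (and thus the ACLs) has been loaded.
    void markMetadataLoaded() {
        metadataLoaded = true;
    }

    boolean authorize(String principal) {
        if (superUsers.contains(principal)) {
            return true;  // static configuration is always available
        }
        if (!metadataLoaded) {
            // No ACLs to consult yet: signal "not ready" so internal
            // callers know to back off and retry, instead of denying.
            throw new AuthorizerNotReadyException();
        }
        return lookupAcls(principal);
    }

    // Placeholder for the real ACL lookup once metadata is loaded.
    private boolean lookupAcls(String principal) {
        return false;
    }

    public static void main(String[] args) {
        NotReadySketch auth = new NotReadySketch(Set.of("User:broker"));
        System.out.println(auth.authorize("User:broker"));
        try {
            auth.authorize("User:alice");
        } catch (AuthorizerNotReadyException e) {
            System.out.println("not ready");
        }
    }
}
```

The key point is the three-way outcome: allow (super.user), not-ready (metadata not loaded), or a normal ACL decision.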

> 2. Do we need any version bumps of protocols to indicate compatibility with
> this error code? I think the main situation where we might see it is
> `Envelope` requests.
> 3. In the context of forwarding, I believe the plan was to not return
> AUTHORIZER_NOT_READY to clients. Instead, if the broker sees this error, it
> will backoff and retry the request. Is that right?
>

I restructured this part of the KIP a bit so that it discusses EnvelopeRequest 
specifically. I think this is the only request we want to add the new error 
code to. This also means that we don't need a new IBP, since the KRaft 
controllers have always relied on ApiVersions to determine what RPC formats to 
use.

I revised the KIP to make it more explicit that the broker should retry on 
getting AUTHORIZER_NOT_READY, that we don't need a new IBP, and that it applies 
only to EnvelopeRequest.
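In pseudocode, the broker-side handling described above could look something like the following. This is a sketch under my own assumed names (Sender, ErrorCode, forwardWithRetry are not Kafka APIs); the real broker would integrate this with its existing request timeout and backoff machinery.

```java
// Hypothetical sketch: on AUTHORIZER_NOT_READY in an Envelope response, the
// broker backs off and retries until the request timeout budget is spent,
// so the error code is never surfaced to external clients.
public class EnvelopeRetrySketch {
    enum ErrorCode { NONE, AUTHORIZER_NOT_READY }

    interface Sender {
        ErrorCode sendEnvelope();
    }

    // Returns true if the forwarded request eventually succeeded; on
    // exhausting the timeout, the broker would report a timeout to the
    // client rather than the raw AUTHORIZER_NOT_READY error.
    static boolean forwardWithRetry(Sender sender, long timeoutMs, long backoffMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            ErrorCode error = sender.sendEnvelope();
            if (error != ErrorCode.AUTHORIZER_NOT_READY) {
                return error == ErrorCode.NONE;
            }
            if (System.currentTimeMillis() + backoffMs >= deadline) {
                return false;  // budget exhausted; give up
            }
            Thread.sleep(backoffMs);  // back off before retrying
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a controller whose authorizer becomes ready on attempt 3.
        int[] attempts = {0};
        Sender sender = () -> ++attempts[0] < 3
                ? ErrorCode.AUTHORIZER_NOT_READY : ErrorCode.NONE;
        System.out.println(forwardWithRetry(sender, 1000, 10));
        System.out.println(attempts[0]);
    }
}
```

Since the retry loop lives entirely on the broker, external clients only ever observe either a normal response or an ordinary request timeout.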

regards,
Colin

> Thanks,
> Jason
>
>
>
>
> On Mon, Jan 31, 2022 at 3:02 PM Colin McCabe <[email protected]> wrote:
>
>> Hi all,
>>
>> Thanks, everyone. I've updated the KIP.
>>
>> I reorganized the design section to explain bootstrapping a bit better,
>> and put it first. (It was always weird that the design section came after
>> the interface changes section :) ). I also added early.start.listeners and
>> a section on the other configs (which are the same as in AclAuthorizer).
>>
>> best,
>> Colin
>>
>> On Thu, Jan 27, 2022, at 16:38, Jason Gustafson wrote:
>> >> My intention here was to avoid exposing Kafka clients to
>> > AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to
>> internal
>> > components (such as the broker) then things are a lot simpler. Keep in
>> mind
>> > there are external management systems that assume that if the broker
>> ports
>> > accept traffic, the broker is "up". For example, Kubernetes operators,
>> > external dashboards and health checks, etc.
>> >
>> > What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be
>> > returned in Envelope responses (and only if the client principal is not a
>> > super user). The broker would retry the operation until the request
>> timeout
>> > after seeing this error, which is similar to how we handle controller
>> > unavailability in general. So the error is not seen by client requests.
>> It
>> > sounds like what you are suggesting instead is having a dedicated
>> listener
>> > for inter-cluster traffic? Can you clarify the rules that we use to
>> decide
>> > which listener to send forwarded requests to?
>> >
>> >> We've discussed this issue in the past in the context of the broker.
>> > Should we fence if the broker can't apply metadata for a certain amount
>> of
>> > time? It's controversial because fencing may add more load to an already
>> > loaded cluster. However, without new metadata, we won't get new ACL
>> > changes, learn about new topics, etc. etc.
>> >
>> > Yep, I appreciate the concern. It is a bit different in the context of
>> the
>> > controller though. Without having access to the quorum, there is little a
>> > controller can do anyway. That is why we filed this issue:
>> > https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree it
>> > would be desirable if the same approach could be used by both the broker
>> > and controller.
>> >
>> > -Jason
>> >
>> >
>> > On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <[email protected]> wrote:
>> >
>> >> On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
>> >> > Hey Colin,
>> >> >
>> >> > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`,
>> then
>> >> > would we need `early.start.listeners`? This would give the
>> `Authorizer` a
>> >> > way to support partial startup, but it would be up to the
>> implementation
>> >> to
>> >> > decide whether to use it. For an `Authorizer` which depends only on an
>> >> > external service, there is no dependence on the metadata log, so
>> there is
>> >> > no reason to have partial startup. For the `StandardAuthorizer`, we
>> could
>> >> > let it return a completed future from `Authorizer.start` for all of
>> the
>> >> > controller listeners. Then we can let it authorize only super.user
>> >> > operations until the metadata log has been loaded (and raise
>> >> > AuthorizerNotReadyException for everything else until that time).
>> >> >
>> >>
>> >> Hi Jason,
>> >>
>> >> My intention here was to avoid exposing Kafka clients to
>> >> AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to
>> internal
>> >> components (such as the broker) then things are a lot simpler. Keep in
>> mind
>> >> there are external management systems that assume that if the broker
>> ports
>> >> accept traffic, the broker is "up". For example, Kubernetes operators,
>> >> external dashboards and health checks, etc.
>> >>
>> >> We can continue to uphold the guarantee that the broker is up if the
>> port
>> >> is responsive if we annotate only internal ports (like the controller
>> port,
>> >> or possibly inter-broker port) as "early start." This also helps with
>> the
>> >> case where people have some internal topic they want to be able to read
>> or
>> >> write from before turning on client traffic. For example, some people
>> send
>> >> Kafka metric data to an internal Kafka topic.
>> >>
>> >> >> Leaving aside the preceding discussion, do you agree with starting up
>> >> all
>> >> > endpoints (including non-early start ones) once we load a metadata
>> >> > snapshot? How feasible would it be for us to get a callback from the
>> Raft
>> >> > layer the first time we caught up to the last stable offset? (we only
>> >> want
>> >> > the callback the first time, not any other time). (I think the
>> metadata
>> >> > shell also would want something like this, at least as an option).
>> >> >
>> >> > It is certainly possible to get a callback when we've reached the high
>> >> > watermark, which seems like what we'd want. This does make me wonder
>> how
>> >> we
>> >> > would handle the case when a controller gets partitioned from the
>> rest of
>> >> > the quorum for some time. Should there be some point at which the
>> >> > controller no longer accepts authorization requests because of the
>> >> > potential staleness of the ACLs? Perhaps we could even reuse the
>> >> > `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other
>> >> words,
>> >> > maybe the readiness condition is more of a dynamic assertion that we
>> have
>> >> > caught up to near the current end of the log.
>> >> >
>> >>
>> >> We've discussed this issue in the past in the context of the broker.
>> >> Should we fence if the broker can't apply metadata for a certain amount
>> of
>> >> time? It's controversial because fencing may add more load to an already
>> >> loaded cluster. However, without new metadata, we won't get new ACL
>> >> changes, learn about new topics, etc. etc.
>> >>
>> >> This is probably something we should discuss in a separate KIP, since
>> it's
>> >> a bigger issue about how we handle nodes with stale metadata. And of
>> course
>> >> this issue does exist in the current ZK implementation -- you can talk
>> to a
>> >> broker that has been fenced from ZK for days, and there you are relying
>> on
>> >> whatever ACLs were in place when it could last talk to ZK. If we
>> implement
>> >> bounded staleness for KRaft, we may want to consider doing the same for
>> ZK,
>> >> for consistency.
>> >>
>> >> Another approach, which I suspect is more realistic in practice, is to
>> >> have a metric tracking staleness and fire off an alert when it grows too
>> >> high. Making the stale node inaccessible is sort of the nuclear option,
>> and
>> >> may tend to make a bad situation worse in practice.
>> >>
>> >> best,
>> >> Colin
>> >>
>> >>
>> >> >
>> >> > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <[email protected]>
>> >> wrote:
>> >> >
>> >> >> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
>> >> >> > Hi Colin,
>> >> >> >
>> >> >> > Thanks for the KIP. A couple of questions:
>> >> >> >
>> >> >> > 1) The KIP says:
>> >> >> > *However, when the controller is active, its Authorizer state may
>> be
>> >> >> > slightly ahead of the broker's Authorizer state. This will
>> happen
>> >> in
>> >> >> > the time between when a new ACL is created (or deleted) and the
>> time
>> >> that
>> >> >> > this change is persisted durably by a majority of controller quorum
>> >> >> peers.*
>> >> >> >
>> >> >> > Once the ACL takes effect on the controller, do we guarantee that
>> it
>> >> will
>> >> >> > be applied everywhere or can there be scenarios where the
>> controller
>> >> has
>> >> >> > applied the change, but others don't due to a subsequent failure in
>> >> >> > persisting the ACL changes?
>> >> >>
>> >> >> Hi Rajini,
>> >> >>
>> >> >> Thanks for taking a look.
>> >> >>
>> >> >> Good question. In general, for any new metadata record (not just ACL
>> >> >> records), there are two fates:
>> >> >> 1. the record will be persisted to the raft quorum.
>> >> >> 2. the raft leader will fail and a new leader will be elected that
>> >> doesn't
>> >> >> have the new record.
>> >> >>
>> >> >> In the case of #2, the standby controllers and the brokers will never
>> >> >> apply the record.
>> >> >>
>> >> >> In general the active controller always rolls back its state to the
>> last
>> >> >> committed offset if there is a failure and it loses the leadership.
>> So
>> >> this
>> >> >> isn't really unique to the KRaft Authorizer, it just seemed worth
>> >> pointing
>> >> >> out since there will be a brief period when the active controller is
>> >> >> "ahead."
>> >> >>
>> >> >> We do try pretty hard to apply KIP-801 ACL records in order so that
>> the
>> >> >> states you will see on brokers and standby controllers will always be
>> >> valid
>> >> >> states from some point in the past timeline. The consistency here
>> >> should be
>> >> >> at least as good as the current system.
>> >> >>
>> >> >> >
>> >> >> > 2) Have we considered using a different config with limited
>> privileges
>> >> >> for
>> >> >> > bootstrapping instead of the all-powerful *super.users*?
>> >> >> >
>> >> >>
>> >> >> That's an interesting idea, but I'm not sure how much we could limit
>> it.
>> >> >> The brokers and controllers at least need CLUSTER_ACTION on CLUSTER
>> in
>> >> >> order to do most of what they do. This might be something we could
>> >> explore
>> >> >> in a future KIP since it's pretty cross-cutting (if we had such a
>> >> limited
>> >> >> bootstrapping user, all the authorizers could implement it, not just
>> the
>> >> >> KIP-801 one...)
>> >> >>
>> >> >> best,
>> >> >> Colin
>> >> >>
>> >> >> >
>> >> >> > Regards,
>> >> >> >
>> >> >> > Rajini
>> >> >> >
>> >> >> >
>> >> >> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <[email protected]>
>> >> wrote:
>> >> >> >
>> >> >> >> How about this:
>> >> >> >>
>> >> >> >> We create a configuration key called early.start.listeners which
>> >> >> contains
>> >> >> >> a list of listener names. If this is not specified, its value
>> >> defaults
>> >> >> to
>> >> >> >> just the controller listeners. Optionally, other listeners can be
>> >> added
>> >> >> too.
>> >> >> >>
>> >> >> >> If super.users contains any user names, early start listeners will
>> >> start
>> >> >> >> immediately. In the beginning they only authorize users that are
>> in
>> >> >> >> super.users. All other listeners receive a new error code,
>> >> >> >> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any
>> user
>> >> >> names,
>> >> >> >> then early start listeners will not be treated differently than
>> other
>> >> >> >> listeners.
>> >> >> >>
>> >> >> >> This will allow the controller listeners to get started
>> immediately
>> >> if
>> >> >> the
>> >> >> >> broker user is in super.users, which will speed up startup. It
>> also
>> >> >> will be
>> >> >> >> useful for breaking chicken/egg cycles like needing to pull the
>> SCRAM
>> >> >> >> metadata to authorize pulling the SCRAM metadata.
>> >> >> >>
>> >> >> >> There are still a few use cases where super.users won't be
>> required,
>> >> but
>> >> >> >> it may be useful in many cases to have this early start
>> >> functionality.
>> >> >> >>
>> >> >> >> Leaving aside the preceding discussion, do you agree with
>> starting up
>> >> >> all
>> >> >> >> endpoints (including non-early start ones) once we load a metadata
>> >> >> >> snapshot? How feasible would it be for us to get a callback from
>> the
>> >> >> Raft
>> >> >> >> layer the first time we caught up to the last stable offset? (we
>> only
>> >> >> want
>> >> >> >> the callback the first time, not any other time). (I think the
>> >> metadata
>> >> >> >> shell also would want something like this, at least as an option).
>> >> >> >>
>> >> >> >> best,
>> >> >> >> Colin
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
>> >> >> >> > Hi Colin,
>> >> >> >> >
>> >> >> >> > Thanks for the writeup. I had one question about bootstrapping.
>> For
>> >> >> the
>> >> >> >> > brokers, I understand that listener startup is delayed until the
>> >> >> >> Authorizer
>> >> >> >> > is ready. However, I was not very clear how this would work for
>> the
>> >> >> >> > controller listeners. We may need them to startup before the
>> >> metadata
>> >> >> log
>> >> >> >> > is ready so that a quorum can be established (as noted in the
>> KIP).
>> >> >> This
>> >> >> >> > works fine if we assume that the controller principals are among
>> >> >> >> > `super.users`. For requests forwarded from brokers, on the other
>> >> >> hand, we
>> >> >> >> > need to ensure the ACLs have been loaded properly before we
>> begin
>> >> >> >> > authorizing. The problem is that we currently use the same
>> listener
>> >> >> for
>> >> >> >> > quorum requests and for forwarded requests. So my question is
>> how
>> >> does
>> >> >> >> the
>> >> >> >> > Authorizer communicate to the controller when it is safe to
>> begin
>> >> >> >> > authorizing different request types?
>> >> >> >> >
>> >> >> >> > There are a couple ways I can see this working. First, we could
>> >> allow
>> >> >> the
>> >> >> >> > user to configure the listener used for forwarded requests
>> >> separately.
>> >> >> >> That
>> >> >> >> > would work with the existing `Authorizer.start` API.
>> Alternatively,
>> >> >> >> perhaps
>> >> >> >> > we could modify `Authorizer.start` to work with something more
>> >> >> granular
>> >> >> >> > than `EndPoint`. This would allow the controller to begin
>> accepting
>> >> >> >> > requests from the other quorum members before it is ready to
>> >> authorize
>> >> >> >> > forwarded requests from clients.  Then we would need some way to
>> >> let
>> >> >> >> > brokers know when the controller is ready to accept these
>> forwarded
>> >> >> >> > requests (e.g. through an error code in the `Envelope`
>> response).
>> >> >> >> >
>> >> >> >> > What do you think?
>> >> >> >> >
>> >> >> >> > Thanks,
>> >> >> >> > Jason
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
>> >> >> >> > <[email protected]> wrote:
>> >> >> >> >
>> >> >> >> >> +1 binding, thanks Colin!
>> >> >> >> >>
>> >> >> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <
>> [email protected]>
>> >> >> >> wrote:
>> >> >> >> >>
>> >> >> >> >> > Hi all,
>> >> >> >> >> >
>> >> >> >> >> > I'd like to start the vote on KIP-801: Implement an
>> Authorizer
>> >> that
>> >> >> >> >> stores
>> >> >> >> >> > metadata in __cluster_metadata
>> >> >> >> >> >
>> >> >> >> >> > The KIP is here:
>> https://cwiki.apache.org/confluence/x/h5KqCw
>> >> >> >> >> >
>> >> >> >> >> > The original DISCUSS thread is here:
>> >> >> >> >> >
>> >> >> >> >> >
>> >> https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >> >> >> >> >
>> >> >> >> >> > Please take a look and vote if you can.
>> >> >> >> >> >
>> >> >> >> >> > best,
>> >> >> >> >> > Colin
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> -David
>> >> >> >> >>
>> >> >> >>
>> >> >>
>> >>
>>
