> One of the goals of KIP-584 (feature versioning) is that we can get rid of
IBP in the future. So does this change prevent us from removing IBP in the
future?

That is a good question. I think the problem here is that request
forwarding puts an expectation on api version support which covers more
than one broker. This is why the normal ApiVersions behavior doesn't work.
I thought about this a bit and haven't come up with a good alternative. One
thought I've been considering is letting the controller in the post-kip-500
world set the maximum range of api support for the cluster. However, even
then we would need some way to tell when the controller quorum itself is
ready to enable support for a new api version. My feeling is that we will
probably always need something like the IBP to control when it is safe to
expose versions of APIs which have a cross-broker dependence. However,
KIP-584 would still allow us to manage the IBP at the level of a feature so
that we don't need two rolling restarts anymore.

Best,
Jason




On Thu, Sep 24, 2020 at 1:59 PM Jun Rao <j...@confluent.io> wrote:

> Hi, Boyang,
>
> One of the goals of KIP-584 (feature versioning) is that we can get rid of
> IBP in the future. So does this change prevent us from removing IBP in the
> future?
>
> Thanks,
>
> Jun
>
> On Thu, Sep 24, 2020 at 12:46 PM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Boyang,
> >
> > Thanks for the update. This seems like the best thing we can do. The
> > alternative would be to always ensure that the forwarded APIs are safe
> for
> > conversion between versions, but that would restrict the flexibility that
> > the versioning is providing. It would also be a large effort to avoid
> > introducing regressions through conversion. Sadly this broadens the scope
> > of the IBP, but in fact forwarded APIs are inter-broker APIs.
> >
> > Thanks,
> > Jason
> >
> > On Thu, Sep 24, 2020 at 9:23 AM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> > > Hey there,
> > >
> > > we spotted a necessary case to handle the redirect request versioning,
> > and
> > > proposed the following changes:
> > >
> > > 1. For redirection RPCs (AlterConfig, Acl, Token etc), the
> corresponding
> > > allowed versions in the ApiVersionResponse will be affected by the
> entire
> > > cluster's versioning, not just the receiving broker, since we need to
> > > ensure the chosen version get properly handled by all parties. Thus
> from
> > > now on, RPC with redirection will be treated as inter-broker RPC, and
> any
> > > version bump for these RPCs has to go through IBP bump as well.
> > > ApiVersionResponse will take IBP into considerations for the
> redirection
> > > RPCs allowable versions.
> > >
> > > 2. We would do the best effort to maintain the same request version for
> > > the entire admin client -> receiving broker -> controller broker path,
> > but
> > > for old RPC versions, they may not have flexible fields introduced yet.
> > > Thus, we would have to upgrade the RPC to the minimum version which
> > > supports flexible fields
> > > and add another tagged field in the header called
> > `OriginalRequestVersion`
> > > to help the controller broker correctly deserialize the request with
> the
> > > original admin client sent out version. We would not downgrade the
> > original
> > > request in any circumstance, since the flexible field support is
> required
> > > to be open-ended on the high side.
> > >
> > > Let me know if you have any questions.
> > >
> > > Best,
> > > Boyang
> > >
> > > On Thu, Aug 6, 2020 at 6:11 PM Boyang Chen <reluctanthero...@gmail.com
> >
> > > wrote:
> > >
> > > > Hey there,
> > > >
> > > > we are going to introduce a minor change to bump the version of
> several
> > > > RPCs which are currently not supporting flexible versions. It is
> > > necessary
> > > > because they need to be able to construct request header with initial
> > > > principal name and client id as optional fields for redirection. The
> > are
> > > > only two of them:
> > > >
> > > > 1. AlterConfig
> > > > 2. AlterClientQuotas
> > > >
> > > > Let me know if you have any questions.
> > > >
> > > > Boyang
> > > >
> > > > On Fri, Jul 31, 2020 at 11:42 AM Boyang Chen <
> > reluctanthero...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> Hey David,
> > > >>
> > > >> After discussing with Colin offline, I would like to correct one
> case
> > in
> > > >> the described workflow, where the CLUSTER_ACTION authorization would
> > > not be
> > > >> based on the initial principal field check, because it is not a
> > secured
> > > >> condition which anyone could forge. The revised workflow shall be:
> > > >>
> > > >> Step 1. Filter out resources that are authorized
> > > >>          1.1 Use traditional principals to verify first. If
> > authorized,
> > > >> continue
> > > >>          1.2 If not authorized, check whether the request is from
> the
> > > >> control plane. Note that this is a best-effort to verify whether the
> > > >> request is internal.
> > > >>          1.3 If the request is not from the control plane, return
> > > >> authorization failure
> > > >>          1.4 If the request is from the control plane, use
> > > CLUSTER_ACTION
> > > >> to verify and determine the result
> > > >>
> > > >> Step 2. Check the request context to see if this is a forwarding
> > > request,
> > > >> by checking whether it is from control plane and uses extra header
> > > fields
> > > >>         2.1 if the resource is authorized, and if this is the active
> > > >> controller, process it
> > > >>         2.2 if the resource is authorized but this is not the active
> > > >> controller, return NOT_CONTROLLER to the sender (forwarding broker)
> > for
> > > >> retry
> > > >>         2.3 if the resource is not authorized, return
> > > >> CLUSTER_AUTHORIZATION_FAILURE to propagate back to the original
> client
> > > >> through forwarding broker
> > > >> Step 3. If the request is not a forwarding request
> > > >>         3.1 If the resource is authorized, and this is the active
> > > >> controller, process it
> > > >>         3.2 If the resource is authorized, but this is not active
> > > >> controller, put the resource into the preparation for a new
> > AlterConfig
> > > >> request for forwarding
> > > >>         3.3 If the resource is not authorized, reply the original
> > client
> > > >> AUTHORIZATION_FAILURE when the forwarding request is returned
> > > >>
> > > >> On Thu, Jul 30, 2020 at 3:47 PM Boyang Chen <
> > reluctanthero...@gmail.com
> > > >
> > > >> wrote:
> > > >>
> > > >>>
> > > >>>
> > > >>> On Thu, Jul 30, 2020 at 7:18 AM David Jacot <dja...@confluent.io>
> > > wrote:
> > > >>>
> > > >>>> Hi Boyang,
> > > >>>>
> > > >>>> Thanks for your answers.
> > > >>>>
> > > >>>> > The point for using the listener name is more of a security
> > purpose,
> > > >>>> to
> > > >>>> > detect any forged request to our best effort.
> > > >>>> > For throttling I think we could just check the request header
> for
> > > >>>> > *InitialClientId* existence, to distinguish whether to apply
> > > >>>> > throttling strategy as forwarded request or direct request.
> > > >>>>
> > > >>>> Reading "security" and "best effort" in the same sentence makes
> me a
> > > >>>> little nervous :).
> > > >>>>
> > > >>>> The identification issue is also valid for quota as we don't want
> > one
> > > >>>> to be
> > > >>>> able to bypass the quota by forging a request as well, isn't it?
> > > >>>> Otherwise,
> > > >>>> anyone could just set the InitialPrincipal to bypass it. I think
> > that
> > > we
> > > >>>> should
> > > >>>> only use InitialPrincipal and/or InitialClientId when we know that
> > > they
> > > >>>> come
> > > >>>> from another broker. Based on what I read in the KIP, it looks
> like
> > we
> > > >>>> could
> > > >>>> only use them when the principal has CLUSTER_ACTION privilege. Do
> I
> > > >>>> understand it correctly?
> > > >>>>
> > > >>>> There is no 100% safe way to distinguish between raw request and
> > > >>> forward request if you take malicious request into consideration,
> > which
> > > >>> could happen
> > > >>> anyway if the attacker prepares those requests to pass the checks
> > for a
> > > >>> pre-KIP-500 cluster.
> > > >>> We could at most know whether it is sent to the control plane, or
> the
> > > >>> data plane, and whether it has extra header fields such as initial
> > > >>> principal and client id defined. For a non-malicious request
> > > >>> going into the control plane, it must be sent from a valid broker,
> > > which
> > > >>> is a prerequisite to check its
> > > >>> CLUSTER_ACTION principal. Take AlterConfig for an example, the
> > intended
> > > >>> workflow for a *KIP-590 broker* would be:
> > > >>>
> > > >>> Step 1. Check the request context to see if this is a forwarding
> > > >>> request, by checking whether it is from control plane and uses
> extra
> > > header
> > > >>> fields
> > > >>>         1.1 if it is a forwarding request, use CLUSTER_ACTION to
> > verify
> > > >>> the given resource
> > > >>>         1.2 if the resource is authorized, and if this is the
> active
> > > >>> controller, process it
> > > >>>         1.3 if the resource is authorized but this is not the
> active
> > > >>> controller, return NOT_CONTROLLER to the sender (forwarding broker)
> > for
> > > >>> retry
> > > >>>         1.4 if the resource is not authorized, return
> > > >>> CLUSTER_AUTHORIZATION_FAILURE to propagate back to the original
> > client
> > > >>> through forwarding broker
> > > >>> Step 2. If the request is not a forwarding request
> > > >>>         2.1 Verify with normal principal as ALTER on the given
> > > resources
> > > >>>         2.2 If the resource is authorized, and this is the active
> > > >>> controller, process it
> > > >>>         2.3 If the resource is authorized, but this is not active
> > > >>> controller, put the resource into the preparation for a new
> > AlterConfig
> > > >>> request for forwarding
> > > >>>         2.4 If the resource is not authorized, reply the original
> > > client
> > > >>> AUTHORIZATION_FAILURE when the forwarding request is returned
> > > >>>
> > > >>> When the control plane and data plane are using the same listener,
> we
> > > >>> couldn't distinguish whether a request is forwarded or not for
> sure,
> > > so in
> > > >>> terms of the forward request checking, we have to require the extra
> > > header
> > > >>> fields to present. A stronger checking mechanism could only be
> > enforced
> > > >>> when we upgrade to post-KIP-500 with a separate controller network.
> > > >>>
> > > >>>
> > > >>>> I have made another pass on the whole KIP, I have few nits:
> > > >>>>
> > > >>>> - The sentence "Take AlterConfig as an example to understand the
> > > changes
> > > >>>> we are making." does not make much sense anymore in the beginning
> of
> > > the
> > > >>>> "Proposed Changes" chapter.
> > > >>>>
> > > >>>> Sure, deleted.
> > > >>>
> > > >>>
> > > >>>> - When you say "Existing RPCs which are sending directly to the
> > > >>>> controller
> > > >>>> will
> > > >>>> rely on forwarding as well.". I suggest to explicitly mention how
> > "old
> > > >>>> admin clients"
> > > >>>> will work here to complement the sentence. Something like: They
> will
> > > >>>> get a
> > > >>>> random
> > > >>>> broker id as the controller id in the metadata response and stick
> to
> > > it
> > > >>>> as
> > > >>>> you explained.
> > > >>>>
> > > >>>> Sg, changed.
> > > >>>
> > > >>>
> > > >>>> - "The purpose of adding principal name is for the audit logging,
> > and
> > > >>>> the
> > > >>>> client id is
> > > >>>> being used to throttling according to KIP-599 requirement."
> > Actually,
> > > >>>> KIP-599 needs
> > > >>>> both the principal and the clientId.
> > > >>>>
> > > >>>> Makes sense.
> > > >>>
> > > >>>
> > > >>>> - In the "Routing Request Security" chapter. It is written that
> the
> > > >>>> forwarding broker
> > > >>>> will verify the request with its own authorizer and will just
> > forward
> > > >>>> it if
> > > >>>> the request
> > > >>>> looks good. When a request contains for instance multiple topics,
> I
> > > >>>> suppose
> > > >>>> that
> > > >>>> we will forward only the authorized ones and not the whole
> original
> > > >>>> request
> > > >>>> as is.
> > > >>>> We may want to reword the sentence to make this clear.
> > > >>>>
> > > >>>> That makes sense, I will put this more detailed version of routing
> > > into
> > > >>> the design.
> > > >>>
> > > >>>
> > > >>>> - For the record, should we put the previous proposal in the
> > rejected
> > > >>>> alternatives as
> > > >>>> well?
> > > >>>>
> > > >>>> We do have big changes in this KIP, our current strategy is to
> rely
> > on
> > > >>> wiki revisions if people
> > > >>> are interested to figure out our previous design strategy. Putting
> > the
> > > >>> full design in current KIP proposal
> > > >>> would distract too much for readers as we discussed during the last
> > > time
> > > >>> of big refactoring.
> > > >>>
> > > >>> Best,
> > > >>>> David
> > > >>>>
> > > >>>> On Thu, Jul 30, 2020 at 3:51 AM Boyang Chen <
> > > reluctanthero...@gmail.com
> > > >>>> >
> > > >>>> wrote:
> > > >>>>
> > > >>>> > Thanks David for the feedback!
> > > >>>> >
> > > >>>> > On Wed, Jul 29, 2020 at 7:53 AM David Jacot <
> dja...@confluent.io>
> > > >>>> wrote:
> > > >>>> >
> > > >>>> > > Hi, Colin, Boyang,
> > > >>>> > >
> > > >>>> > > Colin, thanks for the clarification. Somehow, I thought that
> > even
> > > >>>> if the
> > > >>>> > > controller is ran independently, it
> > > >>>> > > would still run the listeners of the broker and thus would be
> > > >>>> accessible
> > > >>>> > by
> > > >>>> > > redirecting on the loopback
> > > >>>> > > interface. My mistake.
> > > >>>> > >
> > > >>>> > > Boyang, I have few questions/comments regarding the updated
> KIP:
> > > >>>> > >
> > > >>>> > > 1. I think that it would be great if we could clarify how old
> > > admin
> > > >>>> > clients
> > > >>>> > > which are directly talking to the
> > > >>>> > > controller will work with this KIP. I read between the lines
> > that,
> > > >>>> as we
> > > >>>> > > propose to provide a random
> > > >>>> > > broker Id as the controller Id in the metadata response, they
> > will
> > > >>>> use a
> > > >>>> > > single node as a proxy. Is that
> > > >>>> > > correct? This deserves to be called out more explicitly in the
> > > >>>> design
> > > >>>> > > section instead of being hidden
> > > >>>> > > in the protocol bump of the metadata RPC.
> > > >>>> > >
> > > >>>> > > Makes sense, I stress this point in the compatibility section.
> > > >>>> >
> > > >>>> >
> > > >>>> > > 1.1 If I understand correctly, could we assume that old admin
> > > >>>> clients
> > > >>>> > will
> > > >>>> > > stick to the same "fake controller"
> > > >>>> > > until they refresh their metadata? Refreshing the metadata
> > usually
> > > >>>> > happens
> > > >>>> > > when NOT_CONTROLLER
> > > >>>> > > is received but this won't happen anymore so they should
> change
> > > >>>> > > infrequently.
> > > >>>> > >
> > > >>>> > > That is correct, old admin clients would not try to refresh
> > their
> > > >>>> > metadata
> > > >>>> > due to NOT_CONTROLLER,
> > > >>>> > which is impossible to happen with the new broker cluster.
> > > >>>> >
> > > >>>> >
> > > >>>> > > 2. For the new admin client, I suppose that we plan on using
> > > >>>> > > LeastLoadedNodeProvider for the
> > > >>>> > > requests that are using ControllerNodeProvider. We could
> perhaps
> > > >>>> mention
> > > >>>> > > it.
> > > >>>> > >
> > > >>>> > > Sure, added.
> > > >>>> >
> > > >>>> >
> > > >>>> > > 3. Pre KIP-500, will we have a way to distinguish if a request
> > > that
> > > >>>> is
> > > >>>> > > received by the controller is
> > > >>>> > > coming directly from a client or from a broker? You mention
> that
> > > the
> > > >>>> > > listener can be used to do
> > > >>>> > > this but as you pointed out, it is not mandatory. Do we have
> > > another
> > > >>>> > > reliable method? I am asking
> > > >>>> > > in the context of KIP-599 with the current controller, we may
> > need
> > > >>>> to
> > > >>>> > > throttle differently if the
> > > >>>> > > request comes from a client or from a broker.
> > > >>>> > >
> > > >>>> > > The point for using the listener name is more of a security
> > > >>>> purpose, to
> > > >>>> > detect any forged request to our best effort.
> > > >>>> > For throttling I think we could just check the request header
> for
> > > >>>> > *InitialClientId* existence, to distinguish whether to apply
> > > >>>> > throttling strategy as forwarded request or direct request.
> > > >>>> >
> > > >>>> >
> > > >>>> > > 4. Could we add `InitialClientId` as well? This will be
> required
> > > >>>> for the
> > > >>>> > > quota as we can apply them
> > > >>>> > > by principal and/or clientId.
> > > >>>> > >
> > > >>>> > > Sounds good, added.
> > > >>>> >
> > > >>>> >
> > > >>>> > > 5. A small remark regarding the structure of the KIP. It is a
> > bit
> > > >>>> weird
> > > >>>> > > that requests that do not go
> > > >>>> > > to the controller are mentioned in the Proposed Design section
> > and
> > > >>>> the
> > > >>>> > > requests that go to the
> > > >>>> > > controller are mentioned in the Public Interfaces. When one
> read
> > > the
> > > >>>> > > Proposed Design, it does not
> > > >>>> > > get a full picture of the whole new routing proposal for old
> and
> > > new
> > > >>>> > > clients. It would be great if we
> > > >>>> > > could have a full overview in that section.
> > > >>>> > >
> > > >>>> > > Good point, I will move the pieces around.
> > > >>>> >
> > > >>>> >
> > > >>>> > > Overall the change makes sense to me. I will work on drafting
> an
> > > >>>> addendum
> > > >>>> > > to KIP-599 to
> > > >>>> > > alter the design to cope with these changes. At a first
> glance,
> > > that
> > > >>>> > seems
> > > >>>> > > doable if 1.1, 3
> > > >>>> > > and 4 are OK.
> > > >>>> > >
> > > >>>> > > Thank you for the help!
> > > >>>> >
> > > >>>> >
> > > >>>> > > Thanks,
> > > >>>> > > David
> > > >>>> > >
> > > >>>> > > On Wed, Jul 29, 2020 at 5:29 AM Boyang Chen <
> > > >>>> reluctanthero...@gmail.com>
> > > >>>> > > wrote:
> > > >>>> > >
> > > >>>> > > > Thanks for the feedback Colin!
> > > >>>> > > >
> > > >>>> > > > On Tue, Jul 28, 2020 at 2:11 PM Colin McCabe <
> > > cmcc...@apache.org>
> > > >>>> > wrote:
> > > >>>> > > >
> > > >>>> > > > > Hi Boyang,
> > > >>>> > > > >
> > > >>>> > > > > Thanks for updating this.  A few comments below:
> > > >>>> > > > >
> > > >>>> > > > > In the "Routing Request Security" section, there is a
> > > reference
> > > >>>> to
> > > >>>> > > "older
> > > >>>> > > > > requests that need redirection."  But after these new
> > > >>>> revisions, both
> > > >>>> > > new
> > > >>>> > > > > and old requests need redirection.  So we should rephrase
> > > this.
> > > >>>> > > > >
> > > >>>> > > > > > In addition, to avoid exposing this forwarding power to
> > the
> > > >>>> admin
> > > >>>> > > > > clients,
> > > >>>> > > > > > the routing request shall be forwarded towards the
> > > controller
> > > >>>> > broker
> > > >>>> > > > > internal
> > > >>>> > > > > > endpoint which should be only visible to other brokers
> > > inside
> > > >>>> the
> > > >>>> > > > > cluster
> > > >>>> > > > > > in the KIP-500 controller. Any admin configuration
> request
> > > >>>> with
> > > >>>> > > broker
> > > >>>> > > > > > principal should not be going through the public
> endpoint
> > > and
> > > >>>> will
> > > >>>> > be
> > > >>>> > > > > > rejected for security purpose.
> > > >>>> > > > >
> > > >>>> > > > > We should also describe how this will work in the
> > pre-KIP-500
> > > >>>> case.
> > > >>>> > In
> > > >>>> > > > > that case, CLUSTER_ACTION gets the extra permissions
> > described
> > > >>>> here
> > > >>>> > > only
> > > >>>> > > > > when the message comes in on the inter-broker listener.
> We
> > > >>>> should
> > > >>>> > > state
> > > >>>> > > > > that here.
> > > >>>> > > > >
> > > >>>> > > > > (I can see that you have this information later on in the
> > > >>>> "Security
> > > >>>> > > > Access
> > > >>>> > > > > Changes" section, but it would be good to have it here as
> > > well,
> > > >>>> to
> > > >>>> > > avoid
> > > >>>> > > > > confusion.)
> > > >>>> > > > >
> > > >>>> > > > > > To be more strict of protecting controller information,
> > the
> > > >>>> > > > > "ControllerId"
> > > >>>> > > > > > field in new MetadataResponse shall be set to -1 when
> the
> > > >>>> original
> > > >>>> > > > > request
> > > >>>> > > > > > comes from a non-broker client and it is already on v10.
> > We
> > > >>>> shall
> > > >>>> > use
> > > >>>> > > > the
> > > >>>> > > > > > request listener name to distinguish whether a given
> > request
> > > >>>> is
> > > >>>> > > > > inter-broker,
> > > >>>> > > > > > or from the client.
> > > >>>> > > > >
> > > >>>> > > > > I'm not sure why we would need to distinguish between
> broker
> > > >>>> clients
> > > >>>> > > and
> > > >>>> > > > > non-broker clients.  Brokers don't generally send
> > > >>>> MetadataRequests to
> > > >>>> > > > other
> > > >>>> > > > > brokers, do they?  Brokers learn about metadata from
> > > >>>> > > > UpdateMetadataRequest
> > > >>>> > > > > and LeaderAndIsrRequest, not by sending MetadataRequests
> to
> > > >>>> other
> > > >>>> > > > brokers.
> > > >>>> > > > >
> > > >>>> > > > > We do have one use case where the MetadataRequest gets
> sent
> > > >>>> between
> > > >>>> > the
> > > >>>> > > > brokers, which is the InterBrokerSendThread. Currently we
> > don't
> > > >>>> rely on
> > > >>>> > > it
> > > >>>> > > > to get the controller id, so I guess your suggestion should
> be
> > > >>>> good to
> > > >>>> > > > enforce. We could use some meta comment on the NetworkClient
> > > that
> > > >>>> it
> > > >>>> > > should
> > > >>>> > > > not be used to get the controller location.
> > > >>>> > > >
> > > >>>> > > > Probably what we want here is: v0-v9: return a random broker
> > in
> > > >>>> the
> > > >>>> > > cluster
> > > >>>> > > > > as the controller ID.  v10: no controllerID present in the
> > > >>>> > > > > MetadataResponse.  We should also deprecate the
> adminClient
> > > >>>> method
> > > >>>> > > which
> > > >>>> > > > > gets the controllerId.
> > > >>>> > > > >
> > > >>>> > > > > > BROKER_AUTHORIZATION_FAILURE(92, "Authorization failed
> for
> > > the
> > > >>>> > > > > > request during forwarding, this indicates an internal
> > error
> > > >>>> on the
> > > >>>> > > > broker
> > > >>>> > > > > > cluster security setup.",
> > > >>>> > BrokerAuthorizationFailureException::new);
> > > >>>> > > > >
> > > >>>> > > > > Grammar nitpick: It would be good to have a period between
> > > >>>> > "forwarding"
> > > >>>> > > > > and "this" to avoid a run-on sentence :)
> > > >>>> > > > >
> > > >>>> > > > > best,
> > > >>>> > > > > Colin
> > > >>>> > > > >
> > > >>>> > > > >
> > > >>>> > > > > On Mon, Jul 27, 2020, at 21:47, Boyang Chen wrote:
> > > >>>> > > > > > Hey there,
> > > >>>> > > > > >
> > > >>>> > > > > > I'm re-opening this thread because after some initial
> > > >>>> > implementations
> > > >>>> > > > > > started, we spotted some gaps in the approved KIP as
> well
> > as
> > > >>>> some
> > > >>>> > > > > > inconsistencies with KIP-631 controller. There are a
> > couple
> > > of
> > > >>>> > > > addendums
> > > >>>> > > > > to
> > > >>>> > > > > > the existing KIP, specifically:
> > > >>>> > > > > >
> > > >>>> > > > > > 1. As the controller is foreseen to be only accessible
> to
> > > the
> > > >>>> > > brokers,
> > > >>>> > > > > the
> > > >>>> > > > > > new admin client would not have direct access to the
> > > >>>> controller. It
> > > >>>> > > is
> > > >>>> > > > > > guaranteed on the MetadataResponse level which no longer
> > > >>>> provides
> > > >>>> > > > > > `ControllerId` to client side requests.
> > > >>>> > > > > >
> > > >>>> > > > > > 2. The broker would forward any direct ZK path mutation
> > > >>>> requests,
> > > >>>> > > > > including
> > > >>>> > > > > > topic creation/deletion, reassignment, etc since we
> > > deprecate
> > > >>>> the
> > > >>>> > > > direct
> > > >>>> > > > > > controller access on the client side. No more protocol
> > > >>>> version bump
> > > >>>> > > is
> > > >>>> > > > > > necessary for the configuration requests.
> > > >>>> > > > > >
> > > >>>> > > > > > 3. To make sure forwarding requests pass the
> > authorization,
> > > >>>> broker
> > > >>>> > > > > > principal CLUSTER_ACTION would be allowed to be used as
> an
> > > >>>> > > alternative
> > > >>>> > > > > > authentication method for a variety of principal
> > operations,
> > > >>>> > > including
> > > >>>> > > > > > ALTER, ALTER_CONFIG, DELETE, etc. It is because the
> > > forwarding
> > > >>>> > > request
> > > >>>> > > > > > needs to use the proxy broker's own principal, which is
> > > >>>> currently
> > > >>>> > not
> > > >>>> > > > > > supported to be used for many configuration change
> > > >>>> authentication
> > > >>>> > > > listed
> > > >>>> > > > > > above. The full list could be found in the KIP.
> > > >>>> > > > > >
> > > >>>> > > > > > 4. Add a new BROKER_AUTHORIZATION_FAILURE error code to
> > > >>>> indicate
> > > >>>> > any
> > > >>>> > > > > > internal security configuration failure, when the
> > forwarded
> > > >>>> request
> > > >>>> > > > > failed
> > > >>>> > > > > > authentication on the controller side.
> > > >>>> > > > > >
> > > >>>> > > > > > Let me know what you think. With such a major refinement
> > of
> > > >>>> the
> > > >>>> > KIP,
> > > >>>> > > > I'm
> > > >>>> > > > > > open for re-vote after discussions converge.
> > > >>>> > > > > >
> > > >>>> > > > > > Boyang
> > > >>>> > > > > >
> > > >>>> > > > > > On Wed, Jul 1, 2020 at 2:17 PM Boyang Chen <
> > > >>>> > > reluctanthero...@gmail.com
> > > >>>> > > > >
> > > >>>> > > > > > wrote:
> > > >>>> > > > > >
> > > >>>> > > > > > > Hey folks,
> > > >>>> > > > > > >
> > > >>>> > > > > > > I have also synced on the KIP-578 which was doing the
> > > >>>> partition
> > > >>>> > > > limit,
> > > >>>> > > > > to
> > > >>>> > > > > > > make sure the partition limit error code would be
> > properly
> > > >>>> > > propagated
> > > >>>> > > > > once
> > > >>>> > > > > > > it is done on top of KIP-590. Let me know if you have
> > > >>>> further
> > > >>>> > > > > questions or
> > > >>>> > > > > > > concerns.
> > > >>>> > > > > > >
> > > >>>> > > > > > > Boyang
> > > >>>> > > > > > >
> > > >>>> > > > > > > On Tue, Jun 23, 2020 at 5:08 PM Boyang Chen <
> > > >>>> > > > > reluctanthero...@gmail.com>
> > > >>>> > > > > > > wrote:
> > > >>>> > > > > > >
> > > >>>> > > > > > >> Thanks for the clarification, Colin and Ismael.
> > > Personally
> > > >>>> I
> > > >>>> > also
> > > >>>> > > > feel
> > > >>>> > > > > > >> Option A is better to prioritize fixing the gap. Just
> > to
> > > be
> > > >>>> > clear,
> > > >>>> > > > the
> > > >>>> > > > > > >> proposed solution would be:
> > > >>>> > > > > > >>
> > > >>>> > > > > > >> 1. Bump the Metadata RPC version to return
> > > >>>> POLICY_VIOLATION. In
> > > >>>> > > the
> > > >>>> > > > > > >> application level, we should swap the error message
> > with
> > > >>>> the
> > > >>>> > > actual
> > > >>>> > > > > failure
> > > >>>> > > > > > >> reason such as "violation of topic creation policy
> when
> > > >>>> > attempting
> > > >>>> > > > to
> > > >>>> > > > > auto
> > > >>>> > > > > > >> create internal topic through MetadataRequest."
> > > >>>> > > > > > >>
> > > >>>> > > > > > >> 2. For older Metadata RPC, return
> AUTHORIZATION_FAILED
> > to
> > > >>>> fail
> > > >>>> > > fast.
> > > >>>> > > > > > >>
> > > >>>> > > > > > >> Will address our other discussed points as well in
> the
> > > >>>> KIP, let
> > > >>>> > me
> > > >>>> > > > > know
> > > >>>> > > > > > >> if you have further questions.
> > > >>>> > > > > > >>
> > > >>>> > > > > > >> Thanks,
> > > >>>> > > > > > >> Boyang
> > > >>>> > > > > > >>
> > > >>>> > > > > > >> On Tue, Jun 23, 2020 at 10:41 AM Ismael Juma <
> > > >>>> ism...@juma.me.uk
> > > >>>> > >
> > > >>>> > > > > wrote:
> > > >>>> > > > > > >>
> > > >>>> > > > > > >>> Option A is basically what I was thinking. But with
> a
> > > >>>> slight
> > > >>>> > > > > adjustment:
> > > >>>> > > > > > >>>
> > > >>>> > > > > > >>> New versions of MetadataResponse return
> > > POLICY_VIOLATION,
> > > >>>> old
> > > >>>> > > > > versions
> > > >>>> > > > > > >>> return AUTHORIZATION_FAILED. The latter works
> > correctly
> > > >>>> with
> > > >>>> > old
> > > >>>> > > > Java
> > > >>>> > > > > > >>> clients (i.e. the client fails fast and propagates
> the
> > > >>>> error),
> > > >>>> > > I've
> > > >>>> > > > > > >>> tested
> > > >>>> > > > > > >>> it. Adjust new clients to treat POLICY_VIOLATION
> like
> > > >>>> > > > > > >>> AUTHORIZATION_FAILED,
> > > >>>> > > > > > >>> but propagate the custom error message.
> > > >>>> > > > > > >>>
> > > >>>> > > > > > >>> Ismael
> > > >>>> > > > > > >>>
> > > >>>> > > > > > >>> On Mon, Jun 22, 2020 at 11:00 PM Colin McCabe <
> > > >>>> > > cmcc...@apache.org>
> > > >>>> > > > > > >>> wrote:
> > > >>>> > > > > > >>>
> > > >>>> > > > > > >>> > > > > On Fri, Jun 19, 2020 at 3:18 PM Ismael Juma
> <
> > > >>>> > > > > ism...@juma.me.uk>
> > > >>>> > > > > > >>> > wrote:
> > > >>>> > > > > > >>> > > > >
> > > >>>> > > > > > >>> > > > > > Hi Colin,
> > > >>>> > > > > > >>> > > > > >
> > > >>>> > > > > > >>> > > > > > The KIP states in the Compatibility
> section
> > > (not
> > > >>>> > Future
> > > >>>> > > > > work):
> > > >>>> > > > > > >>> > > > > >
> > > >>>> > > > > > >>> > > > > > "To support the proxy of requests, we need
> > to
> > > >>>> build a
> > > >>>> > > > > channel
> > > >>>> > > > > > >>> for
> > > >>>> > > > > > >>> > > > > > brokers to talk directly to the
> controller.
> > > >>>> This part
> > > >>>> > > of
> > > >>>> > > > > the
> > > >>>> > > > > > >>> design
> > > >>>> > > > > > >>> > > > > > is internal change only and won’t block
> the
> > > KIP
> > > >>>> > > > progress."
> > > >>>> > > > > > >>> > > > > >
> > > >>>> > > > > > >>> > > > > > I am clarifying that this is not internal
> > only
> > > >>>> due to
> > > >>>> > > the
> > > >>>> > > > > > >>> config.
> > > >>>> > > > > > >>> > If we
> > > >>>> > > > > > >>> > > > > > say that this KIP depends on another KIP
> > > before
> > > >>>> we
> > > >>>> > can
> > > >>>> > > > > merge
> > > >>>> > > > > > >>> > > > > > it, that's fine although it feels a bit
> > > >>>> unnecessary.
> > > >>>> > > > > > >>> > > > > >
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > Hi Ismael,
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > I didn't realize there was still a reference to
> the
> > > >>>> separate
> > > >>>> > > > > controller
> > > >>>> > > > > > >>> > channel in the "Compatibility, Deprecation, and
> > > >>>> Migration
> > > >>>> > Plan"
> > > >>>> > > > > > >>> section.  I
> > > >>>> > > > > > >>> > agree that it doesn't really belong there.  Given
> > that
> > > >>>> this
> > > >>>> > is
> > > >>>> > > > > creating
> > > >>>> > > > > > >>> > confusion, I would suggest that we just drop this
> > from
> > > >>>> the
> > > >>>> > KIP
> > > >>>> > > > > > >>> entirely.
> > > >>>> > > > > > >>> > It really is orthogonal to what this KIP is
> about--
> > we
> > > >>>> don't
> > > >>>> > > > need a
> > > >>>> > > > > > >>> > separate channel to implement redirection.
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > Boyang wrote:
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > >
> > > >>>> > > > > > >>> > > We are only opening the doors for specific
> > internal
> > > >>>> topics
> > > >>>> > > > > (offsets,
> > > >>>> > > > > > >>> txn
> > > >>>> > > > > > >>> > > log), which I assume the client should have no
> > > >>>> possibility
> > > >>>> > to
> > > >>>> > > > > mutate
> > > >>>> > > > > > >>> the
> > > >>>> > > > > > >>> > > topic policy?
> > > >>>> > > > > > >>> > >
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > Hi Boyang,
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > I think you and Ismael are talking about different
> > > >>>> scenarios.
> > > >>>> > > > You
> > > >>>> > > > > are
> > > >>>> > > > > > >>> > describing the scenario where the broker is
> > > >>>> auto-creating the
> > > >>>> > > > > > >>> transaction
> > > >>>> > > > > > >>> > log topic or consumer offset topic.  This scenario
> > > >>>> indeed
> > > >>>> > > should
> > > >>>> > > > > not
> > > >>>> > > > > > >>> happen
> > > >>>> > > > > > >>> > in a properly-configured cluster.  However, Ismael
> > is
> > > >>>> > > describing
> > > >>>> > > > a
> > > >>>> > > > > > >>> scenario
> > > >>>> > > > > > >>> > where the client is auto-creating some arbitrary
> > > >>>> non-internal
> > > >>>> > > > topic
> > > >>>> > > > > > >>> just by
> > > >>>> > > > > > >>> > sending a metadata request.
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > As far as I can see, there are two solutions here:
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > A. Close the hole in CreateTopicsPolicy
> immediately.
> > > >>>> In new
> > > >>>> > > > > versions,
> > > >>>> > > > > > >>> > allow MetadataResponse to return
> > AUTHORIZATION_FAILED
> > > >>>> if we
> > > >>>> > > tried
> > > >>>> > > > > to
> > > >>>> > > > > > >>> > auto-create a topic and failed.  Find some other
> > error
> > > >>>> code
> > > >>>> > to
> > > >>>> > > > > return
> > > >>>> > > > > > >>> for
> > > >>>> > > > > > >>> > existing versions.
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > B. Keep the hole in CreateTopicsPolicy and add
> some
> > > >>>> > > configuration
> > > >>>> > > > > to
> > > >>>> > > > > > >>> allow
> > > >>>> > > > > > >>> > admins to gradually migrate to closing it.  In
> > > >>>> practice, this
> > > >>>> > > > > probably
> > > >>>> > > > > > >>> > means a configuration toggle that enables direct
> ZK
> > > >>>> access,
> > > >>>> > > that
> > > >>>> > > > > > >>> starts off
> > > >>>> > > > > > >>> > as enabled.  Then we can eventually default it to
> > > false
> > > >>>> and
> > > >>>> > > then
> > > >>>> > > > > > >>> remove it
> > > >>>> > > > > > >>> > entirely over time.
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>> > best,
> > > >>>> > > > > > >>> > Colin
> > > >>>> > > > > > >>> >
> > > >>>> > > > > > >>>
> > > >>>> > > > > > >>
> > > >>>> > > > > >
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > > >>>
> > >
> >
>

Reply via email to