Hey Jason and Jun,

thanks for the reply. Actually after some offline discussion, we have seen
hassles around upgrading and downgrading RPCs during redirection, which is
an error-prone approach to coordinate all parties to choose the correct
version to handle. Alternatively, we propose to bring back the
EnvelopeRequest to solve this problem, by embedding the admin request data
inside the request with consistent version. The complete workflow looks
like:

1. broker authorizes all accesses and strips out rejected stuff
2. broker forwards envelope of authorized actions in envelope
3. controller checks cluster_action for envelope request
4. if check passes, then all actions in the request are assumed to be
authorized

Also we need to point out that we are not talking about going backwards.
This workflow restricts the Envelope RPC with cluster_action permission to
reduce the risk of impersonation at best effort. Additionally, we are not
proposing any incompatible changes such as principal serialization. We
shall still use the split and join semantic we built as of current to only
forward authenticated resources.

Let me know if this makes sense.

Boyang

On Thu, Sep 24, 2020 at 4:53 PM Jun Rao <j...@confluent.io> wrote:

> Hi, Jason,
>
> Yes, the most important thing is to be able to avoid two rolling restarts
> in the future. If we have a path to achieve that down the road, the changes
> here are fine.
>
> Thanks,
>
> Jun
>
> On Thu, Sep 24, 2020 at 3:20 PM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > > One of the goals of KIP-584 (feature versioning) is that we can get rid
> > of
> > IBP in the future. So does this change prevent us from removing IBP in
> the
> > future?
> >
> > That is a good question. I think the problem here is that request
> > forwarding puts an expectation on api version support which covers more
> > than one broker. This is why the normal ApiVersions behavior doesn't
> work.
> > I thought about this a bit and haven't come up with a good alternative.
> One
> > thought I've been considering is letting the controller in the
> post-kip-500
> > world set the maximum range of api support for the cluster. However, even
> > then we would need some way to tell when the controller quorum itself is
> > ready to enable support for a new api version. My feeling is that we will
> > probably always need something like the IBP to control when it is safe to
> > expose versions of APIs which have a cross-broker dependence. However,
> > KIP-584 would still allow us to manage the IBP at the level of a feature
> so
> > that we don't need two rolling restarts anymore.
> >
> > Best,
> > Jason
> >
> >
> >
> >
> > On Thu, Sep 24, 2020 at 1:59 PM Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, Boyang,
> > >
> > > One of the goals of KIP-584 (feature versioning) is that we can get rid
> > of
> > > IBP in the future. So does this change prevent us from removing IBP in
> > the
> > > future?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Sep 24, 2020 at 12:46 PM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hey Boyang,
> > > >
> > > > Thanks for the update. This seems like the best thing we can do. The
> > > > alternative would be to always ensure that the forwarded APIs are
> safe
> > > for
> > > > conversion between versions, but that would restrict the flexibility
> > that
> > > > the versioning is providing. It would also be a large effort to avoid
> > > > introducing regressions through conversion. Sadly this broadens the
> > scope
> > > > of the IBP, but in fact forwarded APIs are inter-broker APIs.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Thu, Sep 24, 2020 at 9:23 AM Boyang Chen <
> > reluctanthero...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey there,
> > > > >
> > > > > we spotted a necessary case to handle the redirect request
> > versioning,
> > > > and
> > > > > proposed the following changes:
> > > > >
> > > > > 1. For redirection RPCs (AlterConfig, Acl, Token etc), the
> > > corresponding
> > > > > allowed versions in the ApiVersionResponse will be affected by the
> > > entire
> > > > > cluster's versioning, not just the receiving broker, since we need
> to
> > > > > ensure the chosen version get properly handled by all parties. Thus
> > > from
> > > > > now on, RPC with redirection will be treated as inter-broker RPC,
> and
> > > any
> > > > > version bump for these RPCs has to go through IBP bump as well.
> > > > > ApiVersionResponse will take IBP into considerations for the
> > > redirection
> > > > > RPCs allowable versions.
> > > > >
> > > > > 2. We would do the best effort to maintain the same request version
> > for
> > > > > the entire admin client -> receiving broker -> controller broker
> > path,
> > > > but
> > > > > for old RPC versions, they may not have flexible fields introduced
> > yet.
> > > > > Thus, we would have to upgrade the RPC to the minimum version which
> > > > > supports flexible fields
> > > > > and add another tagged field in the header called
> > > > `OriginalRequestVersion`
> > > > > to help the controller broker correctly deserialize the request
> with
> > > the
> > > > > original admin client sent out version. We would not downgrade the
> > > > original
> > > > > request in any circumstance, since the flexible field support is
> > > required
> > > > > to be open-ended on the high side.
> > > > >
> > > > > Let me know if you have any questions.
> > > > >
> > > > > Best,
> > > > > Boyang
> > > > >
> > > > > On Thu, Aug 6, 2020 at 6:11 PM Boyang Chen <
> > reluctanthero...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hey there,
> > > > > >
> > > > > > we are going to introduce a minor change to bump the version of
> > > several
> > > > > > RPCs which are currently not supporting flexible versions. It is
> > > > > necessary
> > > > > > because they need to be able to construct request header with
> > initial
> > > > > > principal name and client id as optional fields for redirection.
> > The
> > > > are
> > > > > > only two of them:
> > > > > >
> > > > > > 1. AlterConfig
> > > > > > 2. AlterClientQuotas
> > > > > >
> > > > > > Let me know if you have any questions.
> > > > > >
> > > > > > Boyang
> > > > > >
> > > > > > On Fri, Jul 31, 2020 at 11:42 AM Boyang Chen <
> > > > reluctanthero...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > >> Hey David,
> > > > > >>
> > > > > >> After discussing with Colin offline, I would like to correct one
> > > case
> > > > in
> > > > > >> the described workflow, where the CLUSTER_ACTION authorization
> > would
> > > > > not be
> > > > > >> based on the initial principal field check, because it is not a
> > > > secured
> > > > > >> condition which anyone could forge. The revised workflow shall
> be:
> > > > > >>
> > > > > >> Step 1. Filter out resources that are authorized
> > > > > >>          1.1 Use traditional principals to verify first. If
> > > > authorized,
> > > > > >> continue
> > > > > >>          1.2 If not authorized, check whether the request is
> from
> > > the
> > > > > >> control plane. Note that this is a best-effort to verify whether
> > the
> > > > > >> request is internal.
> > > > > >>          1.3 If the request is not from the control plane,
> return
> > > > > >> authorization failure
> > > > > >>          1.4 If the request is from the control plane, use
> > > > > CLUSTER_ACTION
> > > > > >> to verify and determine the result
> > > > > >>
> > > > > >> Step 2. Check the request context to see if this is a forwarding
> > > > > request,
> > > > > >> by checking whether it is from control plane and uses extra
> header
> > > > > fields
> > > > > >>         2.1 if the resource is authorized, and if this is the
> > active
> > > > > >> controller, process it
> > > > > >>         2.2 if the resource is authorized but this is not the
> > active
> > > > > >> controller, return NOT_CONTROLLER to the sender (forwarding
> > broker)
> > > > for
> > > > > >> retry
> > > > > >>         2.3 if the resource is not authorized, return
> > > > > >> CLUSTER_AUTHORIZATION_FAILURE to propagate back to the original
> > > client
> > > > > >> through forwarding broker
> > > > > >> Step 3. If the request is not a forwarding request
> > > > > >>         3.1 If the resource is authorized, and this is the
> active
> > > > > >> controller, process it
> > > > > >>         3.2 If the resource is authorized, but this is not
> active
> > > > > >> controller, put the resource into the preparation for a new
> > > > AlterConfig
> > > > > >> request for forwarding
> > > > > >>         3.3 If the resource is not authorized, reply the
> original
> > > > client
> > > > > >> AUTHORIZATION_FAILURE when the forwarding request is returned
> > > > > >>
> > > > > >> On Thu, Jul 30, 2020 at 3:47 PM Boyang Chen <
> > > > reluctanthero...@gmail.com
> > > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Thu, Jul 30, 2020 at 7:18 AM David Jacot <
> dja...@confluent.io
> > >
> > > > > wrote:
> > > > > >>>
> > > > > >>>> Hi Boyang,
> > > > > >>>>
> > > > > >>>> Thanks for your answers.
> > > > > >>>>
> > > > > >>>> > The point for using the listener name is more of a security
> > > > purpose,
> > > > > >>>> to
> > > > > >>>> > detect any forged request to our best effort.
> > > > > >>>> > For throttling I think we could just check the request
> header
> > > for
> > > > > >>>> > *InitialClientId* existence, to distinguish whether to apply
> > > > > >>>> > throttling strategy as forwarded request or direct request.
> > > > > >>>>
> > > > > >>>> Reading "security" and "best effort" in the same sentence
> makes
> > > me a
> > > > > >>>> little nervous :).
> > > > > >>>>
> > > > > >>>> The identification issue is also valid for quota as we don't
> > want
> > > > one
> > > > > >>>> to be
> > > > > >>>> able to bypass the quota by forging a request as well, isn't
> it?
> > > > > >>>> Otherwise,
> > > > > >>>> anyone could just set the InitialPrincipal to bypass it. I
> think
> > > > that
> > > > > we
> > > > > >>>> should
> > > > > >>>> only use InitialPrincipal and/or InitialClientId when we know
> > that
> > > > > they
> > > > > >>>> come
> > > > > >>>> from another broker. Based on what I read in the KIP, it looks
> > > like
> > > > we
> > > > > >>>> could
> > > > > >>>> only use them when the principal has CLUSTER_ACTION privilege.
> > Do
> > > I
> > > > > >>>> understand it correctly?
> > > > > >>>>
> > > > > >>>> There is no 100% safe way to distinguish between raw request
> and
> > > > > >>> forward request if you take malicious request into
> consideration,
> > > > which
> > > > > >>> could happen
> > > > > >>> anyway if the attacker prepares those requests to pass the
> checks
> > > > for a
> > > > > >>> pre-KIP-500 cluster.
> > > > > >>> We could at most know whether it is sent to the control plane,
> or
> > > the
> > > > > >>> data plane, and whether it has extra header fields such as
> > initial
> > > > > >>> principal and client id defined. For a non-malicious request
> > > > > >>> going into the control plane, it must be sent from a valid
> > broker,
> > > > > which
> > > > > >>> is a prerequisite to check its
> > > > > >>> CLUSTER_ACTION principal. Take AlterConfig for an example, the
> > > > intended
> > > > > >>> workflow for a *KIP-590 broker* would be:
> > > > > >>>
> > > > > >>> Step 1. Check the request context to see if this is a
> forwarding
> > > > > >>> request, by checking whether it is from control plane and uses
> > > extra
> > > > > header
> > > > > >>> fields
> > > > > >>>         1.1 if it is a forwarding request, use CLUSTER_ACTION
> to
> > > > verify
> > > > > >>> the given resource
> > > > > >>>         1.2 if the resource is authorized, and if this is the
> > > active
> > > > > >>> controller, process it
> > > > > >>>         1.3 if the resource is authorized but this is not the
> > > active
> > > > > >>> controller, return NOT_CONTROLLER to the sender (forwarding
> > broker)
> > > > for
> > > > > >>> retry
> > > > > >>>         1.4 if the resource is not authorized, return
> > > > > >>> CLUSTER_AUTHORIZATION_FAILURE to propagate back to the original
> > > > client
> > > > > >>> through forwarding broker
> > > > > >>> Step 2. If the request is not a forwarding request
> > > > > >>>         2.1 Verify with normal principal as ALTER on the given
> > > > > resources
> > > > > >>>         2.2 If the resource is authorized, and this is the
> active
> > > > > >>> controller, process it
> > > > > >>>         2.3 If the resource is authorized, but this is not
> active
> > > > > >>> controller, put the resource into the preparation for a new
> > > > AlterConfig
> > > > > >>> request for forwarding
> > > > > >>>         2.4 If the resource is not authorized, reply the
> original
> > > > > client
> > > > > >>> AUTHORIZATION_FAILURE when the forwarding request is returned
> > > > > >>>
> > > > > >>> When the control plane and data plane are using the same
> > listener,
> > > we
> > > > > >>> couldn't distinguish whether a request is forwarded or not for
> > > sure,
> > > > > so in
> > > > > >>> terms of the forward request checking, we have to require the
> > extra
> > > > > header
> > > > > >>> fields to present. A stronger checking mechanism could only be
> > > > enforced
> > > > > >>> when we upgrade to post-KIP-500 with a separate controller
> > network.
> > > > > >>>
> > > > > >>>
> > > > > >>>> I have made another pass on the whole KIP, I have few nits:
> > > > > >>>>
> > > > > >>>> - The sentence "Take AlterConfig as an example to understand
> the
> > > > > changes
> > > > > >>>> we are making." does not make much sense anymore in the
> > beginning
> > > of
> > > > > the
> > > > > >>>> "Proposed Changes" chapter.
> > > > > >>>>
> > > > > >>>> Sure, deleted.
> > > > > >>>
> > > > > >>>
> > > > > >>>> - When you say "Existing RPCs which are sending directly to
> the
> > > > > >>>> controller
> > > > > >>>> will
> > > > > >>>> rely on forwarding as well.". I suggest to explicitly mention
> > how
> > > > "old
> > > > > >>>> admin clients"
> > > > > >>>> will work here to complement the sentence. Something like:
> They
> > > will
> > > > > >>>> get a
> > > > > >>>> random
> > > > > >>>> broker id as the controller id in the metadata response and
> > stick
> > > to
> > > > > it
> > > > > >>>> as
> > > > > >>>> you explained.
> > > > > >>>>
> > > > > >>>> Sg, changed.
> > > > > >>>
> > > > > >>>
> > > > > >>>> - "The purpose of adding principal name is for the audit
> > logging,
> > > > and
> > > > > >>>> the
> > > > > >>>> client id is
> > > > > >>>> being used to throttling according to KIP-599 requirement."
> > > > Actually,
> > > > > >>>> KIP-599 needs
> > > > > >>>> both the principal and the clientId.
> > > > > >>>>
> > > > > >>>> Makes sense.
> > > > > >>>
> > > > > >>>
> > > > > >>>> - In the "Routing Request Security" chapter. It is written
> that
> > > the
> > > > > >>>> forwarding broker
> > > > > >>>> will verify the request with its own authorizer and will just
> > > > forward
> > > > > >>>> it if
> > > > > >>>> the request
> > > > > >>>> looks good. When a request contains for instance multiple
> > topics,
> > > I
> > > > > >>>> suppose
> > > > > >>>> that
> > > > > >>>> we will forward only the authorized ones and not the whole
> > > original
> > > > > >>>> request
> > > > > >>>> as is.
> > > > > >>>> We may want to reword the sentence to make this clear.
> > > > > >>>>
> > > > > >>>> That makes sense, I will put this more detailed version of
> > routing
> > > > > into
> > > > > >>> the design.
> > > > > >>>
> > > > > >>>
> > > > > >>>> - For the record, should we put the previous proposal in the
> > > > rejected
> > > > > >>>> alternatives as
> > > > > >>>> well?
> > > > > >>>>
> > > > > >>>> We do have big changes in this KIP, our current strategy is to
> > > rely
> > > > on
> > > > > >>> wiki revisions if people
> > > > > >>> are interested to figure out our previous design strategy.
> > Putting
> > > > the
> > > > > >>> full design in current KIP proposal
> > > > > >>> would distract too much for readers as we discussed during the
> > last
> > > > > time
> > > > > >>> of big refactoring.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>>> David
> > > > > >>>>
> > > > > >>>> On Thu, Jul 30, 2020 at 3:51 AM Boyang Chen <
> > > > > reluctanthero...@gmail.com
> > > > > >>>> >
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>> > Thanks David for the feedback!
> > > > > >>>> >
> > > > > >>>> > On Wed, Jul 29, 2020 at 7:53 AM David Jacot <
> > > dja...@confluent.io>
> > > > > >>>> wrote:
> > > > > >>>> >
> > > > > >>>> > > Hi, Colin, Boyang,
> > > > > >>>> > >
> > > > > >>>> > > Colin, thanks for the clarification. Somehow, I thought
> that
> > > > even
> > > > > >>>> if the
> > > > > >>>> > > controller is ran independently, it
> > > > > >>>> > > would still run the listeners of the broker and thus would
> > be
> > > > > >>>> accessible
> > > > > >>>> > by
> > > > > >>>> > > redirecting on the loopback
> > > > > >>>> > > interface. My mistake.
> > > > > >>>> > >
> > > > > >>>> > > Boyang, I have few questions/comments regarding the
> updated
> > > KIP:
> > > > > >>>> > >
> > > > > >>>> > > 1. I think that it would be great if we could clarify how
> > old
> > > > > admin
> > > > > >>>> > clients
> > > > > >>>> > > which are directly talking to the
> > > > > >>>> > > controller will work with this KIP. I read between the
> lines
> > > > that,
> > > > > >>>> as we
> > > > > >>>> > > propose to provide a random
> > > > > >>>> > > broker Id as the controller Id in the metadata response,
> > they
> > > > will
> > > > > >>>> use a
> > > > > >>>> > > single node as a proxy. Is that
> > > > > >>>> > > correct? This deserves to be called out more explicitly in
> > the
> > > > > >>>> design
> > > > > >>>> > > section instead of being hidden
> > > > > >>>> > > in the protocol bump of the metadata RPC.
> > > > > >>>> > >
> > > > > >>>> > > Makes sense, I stress this point in the compatibility
> > section.
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > > 1.1 If I understand correctly, could we assume that old
> > admin
> > > > > >>>> clients
> > > > > >>>> > will
> > > > > >>>> > > stick to the same "fake controller"
> > > > > >>>> > > until they refresh their metadata? Refreshing the metadata
> > > > usually
> > > > > >>>> > happens
> > > > > >>>> > > when NOT_CONTROLLER
> > > > > >>>> > > is received but this won't happen anymore so they should
> > > change
> > > > > >>>> > > infrequently.
> > > > > >>>> > >
> > > > > >>>> > > That is correct, old admin clients would not try to
> refresh
> > > > their
> > > > > >>>> > metadata
> > > > > >>>> > due to NOT_CONTROLLER,
> > > > > >>>> > which is impossible to happen with the new broker cluster.
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > > 2. For the new admin client, I suppose that we plan on
> using
> > > > > >>>> > > LeastLoadedNodeProvider for the
> > > > > >>>> > > requests that are using ControllerNodeProvider. We could
> > > perhaps
> > > > > >>>> mention
> > > > > >>>> > > it.
> > > > > >>>> > >
> > > > > >>>> > > Sure, added.
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > > 3. Pre KIP-500, will we have a way to distinguish if a
> > request
> > > > > that
> > > > > >>>> is
> > > > > >>>> > > received by the controller is
> > > > > >>>> > > coming directly from a client or from a broker? You
> mention
> > > that
> > > > > the
> > > > > >>>> > > listener can be used to do
> > > > > >>>> > > this but as you pointed out, it is not mandatory. Do we
> have
> > > > > another
> > > > > >>>> > > reliable method? I am asking
> > > > > >>>> > > in the context of KIP-599 with the current controller, we
> > may
> > > > need
> > > > > >>>> to
> > > > > >>>> > > throttle differently if the
> > > > > >>>> > > request comes from a client or from a broker.
> > > > > >>>> > >
> > > > > >>>> > > The point for using the listener name is more of a
> security
> > > > > >>>> purpose, to
> > > > > >>>> > detect any forged request to our best effort.
> > > > > >>>> > For throttling I think we could just check the request
> header
> > > for
> > > > > >>>> > *InitialClientId* existence, to distinguish whether to apply
> > > > > >>>> > throttling strategy as forwarded request or direct request.
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > > 4. Could we add `InitialClientId` as well? This will be
> > > required
> > > > > >>>> for the
> > > > > >>>> > > quota as we can apply them
> > > > > >>>> > > by principal and/or clientId.
> > > > > >>>> > >
> > > > > >>>> > > Sounds good, added.
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > > 5. A small remark regarding the structure of the KIP. It
> is
> > a
> > > > bit
> > > > > >>>> weird
> > > > > >>>> > > that requests that do not go
> > > > > >>>> > > to the controller are mentioned in the Proposed Design
> > section
> > > > and
> > > > > >>>> the
> > > > > >>>> > > requests that go to the
> > > > > >>>> > > controller are mentioned in the Public Interfaces. When
> one
> > > read
> > > > > the
> > > > > >>>> > > Proposed Design, it does not
> > > > > >>>> > > get a full picture of the whole new routing proposal for
> old
> > > and
> > > > > new
> > > > > >>>> > > clients. It would be great if we
> > > > > >>>> > > could have a full overview in that section.
> > > > > >>>> > >
> > > > > >>>> > > Good point, I will move the pieces around.
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > > Overall the change makes sense to me. I will work on
> > drafting
> > > an
> > > > > >>>> addendum
> > > > > >>>> > > to KIP-599 to
> > > > > >>>> > > alter the design to cope with these changes. At a first
> > > glance,
> > > > > that
> > > > > >>>> > seems
> > > > > >>>> > > doable if 1.1, 3
> > > > > >>>> > > and 4 are OK.
> > > > > >>>> > >
> > > > > >>>> > > Thank you for the help!
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > > Thanks,
> > > > > >>>> > > David
> > > > > >>>> > >
> > > > > >>>> > > On Wed, Jul 29, 2020 at 5:29 AM Boyang Chen <
> > > > > >>>> reluctanthero...@gmail.com>
> > > > > >>>> > > wrote:
> > > > > >>>> > >
> > > > > >>>> > > > Thanks for the feedback Colin!
> > > > > >>>> > > >
> > > > > >>>> > > > On Tue, Jul 28, 2020 at 2:11 PM Colin McCabe <
> > > > > cmcc...@apache.org>
> > > > > >>>> > wrote:
> > > > > >>>> > > >
> > > > > >>>> > > > > Hi Boyang,
> > > > > >>>> > > > >
> > > > > >>>> > > > > Thanks for updating this.  A few comments below:
> > > > > >>>> > > > >
> > > > > >>>> > > > > In the "Routing Request Security" section, there is a
> > > > > reference
> > > > > >>>> to
> > > > > >>>> > > "older
> > > > > >>>> > > > > requests that need redirection."  But after these new
> > > > > >>>> revisions, both
> > > > > >>>> > > new
> > > > > >>>> > > > > and old requests need redirection.  So we should
> > rephrase
> > > > > this.
> > > > > >>>> > > > >
> > > > > >>>> > > > > > In addition, to avoid exposing this forwarding power
> > to
> > > > the
> > > > > >>>> admin
> > > > > >>>> > > > > clients,
> > > > > >>>> > > > > > the routing request shall be forwarded towards the
> > > > > controller
> > > > > >>>> > broker
> > > > > >>>> > > > > internal
> > > > > >>>> > > > > > endpoint which should be only visible to other
> brokers
> > > > > inside
> > > > > >>>> the
> > > > > >>>> > > > > cluster
> > > > > >>>> > > > > > in the KIP-500 controller. Any admin configuration
> > > request
> > > > > >>>> with
> > > > > >>>> > > broker
> > > > > >>>> > > > > > principal should not be going through the public
> > > endpoint
> > > > > and
> > > > > >>>> will
> > > > > >>>> > be
> > > > > >>>> > > > > > rejected for security purpose.
> > > > > >>>> > > > >
> > > > > >>>> > > > > We should also describe how this will work in the
> > > > pre-KIP-500
> > > > > >>>> case.
> > > > > >>>> > In
> > > > > >>>> > > > > that case, CLUSTER_ACTION gets the extra permissions
> > > > described
> > > > > >>>> here
> > > > > >>>> > > only
> > > > > >>>> > > > > when the message comes in on the inter-broker
> listener.
> > > We
> > > > > >>>> should
> > > > > >>>> > > state
> > > > > >>>> > > > > that here.
> > > > > >>>> > > > >
> > > > > >>>> > > > > (I can see that you have this information later on in
> > the
> > > > > >>>> "Security
> > > > > >>>> > > > Access
> > > > > >>>> > > > > Changes" section, but it would be good to have it here
> > as
> > > > > well,
> > > > > >>>> to
> > > > > >>>> > > avoid
> > > > > >>>> > > > > confusion.)
> > > > > >>>> > > > >
> > > > > >>>> > > > > > To be more strict of protecting controller
> > information,
> > > > the
> > > > > >>>> > > > > "ControllerId"
> > > > > >>>> > > > > > field in new MetadataResponse shall be set to -1
> when
> > > the
> > > > > >>>> original
> > > > > >>>> > > > > request
> > > > > >>>> > > > > > comes from a non-broker client and it is already on
> > v10.
> > > > We
> > > > > >>>> shall
> > > > > >>>> > use
> > > > > >>>> > > > the
> > > > > >>>> > > > > > request listener name to distinguish whether a given
> > > > request
> > > > > >>>> is
> > > > > >>>> > > > > inter-broker,
> > > > > >>>> > > > > > or from the client.
> > > > > >>>> > > > >
> > > > > >>>> > > > > I'm not sure why we would need to distinguish between
> > > broker
> > > > > >>>> clients
> > > > > >>>> > > and
> > > > > >>>> > > > > non-broker clients.  Brokers don't generally send
> > > > > >>>> MetadataRequests to
> > > > > >>>> > > > other
> > > > > >>>> > > > > brokers, do they?  Brokers learn about metadata from
> > > > > >>>> > > > UpdateMetadataRequest
> > > > > >>>> > > > > and LeaderAndIsrRequest, not by sending
> MetadataRequests
> > > to
> > > > > >>>> other
> > > > > >>>> > > > brokers.
> > > > > >>>> > > > >
> > > > > >>>> > > > > We do have one use case where the MetadataRequest gets
> > > sent
> > > > > >>>> between
> > > > > >>>> > the
> > > > > >>>> > > > brokers, which is the InterBrokerSendThread. Currently
> we
> > > > don't
> > > > > >>>> rely on
> > > > > >>>> > > it
> > > > > >>>> > > > to get the controller id, so I guess your suggestion
> > should
> > > be
> > > > > >>>> good to
> > > > > >>>> > > > enforce. We could use some meta comment on the
> > NetworkClient
> > > > > that
> > > > > >>>> it
> > > > > >>>> > > should
> > > > > >>>> > > > not be used to get the controller location.
> > > > > >>>> > > >
> > > > > >>>> > > > Probably what we want here is: v0-v9: return a random
> > broker
> > > > in
> > > > > >>>> the
> > > > > >>>> > > cluster
> > > > > >>>> > > > > as the controller ID.  v10: no controllerID present in
> > the
> > > > > >>>> > > > > MetadataResponse.  We should also deprecate the
> > > adminClient
> > > > > >>>> method
> > > > > >>>> > > which
> > > > > >>>> > > > > gets the controllerId.
> > > > > >>>> > > > >
> > > > > >>>> > > > > > BROKER_AUTHORIZATION_FAILURE(92, "Authorization
> failed
> > > for
> > > > > the
> > > > > >>>> > > > > > request during forwarding, this indicates an
> internal
> > > > error
> > > > > >>>> on the
> > > > > >>>> > > > broker
> > > > > >>>> > > > > > cluster security setup.",
> > > > > >>>> > BrokerAuthorizationFailureException::new);
> > > > > >>>> > > > >
> > > > > >>>> > > > > Grammar nitpick: It would be good to have a period
> > between
> > > > > >>>> > "forwarding"
> > > > > >>>> > > > > and "this" to avoid a run-on sentence :)
> > > > > >>>> > > > >
> > > > > >>>> > > > > best,
> > > > > >>>> > > > > Colin
> > > > > >>>> > > > >
> > > > > >>>> > > > >
> > > > > >>>> > > > > On Mon, Jul 27, 2020, at 21:47, Boyang Chen wrote:
> > > > > >>>> > > > > > Hey there,
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > I'm re-opening this thread because after some
> initial
> > > > > >>>> > implementations
> > > > > >>>> > > > > > started, we spotted some gaps in the approved KIP as
> > > well
> > > > as
> > > > > >>>> some
> > > > > >>>> > > > > > inconsistencies with KIP-631 controller. There are a
> > > > couple
> > > > > of
> > > > > >>>> > > > addendums
> > > > > >>>> > > > > to
> > > > > >>>> > > > > > the existing KIP, specifically:
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > 1. As the controller is foreseen to be only
> accessible
> > > to
> > > > > the
> > > > > >>>> > > brokers,
> > > > > >>>> > > > > the
> > > > > >>>> > > > > > new admin client would not have direct access to the
> > > > > >>>> controller. It
> > > > > >>>> > > is
> > > > > >>>> > > > > > guaranteed on the MetadataResponse level which no
> > longer
> > > > > >>>> provides
> > > > > >>>> > > > > > `ControllerId` to client side requests.
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > 2. The broker would forward any direct ZK path
> > mutation
> > > > > >>>> requests,
> > > > > >>>> > > > > including
> > > > > >>>> > > > > > topic creation/deletion, reassignment, etc since we
> > > > > deprecate
> > > > > >>>> the
> > > > > >>>> > > > direct
> > > > > >>>> > > > > > controller access on the client side. No more
> protocol
> > > > > >>>> version bump
> > > > > >>>> > > is
> > > > > >>>> > > > > > necessary for the configuration requests.
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > 3. To make sure forwarding requests pass the
> > > > authorization,
> > > > > >>>> broker
> > > > > >>>> > > > > > principal CLUSTER_ACTION would be allowed to be used
> > as
> > > an
> > > > > >>>> > > alternative
> > > > > >>>> > > > > > authentication method for a variety of principal
> > > > operations,
> > > > > >>>> > > including
> > > > > >>>> > > > > > ALTER, ALTER_CONFIG, DELETE, etc. It is because the
> > > > > forwarding
> > > > > >>>> > > request
> > > > > >>>> > > > > > needs to use the proxy broker's own principal, which
> > is
> > > > > >>>> currently
> > > > > >>>> > not
> > > > > >>>> > > > > > supported to be used for many configuration change
> > > > > >>>> authentication
> > > > > >>>> > > > listed
> > > > > >>>> > > > > > above. The full list could be found in the KIP.
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > 4. Add a new BROKER_AUTHORIZATION_FAILURE error code
> > to
> > > > > >>>> indicate
> > > > > >>>> > any
> > > > > >>>> > > > > > internal security configuration failure, when the
> > > > forwarded
> > > > > >>>> request
> > > > > >>>> > > > > failed
> > > > > >>>> > > > > > authentication on the controller side.
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > Let me know what you think. With such a major
> > refinement
> > > > of
> > > > > >>>> the
> > > > > >>>> > KIP,
> > > > > >>>> > > > I'm
> > > > > >>>> > > > > > open for re-vote after discussions converge.
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > Boyang
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > On Wed, Jul 1, 2020 at 2:17 PM Boyang Chen <
> > > > > >>>> > > reluctanthero...@gmail.com
> > > > > >>>> > > > >
> > > > > >>>> > > > > > wrote:
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > > Hey folks,
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > I have also synced on the KIP-578 which was doing
> > the
> > > > > >>>> partition
> > > > > >>>> > > > limit,
> > > > > >>>> > > > > to
> > > > > >>>> > > > > > > make sure the partition limit error code would be
> > > > properly
> > > > > >>>> > > propagated
> > > > > >>>> > > > > once
> > > > > >>>> > > > > > > it is done on top of KIP-590. Let me know if you
> > have
> > > > > >>>> further
> > > > > >>>> > > > > questions or
> > > > > >>>> > > > > > > concerns.
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > Boyang
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > On Tue, Jun 23, 2020 at 5:08 PM Boyang Chen <
> > > > > >>>> > > > > reluctanthero...@gmail.com>
> > > > > >>>> > > > > > > wrote:
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > >> Thanks for the clarification, Colin and Ismael.
> > > > > Personally
> > > > > >>>> I
> > > > > >>>> > also
> > > > > >>>> > > > feel
> > > > > >>>> > > > > > >> Option A is better to prioritize fixing the gap.
> > Just
> > > > to
> > > > > be
> > > > > >>>> > clear,
> > > > > >>>> > > > the
> > > > > >>>> > > > > > >> proposed solution would be:
> > > > > >>>> > > > > > >>
> > > > > >>>> > > > > > >> 1. Bump the Metadata RPC version to return
> > > > > >>>> POLICY_VIOLATION. In
> > > > > >>>> > > the
> > > > > >>>> > > > > > >> application level, we should swap the error
> message
> > > > with
> > > > > >>>> the
> > > > > >>>> > > actual
> > > > > >>>> > > > > failure
> > > > > >>>> > > > > > >> reason such as "violation of topic creation
> policy
> > > when
> > > > > >>>> > attempting
> > > > > >>>> > > > to
> > > > > >>>> > > > > auto
> > > > > >>>> > > > > > >> create internal topic through MetadataRequest."
> > > > > >>>> > > > > > >>
> > > > > >>>> > > > > > >> 2. For older Metadata RPC, return
> > > AUTHORIZATION_FAILED
> > > > to
> > > > > >>>> fail
> > > > > >>>> > > fast.
> > > > > >>>> > > > > > >>
> > > > > >>>> > > > > > >> Will address our other discussed points as well
> in
> > > the
> > > > > >>>> KIP, let
> > > > > >>>> > me
> > > > > >>>> > > > > know
> > > > > >>>> > > > > > >> if you have further questions.
> > > > > >>>> > > > > > >>
> > > > > >>>> > > > > > >> Thanks,
> > > > > >>>> > > > > > >> Boyang
> > > > > >>>> > > > > > >>
> > > > > >>>> > > > > > >> On Tue, Jun 23, 2020 at 10:41 AM Ismael Juma <
> > > > > >>>> ism...@juma.me.uk
> > > > > >>>> > >
> > > > > >>>> > > > > wrote:
> > > > > >>>> > > > > > >>
> > > > > >>>> > > > > > >>> Option A is basically what I was thinking. But
> > with
> > > a
> > > > > >>>> slight
> > > > > >>>> > > > > adjustment:
> > > > > >>>> > > > > > >>>
> > > > > >>>> > > > > > >>> New versions of MetadataResponse return
> > > > > POLICY_VIOLATION,
> > > > > >>>> old
> > > > > >>>> > > > > versions
> > > > > >>>> > > > > > >>> return AUTHORIZATION_FAILED. The latter works
> > > > correctly
> > > > > >>>> with
> > > > > >>>> > old
> > > > > >>>> > > > Java
> > > > > >>>> > > > > > >>> clients (i.e. the client fails fast and
> propagates
> > > the
> > > > > >>>> error),
> > > > > >>>> > > I've
> > > > > >>>> > > > > > >>> tested
> > > > > >>>> > > > > > >>> it. Adjust new clients to treat POLICY_VIOLATION
> > > like
> > > > > >>>> > > > > > >>> AUTHORIZATION_FAILED,
> > > > > >>>> > > > > > >>> but propagate the custom error message.
> > > > > >>>> > > > > > >>>
> > > > > >>>> > > > > > >>> Ismael
> > > > > >>>> > > > > > >>>
> > > > > >>>> > > > > > >>> On Mon, Jun 22, 2020 at 11:00 PM Colin McCabe <
> > > > > >>>> > > cmcc...@apache.org>
> > > > > >>>> > > > > > >>> wrote:
> > > > > >>>> > > > > > >>>
> > > > > >>>> > > > > > >>> > > > > On Fri, Jun 19, 2020 at 3:18 PM Ismael
> > Juma
> > > <
> > > > > >>>> > > > > ism...@juma.me.uk>
> > > > > >>>> > > > > > >>> > wrote:
> > > > > >>>> > > > > > >>> > > > >
> > > > > >>>> > > > > > >>> > > > > > Hi Colin,
> > > > > >>>> > > > > > >>> > > > > >
> > > > > >>>> > > > > > >>> > > > > > The KIP states in the Compatibility
> > > section
> > > > > (not
> > > > > >>>> > Future
> > > > > >>>> > > > > work):
> > > > > >>>> > > > > > >>> > > > > >
> > > > > >>>> > > > > > >>> > > > > > "To support the proxy of requests, we
> > need
> > > > to
> > > > > >>>> build a
> > > > > >>>> > > > > channel
> > > > > >>>> > > > > > >>> for
> > > > > >>>> > > > > > >>> > > > > > brokers to talk directly to the
> > > controller.
> > > > > >>>> This part
> > > > > >>>> > > of
> > > > > >>>> > > > > the
> > > > > >>>> > > > > > >>> design
> > > > > >>>> > > > > > >>> > > > > > is internal change only and won’t
> block
> > > the
> > > > > KIP
> > > > > >>>> > > > progress."
> > > > > >>>> > > > > > >>> > > > > >
> > > > > >>>> > > > > > >>> > > > > > I am clarifying that this is not
> > internal
> > > > only
> > > > > >>>> due to
> > > > > >>>> > > the
> > > > > >>>> > > > > > >>> config.
> > > > > >>>> > > > > > >>> > If we
> > > > > >>>> > > > > > >>> > > > > > say that this KIP depends on another
> KIP
> > > > > before
> > > > > >>>> we
> > > > > >>>> > can
> > > > > >>>> > > > > merge
> > > > > >>>> > > > > > >>> > > > > > it, that's fine although it feels a
> bit
> > > > > >>>> unnecessary.
> > > > > >>>> > > > > > >>> > > > > >
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > Hi Ismael,
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > I didn't realize there was still a reference
> to
> > > the
> > > > > >>>> separate
> > > > > >>>> > > > > controller
> > > > > >>>> > > > > > >>> > channel in the "Compatibility, Deprecation,
> and
> > > > > >>>> Migration
> > > > > >>>> > Plan"
> > > > > >>>> > > > > > >>> section.  I
> > > > > >>>> > > > > > >>> > agree that it doesn't really belong there.
> > Given
> > > > that
> > > > > >>>> this
> > > > > >>>> > is
> > > > > >>>> > > > > creating
> > > > > >>>> > > > > > >>> > confusion, I would suggest that we just drop
> > this
> > > > from
> > > > > >>>> the
> > > > > >>>> > KIP
> > > > > >>>> > > > > > >>> entirely.
> > > > > >>>> > > > > > >>> > It really is orthogonal to what this KIP is
> > > about--
> > > > we
> > > > > >>>> don't
> > > > > >>>> > > > need a
> > > > > >>>> > > > > > >>> > separate channel to implement redirection.
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > Boyang wrote:
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > >
> > > > > >>>> > > > > > >>> > > We are only opening the doors for specific
> > > > internal
> > > > > >>>> topics
> > > > > >>>> > > > > (offsets,
> > > > > >>>> > > > > > >>> txn
> > > > > >>>> > > > > > >>> > > log), which I assume the client should have
> no
> > > > > >>>> possibility
> > > > > >>>> > to
> > > > > >>>> > > > > mutate
> > > > > >>>> > > > > > >>> the
> > > > > >>>> > > > > > >>> > > topic policy?
> > > > > >>>> > > > > > >>> > >
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > Hi Boyang,
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > I think you and Ismael are talking about
> > different
> > > > > >>>> scenarios.
> > > > > >>>> > > > You
> > > > > >>>> > > > > are
> > > > > >>>> > > > > > >>> > describing the scenario where the broker is
> > > > > >>>> auto-creating the
> > > > > >>>> > > > > > >>> transaction
> > > > > >>>> > > > > > >>> > log topic or consumer offset topic.  This
> > scenario
> > > > > >>>> indeed
> > > > > >>>> > > should
> > > > > >>>> > > > > not
> > > > > >>>> > > > > > >>> happen
> > > > > >>>> > > > > > >>> > in a properly-configured cluster.  However,
> > Ismael
> > > > is
> > > > > >>>> > > describing
> > > > > >>>> > > > a
> > > > > >>>> > > > > > >>> scenario
> > > > > >>>> > > > > > >>> > where the client is auto-creating some
> arbitrary
> > > > > >>>> non-internal
> > > > > >>>> > > > topic
> > > > > >>>> > > > > > >>> just by
> > > > > >>>> > > > > > >>> > sending a metadata request.
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > As far as I can see, there are two solutions
> > here:
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > A. Close the hole in CreateTopicsPolicy
> > > immediately.
> > > > > >>>> In new
> > > > > >>>> > > > > versions,
> > > > > >>>> > > > > > >>> > allow MetadataResponse to return
> > > > AUTHORIZATION_FAILED
> > > > > >>>> if we
> > > > > >>>> > > tried
> > > > > >>>> > > > > to
> > > > > >>>> > > > > > >>> > auto-create a topic and failed.  Find some
> other
> > > > error
> > > > > >>>> code
> > > > > >>>> > to
> > > > > >>>> > > > > return
> > > > > >>>> > > > > > >>> for
> > > > > >>>> > > > > > >>> > existing versions.
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > B. Keep the hole in CreateTopicsPolicy and add
> > > some
> > > > > >>>> > > configuration
> > > > > >>>> > > > > to
> > > > > >>>> > > > > > >>> allow
> > > > > >>>> > > > > > >>> > admins to gradually migrate to closing it.  In
> > > > > >>>> practice, this
> > > > > >>>> > > > > probably
> > > > > >>>> > > > > > >>> > means a configuration toggle that enables
> direct
> > > ZK
> > > > > >>>> access,
> > > > > >>>> > > that
> > > > > >>>> > > > > > >>> starts off
> > > > > >>>> > > > > > >>> > as enabled.  Then we can eventually default it
> > to
> > > > > false
> > > > > >>>> and
> > > > > >>>> > > then
> > > > > >>>> > > > > > >>> remove it
> > > > > >>>> > > > > > >>> > entirely over time.
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>> > best,
> > > > > >>>> > > > > > >>> > Colin
> > > > > >>>> > > > > > >>> >
> > > > > >>>> > > > > > >>>
> > > > > >>>> > > > > > >>
> > > > > >>>> > > > > >
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> >
> > > > > >>>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
>

Reply via email to