Thanks for the feedback Colin! On Tue, Jul 28, 2020 at 2:11 PM Colin McCabe <cmcc...@apache.org> wrote:
> Hi Boyang, > > Thanks for updating this. A few comments below: > > In the "Routing Request Security" section, there is a reference to "older > requests that need redirection." But after these new revisions, both new > and old requests need redirection. So we should rephrase this. > > > In addition, to avoid exposing this forwarding power to the admin > clients, > > the routing request shall be forwarded towards the controller broker > internal > > endpoint which should be only visible to other brokers inside the > cluster > > in the KIP-500 controller. Any admin configuration request with broker > > principal should not be going through the public endpoint and will be > > rejected for security purpose. > > We should also describe how this will work in the pre-KIP-500 case. In > that case, CLUSTER_ACTION gets the extra permissions described here only > when the message comes in on the inter-broker listener. We should state > that here. > > (I can see that you have this information later on in the "Security Access > Changes" section, but it would be good to have it here as well, to avoid > confusion.) > > > To be more strict of protecting controller information, the > "ControllerId" > > field in new MetadataResponse shall be set to -1 when the original > request > > comes from a non-broker client and it is already on v10. We shall use the > > request listener name to distinguish whether a given request is > inter-broker, > > or from the client. > > I'm not sure why we would need to distinguish between broker clients and > non-broker clients. Brokers don't generally send MetadataRequests to other > brokers, do they? Brokers learn about metadata from UpdateMetadataRequest > and LeaderAndIsrRequest, not by sending MetadataRequests to other brokers. > > We do have one use case where the MetadataRequest gets sent between the brokers, which is the InterBrokerSendThread. Currently we don't rely on it to get the controller id, so I guess your suggestion should be good to enforce. We could use some meta comment on the NetworkClient that it should not be used to get the controller location. Probably what we want here is: v0-v9: return a random broker in the cluster > as the controller ID. v10: no controllerID present in the > MetadataResponse. We should also deprecate the adminClient method which > gets the controllerId. > > > BROKER_AUTHORIZATION_FAILURE(92, "Authorization failed for the > > request during forwarding, this indicates an internal error on the broker > > cluster security setup.", BrokerAuthorizationFailureException::new); > > Grammar nitpick: It would be good to have a period between "forwarding" > and "this" to avoid a run-on sentence :) > > best, > Colin > > > On Mon, Jul 27, 2020, at 21:47, Boyang Chen wrote: > > Hey there, > > > > I'm re-opening this thread because after some initial implementations > > started, we spotted some gaps in the approved KIP as well as some > > inconsistencies with KIP-631 controller. There are a couple of addendums > to > > the existing KIP, specifically: > > > > 1. As the controller is foreseen to be only accessible to the brokers, > the > > new admin client would not have direct access to the controller. It is > > guaranteed on the MetadataResponse level which no longer provides > > `ControllerId` to client side requests. > > > > 2. The broker would forward any direct ZK path mutation requests, > including > > topic creation/deletion, reassignment, etc since we deprecate the direct > > controller access on the client side. No more protocol version bump is > > necessary for the configuration requests. > > > > 3. To make sure forwarding requests pass the authorization, broker > > principal CLUSTER_ACTION would be allowed to be used as an alternative > > authentication method for a variety of principal operations, including > > ALTER, ALTER_CONFIG, DELETE, etc. It is because the forwarding request > > needs to use the proxy broker's own principal, which is currently not > > supported to be used for many configuration change authentication listed > > above. The full list could be found in the KIP. > > > > 4. Add a new BROKER_AUTHORIZATION_FAILURE error code to indicate any > > internal security configuration failure, when the forwarded request > failed > > authentication on the controller side. > > > > Let me know what you think. With such a major refinement of the KIP, I'm > > open for re-vote after discussions converge. > > > > Boyang > > > > On Wed, Jul 1, 2020 at 2:17 PM Boyang Chen <reluctanthero...@gmail.com> > > wrote: > > > > > Hey folks, > > > > > > I have also synced on the KIP-578 which was doing the partition limit, > to > > > make sure the partition limit error code would be properly propagated > once > > > it is done on top of KIP-590. Let me know if you have further > questions or > > > concerns. > > > > > > Boyang > > > > > > On Tue, Jun 23, 2020 at 5:08 PM Boyang Chen < > reluctanthero...@gmail.com> > > > wrote: > > > > > >> Thanks for the clarification, Colin and Ismael. Personally I also feel > > >> Option A is better to prioritize fixing the gap. Just to be clear, the > > >> proposed solution would be: > > >> > > >> 1. Bump the Metadata RPC version to return POLICY_VIOLATION. In the > > >> application level, we should swap the error message with the actual > failure > > >> reason such as "violation of topic creation policy when attempting to > auto > > >> create internal topic through MetadataRequest." > > >> > > >> 2. For older Metadata RPC, return AUTHORIZATION_FAILED to fail fast. > > >> > > >> Will address our other discussed points as well in the KIP, let me > know > > >> if you have further questions. > > >> > > >> Thanks, > > >> Boyang > > >> > > >> On Tue, Jun 23, 2020 at 10:41 AM Ismael Juma <ism...@juma.me.uk> > wrote: > > >> > > >>> Option A is basically what I was thinking. But with a slight > adjustment: > > >>> > > >>> New versions of MetadataResponse return POLICY_VIOLATION, old > versions > > >>> return AUTHORIZATION_FAILED. The latter works correctly with old Java > > >>> clients (i.e. the client fails fast and propagates the error), I've > > >>> tested > > >>> it. Adjust new clients to treat POLICY_VIOLATION like > > >>> AUTHORIZATION_FAILED, > > >>> but propagate the custom error message. > > >>> > > >>> Ismael > > >>> > > >>> On Mon, Jun 22, 2020 at 11:00 PM Colin McCabe <cmcc...@apache.org> > > >>> wrote: > > >>> > > >>> > > > > On Fri, Jun 19, 2020 at 3:18 PM Ismael Juma < > ism...@juma.me.uk> > > >>> > wrote: > > >>> > > > > > > >>> > > > > > Hi Colin, > > >>> > > > > > > > >>> > > > > > The KIP states in the Compatibility section (not Future > work): > > >>> > > > > > > > >>> > > > > > "To support the proxy of requests, we need to build a > channel > > >>> for > > >>> > > > > > brokers to talk directly to the controller. This part of > the > > >>> design > > >>> > > > > > is internal change only and won’t block the KIP progress." > > >>> > > > > > > > >>> > > > > > I am clarifying that this is not internal only due to the > > >>> config. > > >>> > If we > > >>> > > > > > say that this KIP depends on another KIP before we can > merge > > >>> > > > > > it, that's fine although it feels a bit unnecessary. > > >>> > > > > > > > >>> > > > >>> > Hi Ismael, > > >>> > > > >>> > I didn't realize there was still a reference to the separate > controller > > >>> > channel in the "Compatibility, Deprecation, and Migration Plan" > > >>> section. I > > >>> > agree that it doesn't really belong there. Given that this is > creating > > >>> > confusion, I would suggest that we just drop this from the KIP > > >>> entirely. > > >>> > It really is orthogonal to what this KIP is about-- we don't need a > > >>> > separate channel to implement redirection. > > >>> > > > >>> > Boyang wrote: > > >>> > > > >>> > > > > >>> > > We are only opening the doors for specific internal topics > (offsets, > > >>> txn > > >>> > > log), which I assume the client should have no possibility to > mutate > > >>> the > > >>> > > topic policy? > > >>> > > > > >>> > > > >>> > Hi Boyang, > > >>> > > > >>> > I think you and Ismael are talking about different scenarios. You > are > > >>> > describing the scenario where the broker is auto-creating the > > >>> transaction > > >>> > log topic or consumer offset topic. This scenario indeed should > not > > >>> happen > > >>> > in a properly-configured cluster. However, Ismael is describing a > > >>> scenario > > >>> > where the client is auto-creating some arbitrary non-internal topic > > >>> just by > > >>> > sending a metadata request. > > >>> > > > >>> > As far as I can see, there are two solutions here: > > >>> > > > >>> > A. Close the hole in CreateTopicsPolicy immediately. In new > versions, > > >>> > allow MetadataResponse to return AUTHORIZATION_FAILED if we tried > to > > >>> > auto-create a topic and failed. Find some other error code to > return > > >>> for > > >>> > existing versions. > > >>> > > > >>> > B. Keep the hole in CreateTopicsPolicy and add some configuration > to > > >>> allow > > >>> > admins to gradually migrate to closing it. In practice, this > probably > > >>> > means a configuration toggle that enables direct ZK access, that > > >>> starts off > > >>> > as enabled. Then we can eventually default it to false and then > > >>> remove it > > >>> > entirely over time. > > >>> > > > >>> > best, > > >>> > Colin > > >>> > > > >>> > > >> > > >