Hi David,

Thanks for bringing this up.  This is indeed something that we overlooked, that 
we'll have to figure out.

The active controller may not be co-located with a broker in the post-KIP-500 
world.  So it does not make sense to have the client communicate directly with 
the controller.  Just to reiterate, the controller network is not accessible to 
clients, much like the ZK network today.

It seems like we can still do what we need to do for KIP-599, but we just can't 
mute the channel.  This just means we need to do the backpressure at a slightly 
higher level.

best,
Colin


On Tue, Jul 28, 2020, at 00:51, David Jacot wrote:
> Hi Boyang,
> 
> Thanks for the update.
> 
> 1./2. In KIP-599 (accepted and already in trunk), we throttle the
> CreateTopicsRequest,
> CreatePartitionsRequest, and DeleteTopicsRequests by muting the channel used
> by the Admin client and setting the throttleTimeMs in the response. The
> change that
> you propose breaks this. If we distribute these requests to all brokers,
> muting channels
> does not make sense anymore.
> 
> Have we considered continuing sending controller requests to the broker
> that hosts the
> controller? We could continue to send requests to the broker listener and
> redirect them
> to the controller internally. This would still keep the controller
> isolated. Another advantage
> of doing so is that the forwarding on the same machine does not require to
> go over the
> network.
> 
> Could you elaborate a bit more on the motivation and the reasoning behind
> this change?
> Is there a requirement or a strong advantage that I have missed?
> 
> Best,
> David
> 
> On Tue, Jul 28, 2020 at 6:48 AM Boyang Chen <reluctanthero...@gmail.com>
> wrote:
> 
> > Hey there,
> >
> > I'm re-opening this thread because after some initial implementations
> > started, we spotted some gaps in the approved KIP as well as some
> > inconsistencies with KIP-631 controller. There are a couple of addendums to
> > the existing KIP, specifically:
> >
> > 1. As the controller is foreseen to be only accessible to the brokers, the
> > new admin client would not have direct access to the controller. It is
> > guaranteed on the MetadataResponse level which no longer provides
> > `ControllerId` to client side requests.
> >
> > 2. The broker would forward any direct ZK path mutation requests, including
> > topic creation/deletion, reassignment, etc since we deprecate the direct
> > controller access on the client side. No more protocol version bump is
> > necessary for the configuration requests.
> >
> > 3. To make sure forwarding requests pass the authorization, broker
> > principal CLUSTER_ACTION would be allowed to be used as an alternative
> > authentication method for a variety of principal operations, including
> > ALTER, ALTER_CONFIG, DELETE, etc. It is because the forwarding request
> > needs to use the proxy broker's own principal, which is currently not
> > supported to be used for many configuration change authentication listed
> > above. The full list could be found in the KIP.
> >
> > 4. Add a new BROKER_AUTHORIZATION_FAILURE error code to indicate any
> > internal security configuration failure, when the forwarded request failed
> > authentication on the controller side.
> >
> > Let me know what you think. With such a major refinement of the KIP, I'm
> > open for re-vote after discussions converge.
> >
> > Boyang
> >
> > On Wed, Jul 1, 2020 at 2:17 PM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> > > Hey folks,
> > >
> > > I have also synced on the KIP-578 which was doing the partition limit, to
> > > make sure the partition limit error code would be properly propagated
> > once
> > > it is done on top of KIP-590. Let me know if you have further questions
> > or
> > > concerns.
> > >
> > > Boyang
> > >
> > > On Tue, Jun 23, 2020 at 5:08 PM Boyang Chen <reluctanthero...@gmail.com>
> > > wrote:
> > >
> > >> Thanks for the clarification, Colin and Ismael. Personally I also feel
> > >> Option A is better to prioritize fixing the gap. Just to be clear, the
> > >> proposed solution would be:
> > >>
> > >> 1. Bump the Metadata RPC version to return POLICY_VIOLATION. In the
> > >> application level, we should swap the error message with the actual
> > failure
> > >> reason such as "violation of topic creation policy when attempting to
> > auto
> > >> create internal topic through MetadataRequest."
> > >>
> > >> 2. For older Metadata RPC, return AUTHORIZATION_FAILED to fail fast.
> > >>
> > >> Will address our other discussed points as well in the KIP, let me know
> > >> if you have further questions.
> > >>
> > >> Thanks,
> > >> Boyang
> > >>
> > >> On Tue, Jun 23, 2020 at 10:41 AM Ismael Juma <ism...@juma.me.uk> wrote:
> > >>
> > >>> Option A is basically what I was thinking. But with a slight
> > adjustment:
> > >>>
> > >>> New versions of MetadataResponse return POLICY_VIOLATION, old versions
> > >>> return AUTHORIZATION_FAILED. The latter works correctly with old Java
> > >>> clients (i.e. the client fails fast and propagates the error), I've
> > >>> tested
> > >>> it. Adjust new clients to treat POLICY_VIOLATION like
> > >>> AUTHORIZATION_FAILED,
> > >>> but propagate the custom error message.
> > >>>
> > >>> Ismael
> > >>>
> > >>> On Mon, Jun 22, 2020 at 11:00 PM Colin McCabe <cmcc...@apache.org>
> > >>> wrote:
> > >>>
> > >>> > > > > On Fri, Jun 19, 2020 at 3:18 PM Ismael Juma <ism...@juma.me.uk
> > >
> > >>> > wrote:
> > >>> > > > >
> > >>> > > > > > Hi Colin,
> > >>> > > > > >
> > >>> > > > > > The KIP states in the Compatibility section (not Future
> > work):
> > >>> > > > > >
> > >>> > > > > > "To support the proxy of requests, we need to build a channel
> > >>> for
> > >>> > > > > > brokers to talk directly to the controller. This part of the
> > >>> design
> > >>> > > > > > is internal change only and won’t block the KIP progress."
> > >>> > > > > >
> > >>> > > > > > I am clarifying that this is not internal only due to the
> > >>> config.
> > >>> > If we
> > >>> > > > > > say that this KIP depends on another KIP before we can merge
> > >>> > > > > > it, that's fine although it feels a bit unnecessary.
> > >>> > > > > >
> > >>> >
> > >>> > Hi Ismael,
> > >>> >
> > >>> > I didn't realize there was still a reference to the separate
> > controller
> > >>> > channel in the "Compatibility, Deprecation, and Migration Plan"
> > >>> section.  I
> > >>> > agree that it doesn't really belong there.  Given that this is
> > creating
> > >>> > confusion, I would suggest that we just drop this from the KIP
> > >>> entirely.
> > >>> > It really is orthogonal to what this KIP is about-- we don't need a
> > >>> > separate channel to implement redirection.
> > >>> >
> > >>> > Boyang wrote:
> > >>> >
> > >>> > >
> > >>> > > We are only opening the doors for specific internal topics
> > (offsets,
> > >>> txn
> > >>> > > log), which I assume the client should have no possibility to
> > mutate
> > >>> the
> > >>> > > topic policy?
> > >>> > >
> > >>> >
> > >>> > Hi Boyang,
> > >>> >
> > >>> > I think you and Ismael are talking about different scenarios.  You
> > are
> > >>> > describing the scenario where the broker is auto-creating the
> > >>> transaction
> > >>> > log topic or consumer offset topic.  This scenario indeed should not
> > >>> happen
> > >>> > in a properly-configured cluster.  However, Ismael is describing a
> > >>> scenario
> > >>> > where the client is auto-creating some arbitrary non-internal topic
> > >>> just by
> > >>> > sending a metadata request.
> > >>> >
> > >>> > As far as I can see, there are two solutions here:
> > >>> >
> > >>> > A. Close the hole in CreateTopicsPolicy immediately.  In new
> > versions,
> > >>> > allow MetadataResponse to return AUTHORIZATION_FAILED if we tried to
> > >>> > auto-create a topic and failed.  Find some other error code to return
> > >>> for
> > >>> > existing versions.
> > >>> >
> > >>> > B. Keep the hole in CreateTopicsPolicy and add some configuration to
> > >>> allow
> > >>> > admins to gradually migrate to closing it.  In practice, this
> > probably
> > >>> > means a configuration toggle that enables direct ZK access, that
> > >>> starts off
> > >>> > as enabled.  Then we can eventually default it to false and then
> > >>> remove it
> > >>> > entirely over time.
> > >>> >
> > >>> > best,
> > >>> > Colin
> > >>> >
> > >>>
> > >>
> >
>

Reply via email to