On Thu, Apr 16, 2020, at 08:51, Ismael Juma wrote:
> I don't think these requests are necessarily infrequent under multi-tenant
> environments though. I've seen Controller availability being an issue for
> describe topics for example (before it was changed to go to any broker).

Hi Ismael,

I don't think DescribeTopics is a good comparison.  That RPC is available to 
regular users and is used many orders of magnitude more frequently than 
administrative operations like changing ACLs or setting quotas.

The operations we're talking about redirecting here all require the highest 
possible permissions and will not be frequent in any real-world cluster, 
unless someone is running a stress test or a benchmark.  We didn't even notice 
some of the serious bugs in setting dynamic configs until recently because the 
alterConfigs / incrementalAlterConfigs RPCs are so infrequently called.

Additionally, this KIP fixes some existing bugs.  The current approach of 
having arbitrary brokers do a read-modify-write cycle on a configuration znode 
is buggy, since that cycle can be interleaved with another node's 
read-modify-write cycle.  It has a "lost updates" problem.

For example, node 1 reads a config znode.  Node 2 reads the same config znode.  
Node 1 writes back a modified version of the znode.  Node 2 writes back its 
(differently) modified version, overwriting the changes from node 1.
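To make that interleaving concrete, here is a toy Python simulation of it.  It 
is only a sketch: ConfigZnode, lost_update_interleaving(), and the config keys 
are hypothetical stand-ins, not the ZooKeeper client API or actual Kafka code.

```python
class ConfigZnode:
    """Hypothetical in-memory stand-in for a config znode (not the ZooKeeper API)."""

    def __init__(self):
        self.data = {}

    def read(self):
        # Return a private copy, like a broker deserializing the znode contents.
        return dict(self.data)

    def write(self, new_data):
        # Unconditional write: replaces whatever is currently stored.
        self.data = dict(new_data)


def lost_update_interleaving():
    znode = ConfigZnode()
    znode.write({"retention.ms": "604800000"})

    node1_view = znode.read()  # node 1 reads the config znode
    node2_view = znode.read()  # node 2 reads the same contents

    node1_view["cleanup.policy"] = "compact"  # node 1 modifies its copy
    znode.write(node1_view)                   # node 1 writes back

    node2_view["max.message.bytes"] = "2097152"  # node 2 modifies its stale copy
    znode.write(node2_view)                      # node 2 overwrites node 1's change
    return znode.data
```

Running lost_update_interleaving() leaves only retention.ms and 
max.message.bytes in the znode; node 1's cleanup.policy change is silently lost.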

I don't think anyone ever noticed this problem since, again, these operations 
are very infrequent, making the chance of such a collision low.  But it is a 
serious bug that is fixed by having a single writer.  (We should add this to 
the KIP...)
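A rough sketch of why a single writer fixes this, again in toy Python: every 
change is queued to one owner thread (standing in for the controller), so no 
two read-modify-write cycles can interleave.  SingleWriter, alter(), and 
concurrent_alters() are hypothetical names for illustration only, not the 
KIP-590 design or any Kafka API.

```python
import queue
import threading


class SingleWriter:
    """Hypothetical single owner of a config map; applies changes one at a time."""

    def __init__(self, initial):
        self._config = dict(initial)
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            key, value, done = item
            # The read-modify-write happens only on this thread, so it is serialized.
            self._config[key] = value
            done.set()

    def alter(self, key, value):
        # Callers enqueue a change and wait until the writer has applied it.
        done = threading.Event()
        self._requests.put((key, value, done))
        done.wait()

    def close(self):
        self._requests.put(None)
        self._thread.join()
        return dict(self._config)


def concurrent_alters():
    writer = SingleWriter({"retention.ms": "604800000"})
    clients = [
        threading.Thread(target=writer.alter, args=("cleanup.policy", "compact")),
        threading.Thread(target=writer.alter, args=("max.message.bytes", "2097152")),
    ]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    return writer.close()
```

Even though the two alterations are issued concurrently, both survive, because 
only the single writer ever touches the stored config.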

> 
> Would it be better to redirect once the controller quorum is there?

This KIP is needed for the bridge release.  The bridge release upgrade process 
relies on the old nodes sending their administrative operations to the 
controller quorum, not directly to ZooKeeper.

best,
Colin


> 
> Note that this is different from things like AlterIsr, since these calls are
> coming from clients rather than from other brokers.
> 
> Ismael
> 
> On Wed, Apr 15, 2020, 5:10 PM Colin McCabe <cmcc...@apache.org> wrote:
> 
> > Hi Ismael,
> >
> > I agree that sending these requests through the controller will not work
> > during the periods when there is no controller.  However, those periods
> > should be short; otherwise we have bigger problems in the cluster.
> >
> > These requests are very infrequent because they are administrative
> > operations.  Basically the affected operations are changing ACLs, changing
> > dynamic configurations, and changing quotas.
> >
> > best,
> > Colin
> >
> >
> > On Wed, Apr 15, 2020, at 15:25, Ismael Juma wrote:
> > > Hi Boyang,
> > >
> > > Thanks for the KIP. Have we considered that this reduces availability for
> > > these operations, since we have a single Controller instead of the ZK quorum?
> > >
> > > Ismael
> > >
> > > On Fri, Apr 3, 2020 at 4:45 PM Boyang Chen <reluctanthero...@gmail.com>
> > > wrote:
> > >
> > > > Hey all,
> > > >
> > > > I would like to start off the discussion for KIP-590, a follow-up
> > > > initiative after KIP-500:
> > > >
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-590%3A+Redirect+Zookeeper+Mutation+Protocols+to+The+Controller
> > > >
> > > > This KIP proposes to migrate existing Zookeeper mutation paths, including
> > > > configuration, security and quota changes, to be controller-only by always
> > > > routing these alterations to the controller.
> > > >
> > > > Let me know your thoughts!
> > > >
> > > > Best,
> > > > Boyang
> > > >
> > >
> >
>
