On Thu, Apr 9, 2020, at 09:36, Paolo Moriello wrote:
> Hi Colin,
> 
> Thanks again for checking this out.
> 
> Indeed you are right: a configuration problem, namely an incorrect ACL
> configuration, is what leads to the authorization failure (and consequently
> to the internal topics bug). In particular, when the cluster-level ACLs are
> insufficient, i.e. they do not include the broker CN required to allow
> inter-broker communication when client SSL is required:
> 1) FindCoordinator request completes successfully, and __consumer_offsets
> topic is created in zk
> 2) but subsequent UpdateMetadata and LeaderAndIsr fail. This leaves the
> internal topic in a bad state
> 
> A deeper look confirmed that the change I proposed initially does not work,
> since authorizing the user principal is not enough to prevent the issue.
> However, I believe that we should still avoid creating the internal
> topic(s) at all in case of insufficient broker ACLs (which means making the
> FindCoordinator request fail, since we won't have the required metadata).
> One possibility would be to check that the brokers' ACLs exist before
> creating the internal topic.
> Let me know if you have any feedback.

Hi Paolo,

If the problem is that broker ACLs are configured incorrectly, so that the
brokers can't accept requests from the controller, a lot of things will fail.
This isn't really related to FindCoordinator.
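
As a rough illustration of the point above, here is a toy model in Python. It is
only a sketch: the function names and the ACL dictionary are made up for the
example and are not Kafka APIs. It assumes that the controller's LeaderAndIsr
and UpdateMetadata requests are authorized as ClusterAction on the cluster
resource, and that the topic metadata has already been written to ZooKeeper.

```python
# Toy model only: these names are illustrative and are not Kafka APIs.
CLUSTER_ACTION = "ClusterAction"


def controller_request_allowed(acls, broker_principal):
    """Return True if the broker principal holds ClusterAction on the cluster."""
    return CLUSTER_ACTION in acls.get(broker_principal, set())


def propagate_new_topic(acls, broker_principal):
    """Model the controller pushing state for a newly created topic.

    Topic creation is assumed to have succeeded already; only the
    inter-broker requests are authorized here.
    """
    results = {}
    for request in ("LeaderAndIsr", "UpdateMetadata"):
        allowed = controller_request_allowed(acls, broker_principal)
        results[request] = "OK" if allowed else "CLUSTER_AUTHORIZATION_FAILED"
    return results


# Broker CN missing from the cluster-level ACLs: both requests are rejected,
# whichever client request caused the topic to be created.
missing = propagate_new_topic({"User:CN=client": {"Read"}}, "User:CN=broker1")

# Broker CN present: both requests go through.
present = propagate_new_topic(
    {"User:CN=broker1": {CLUSTER_ACTION}}, "User:CN=broker1"
)
```

In this model the outcome depends only on whether the broker principal has the
cluster-level ACL, not on which request created the topic.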

best,
Colin


> 
> Thanks,
> Paolo
> 
> 
> On Tue, 7 Apr 2020 at 17:12, Colin McCabe <cmcc...@apache.org> wrote:
> 
> > On Tue, Apr 7, 2020, at 08:08, Paolo Moriello wrote:
> > > Hi Colin,
> > >
> > > Thanks for your interest in this. I agree with you, this change could
> > > break compatibility. However, changing the source principal is
> > > non-trivial in this case. In fact, the problem here is not in the
> > > internal topic creation, which succeeds, but in the two subsequent
> > > LeaderAndIsr and UpdateMetadata requests.
> > >
> > > When a consumer tries to consume for the first time, the creation of the
> > > internal topic completes, the zk nodes are filled with the necessary
> > > metadata, and this triggers a ZkPartitionStateMachine
> > > (PartitionStateMachine.scala) update which, in turn, makes the
> > > ControllerChannelManager (ControllerChannelManager.scala) send
> > > LeaderAndIsr and UpdateMetadata requests to the brokers (I could be
> > > wrong, but I believe that these requests are already executed with the
> > > broker principal). These requests fail because we authorize the cluster
> > > operation there, so the __consumer_offsets topic remains in a bad state.
> >
> > I might be misunderstanding something here, but it seems to me that if
> > LeaderAndIsrRequest or UpdateMetadataRequest are failing with authorization
> > errors, then there is a configuration problem on the cluster which doesn't
> > have anything to do with the __consumer_offsets topic.
> >
> > >
> > > Is there a reason to not authorize the operation for find coordinator
> > > requests as well?
> >
> > To be clear, we can't change the authorization for FindCoordinatorRequest.
> >
> > best,
> > Colin
> >
>
