Hi, Colin,

Thanks for the updated KIP. A few more comments below.

80.2 For deprecated configs, we need to include zookeeper.* and
broker.id.generation.enable.
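
For example, I would expect the list to include at least something along
these lines (illustrative, not exhaustive):

  # ZooKeeper connection configs, meaningless without ZK:
  zookeeper.connect
  zookeeper.connection.timeout.ms
  zookeeper.session.timeout.ms
  # Dynamic broker id assignment is going away:
  broker.id.generation.enable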

83.1 If a broker is down, does the controller keep the previously
registered broker epoch forever? If not, how long does the controller keep
it? What does the controller do when receiving a broker heartbeat request
with an unknown broker epoch?
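
To make the last question concrete, this is the controller-side decision
I am asking about, as a rough sketch (the names below are invented for
illustration, not taken from the KIP):

  // Hypothetical handler; registrations maps broker id to the epoch
  // assigned at registration time.
  BrokerHeartbeatResponse handleHeartbeat(BrokerHeartbeatRequest req) {
      BrokerRegistration reg = registrations.get(req.brokerId());
      if (reg == null || reg.epoch() != req.brokerEpoch()) {
          // Unknown or stale broker epoch: reject so that the broker
          // knows to re-register rather than keep heartbeating?
          return heartbeatError(Errors.STALE_BROKER_EPOCH);
      }
      return renewLease(reg);
  }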

100. Have you figured out if we need to add a new record type for reporting
partitions on failed disks?

102. For debugging purposes, sometimes it's useful to read the metadata
topic using tools like console-consumer. Should we support that, and if
so, how?
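
For example, it would be handy if something along these lines worked
(the topic name is just a placeholder, and we would presumably also need
a formatter that understands the metadata record format):

  bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic <metadata-topic> --from-beginning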

200. "brokers which are fenced will not appear in MetadataResponses. The
broker will not respond to these requests-- instead, it will simply
disconnect." If the controller is partitioned off from the brokers, this
design will cause every broker to stop accepting new client requests. In
contrast, if ZK is partitioned off, the existing behavior is that the
brokers can continue to work based on the last known metadata. So, I am
not sure we should change the existing behavior, given the bigger impact
of the new one. Another option is to keep the existing behavior and
expose a metric for fenced brokers so that the operator could be alerted.
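
For instance, the controller could expose a gauge along these lines
(just a sketch; the metric name and surrounding code are hypothetical):

  // Gauge counting brokers currently in the FENCED state.
  newGauge("FencedBrokerCount", new Gauge<Integer>() {
      @Override
      public Integer value() {
          return clusterControl.fencedBrokerIds().size();
      }
  });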

201. I read Ron's comment, but I am still not sure about the benefit of
keeping broker.id and controller.id in meta.properties. It seems that we
are just duplicating the same info in two places and taking on the
additional burden of keeping the values in the two places consistent.
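
For reference, the duplication I mean is between the broker config and a
version 1 meta.properties like this (values illustrative):

  version=1
  cluster.id=<uuid>
  broker.id=1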

202. controller.connect.security.protocol: Is this needed since
controller.listener.names and listener.security.protocol.map imply the
security protocol already?
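
For example, with a configuration like the following, the security
protocol for the controller listener is already fully determined:

  controller.listener.names=CONTROLLER
  listener.security.protocol.map=CONTROLLER:SSL,PLAINTEXT:PLAINTEXT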

203. registration.heartbeat.interval.ms: It defaults to 2k. ZK uses 1/3 of
the session timeout for heartbeat. So, given the default 18k for
registration.lease.timeout.ms, should we default
registration.heartbeat.interval.ms to 6k?
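
That is, mirroring ZK's 1/3 convention (a sketch using the KIP's config
names):

  registration.lease.timeout.ms=18000
  # 1/3 of the lease timeout, matching the ZK heartbeat convention:
  registration.heartbeat.interval.ms=6000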

204. "The highest metadata offset which the broker has not reached." It
seems this should be "has reached".

205. UnfenceBrokerRecord and UnregisterBrokerRecord: To me, they seem to be
the same. Do we need both?

206. TopicRecord: The Deleting field is used to indicate that the topic is
being deleted. I am wondering if this is really needed since RemoveTopic
already indicates the same thing.

Jun

On Wed, Dec 2, 2020 at 2:50 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Wed, Dec 2, 2020, at 14:07, Ron Dagostino wrote:
> > Hi Colin.  Thanks for the updates.  It's now clear to me that brokers
> > keep their broker epoch for the life of their JVM -- they register
> > once, get their broker epoch in the response, and then never
> > re-register again.  Brokers may get fenced, but they keep the same
> > broker epoch for the life of their JVM.  The incarnation ID is also
> > kept for the life of the JVM but is generated by the broker itself
> > upon startup, and the combination of the two allows the Controller to
> > act idempotently if any previously-sent registration response gets
> > lost.  Makes sense.
> >
>
> Thanks, Ron.  That's a good summary.
>
> > One thing I wonder about is if it might be helpful for the broker to
> > send the Cluster ID as determined from its meta.properties file in its
> > registration request.  Does it even make sense for the broker to
> > successfully register and enter the Fenced state if it has the wrong
> > Cluster ID?
>
> Yeah, that's a good idea.  Let's have the broker pass its cluster ID in
> the registration RPC, and then registration can fail if the broker is
> configured for the wrong cluster.
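>
> Roughly, I am imagining a check like this on the controller side (just
> a sketch; the names, including the error code, are not final):
>
>   // Reject a registration from a broker configured for another cluster.
>   if (!request.clusterId().equals(clusterId)) {
>       return registrationError(Errors.INCONSISTENT_CLUSTER_ID);
>   }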
>
> >  The nextMetadataOffset value that the broker communicates
> > in its registration request only has meaning within the correct
> > cluster, so it feels to me that the Controller should have some way to
> > perform this sanity check.  There is currently (pre-KIP 500) a check
> > in the broker to make sure its configured cluster ID matches the one
> > stored in ZooKeeper, and we will have to perform this validation
> > somewhere in the KIP-500 world.  If the Controller doesn't do it
> > within the registration request then the broker will have to make a
> > metadata request to the Controller, retrieve the Cluster ID, and
> > perform the check itself.  It feels to me that it might be better for
> > the Controller to just do it, and then the broker doesn't have to
> > worry about it anymore once it successfully registers.
> >
> > I also have a question about the broker.id value and meta.properties.
> > The KIP now says "In version 0 of meta.properties, there is a
> > broker.id field.  Version 1 does not have this field.  It is no longer
> > needed because we no longer support dynamic broker id assignment."
> > But then there is an example version 1 meta.properties file that shows
> > the broker.id value.  I actually wonder if maybe the broker.id value
> > would be good to keep in the version 1 meta.properties file because it
> > currently (pre-KIP 500, version 0) acts as a sanity check to make sure
> > the broker is using the correct log directory.  Similarly with the
> > controller.id value on controllers -- it would allow the same type of
> > sanity check for quorum controllers.
> >
>
> That's a good point.  I will add broker.id back, and also add
> controller.id as a possibility.
>
> cheers,
> Colin
>
> >
> > On Mon, Nov 30, 2020 at 7:41 PM Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > On Fri, Oct 23, 2020, at 16:10, Jun Rao wrote:
> > > > Hi, Colin,
> > > >
> > > > Thanks for the reply. A few more comments.
> > >
> > > Hi Jun,
> > >
> > > Thanks again for the reply.  Sorry for the long hiatus.  I was on
> > > vacation for a while.
> > >
> > > >
> > > > 55. There is still text that favors new broker registration. "When a
> > > > broker first starts up, when it is in the INITIAL state, it will
> > > > always "win" broker ID conflicts.  However, once it is granted a
> > > > lease, it transitions out of the INITIAL state.  Thereafter, it may
> > > > lose subsequent conflicts if its broker epoch is stale.  (See KIP-380
> > > > for some background on broker epoch.)  The reason for favoring new
> > > > processes is to accommodate the common case where a process is killed
> > > > with kill -9 and then restarted.  We want it to be able to reclaim
> > > > its old ID quickly in this case."
> > > >
> > >
> > > Thanks for the reminder.  I have clarified the language here.
> > > Hopefully now it is clear that we don't allow quick re-use of broker
> > > IDs.
> > >
> > > > 80.1 Sounds good. Could you document that listeners is a required
> > > > config now? It would also be useful to annotate other required
> > > > configs. For example, controller.connect should be required.
> > > >
> > >
> > > I added a note specifying that these are required.
> > >
> > > > 80.2 Could you list all deprecated existing configs? Another one is
> > > > control.plane.listener.name since the controller no longer sends
> > > > LeaderAndIsr, UpdateMetadata and StopReplica requests.
> > > >
> > >
> > > I added a section specifying some deprecated configs.
> > >
> > > > 83.1 It seems that the broker can transition from FENCED to RUNNING
> > > > without registering for a new broker epoch. I am not sure how this
> > > > works. Once the controller fences a broker, there is no need for the
> > > > controller to keep the broker epoch around. So, the fenced broker's
> > > > heartbeat request with the existing broker epoch will be rejected,
> > > > sending the broker back to the FENCED state again.
> > > >
> > >
> > > The broker epoch refers to the broker registration.  So we DO keep the
> > > broker epoch around even while the broker is fenced.
> > >
> > > The broker epoch changes only when there is a new broker
> > > registration.  Fencing or unfencing the broker doesn't change the
> > > broker epoch.
> > >
> > > > 83.5 Good point on KIP-590. Then should we expose the controller for
> > > > debugging purposes? If not, should we deprecate the controllerID
> > > > field in MetadataResponse?
> > > >
> > >
> > > I think it's OK to expose it for now, with the proviso that it won't
> > > be reachable by clients.
> > >
> > > > 90. We rejected the shared ID with just one reason: "This is not a
> > > > good idea because NetworkClient assumes a single ID space.  So if
> > > > there is both a controller 1 and a broker 1, we don't have a way of
> > > > picking the "right" one." This doesn't seem to be a strong reason.
> > > > For example, we could address the NetworkClient issue with the node
> > > > type as you pointed out, or by using the negative value of a broker
> > > > ID as the controller ID.
> > > >
> > >
> > > It would require a lot of code changes to support multiple types of
> > > node IDs.  It's not clear to me that the end result would be better --
> > > I tend to think it would be worse, since it would be more complex.  In
> > > a similar vein, using negative numbers seems dangerous, since we use
> > > negatives or -1 as "special values" in many places.  For example, -1
> > > often represents "no such node."
> > >
> > > One important thing to keep in mind is that we want to be able to
> > > transition from a broker and a controller being co-located to them no
> > > longer being co-located.  This is much easier to do when they have
> > > separate IDs.
> > >
> > > > 100. In KIP-589
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller>,
> > > > the broker reports all offline replicas due to a disk failure to the
> > > > controller. It seems this information needs to be persisted to the
> > > > metadata log. Do we have a corresponding record for that?
> > > >
> > >
> > > Hmm, I have to look into this a little bit more.  We may need a new
> > > record type.
> > >
> > > > 101. Currently, the StopReplica request has 2 modes, without
> > > > deletion and with deletion. The former is used for controlled
> > > > shutdown and handling disk failure, and causes the follower to stop.
> > > > The latter is for topic deletion and partition reassignment, and
> > > > causes the replica to be deleted. Since we are deprecating
> > > > StopReplica, could we document what triggers the stopping of a
> > > > follower and the deleting of a replica now?
> > > >
> > >
> > > RemoveTopic triggers deletion.  In general the functionality of
> > > StopReplica is subsumed by the metadata records.
> > >
> > > > 102. Should we include the metadata topic in the MetadataResponse?
> > > > If so, when will it be included, and what will the metadata
> > > > response look like?
> > > >
> > >
> > > No, it won't be included in the metadata response sent back from the
> > > brokers.
> > >
> > > > 103. "The active controller assigns the broker a new broker epoch,
> > > > based on the latest committed offset in the log." This seems
> > > > inaccurate since the latest committed offset doesn't always advance
> > > > on every log append.
> > > >
> > >
> > > Given that the new broker epoch won't be visible until the commit has
> > > happened, I have changed this to "the next available offset in the
> > > log".
> > >
> > > > 104. REGISTERING(1): It says "Otherwise, the broker moves into the
> > > > FENCED state." It seems this should be RUNNING?
> > > >
> > > > 105. RUNNING: Should we require the broker to catch up to the
> > > > metadata log to get into this state?
> > >
> > > For 104 and 105, these sections have been reworked.
> > >
> > > best,
> > > Colin
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > >
> > > > On Fri, Oct 23, 2020 at 1:20 PM Colin McCabe <cmcc...@apache.org>
> > > > wrote:
> > > >
> > > > > On Wed, Oct 21, 2020, at 05:51, Tom Bentley wrote:
> > > > > > Hi Colin,
> > > > > >
> > > > > > On Mon, Oct 19, 2020, at 08:59, Ron Dagostino wrote:
> > > > > > > > Hi Colin.  Thanks for the hard work on this KIP.
> > > > > > > >
> > > > > > > > I have some questions about what happens to a broker when it
> > > > > > > > becomes fenced (e.g. because it can't send a heartbeat request
> > > > > > > > to keep its lease).  The KIP says "When a broker is fenced, it
> > > > > > > > cannot process any client requests.  This prevents brokers
> > > > > > > > which are not receiving metadata updates or that are not
> > > > > > > > receiving and processing them fast enough from causing issues
> > > > > > > > to clients." And in the description of the FENCED(4) state it
> > > > > > > > likewise says "While in this state, the broker does not
> > > > > > > > respond to client requests."  It makes sense that a fenced
> > > > > > > > broker should not accept producer requests -- I assume any
> > > > > > > > such requests would result in NotLeaderOrFollowerException.
> > > > > > > > But what about KIP-392 (fetch from follower) consumer
> > > > > > > > requests?  It is conceivable that these could continue.
> > > > > > > > Related to that, would a fenced broker continue to fetch data
> > > > > > > > for partitions where it thinks it is a follower?  Even if it
> > > > > > > > rejects consumer requests it might still continue to fetch as
> > > > > > > > a follower.  Might it be helpful to clarify both decisions
> > > > > > > > here?
> > > > > > >
> > > > > > > Hi Ron,
> > > > > > >
> > > > > > > Good question.  I think a fenced broker should continue to
> > > > > > > fetch on partitions it was already fetching before it was
> > > > > > > fenced, unless it hits a problem.  At that point it won't be
> > > > > > > able to continue, since it doesn't have the new metadata.  For
> > > > > > > example, it won't know about leadership changes in the
> > > > > > > partitions it's fetching.  The rationale for continuing to
> > > > > > > fetch is to try to avoid disruptions as much as possible.
> > > > > > >
> > > > > > > I don't think fenced brokers should accept client requests.
> > > > > > > The issue is that the fenced broker may or may not have any
> > > > > > > data it is supposed to have.  It may or may not have applied
> > > > > > > any configuration changes, etc. that it is supposed to have
> > > > > > > applied.  So it could get pretty confusing, and also
> > > > > > > potentially waste the client's time.
> > > > > > >
> > > > > > >
> > > > > > When fenced, how would the broker reply to a client which did
> > > > > > make a request?
> > > > > >
> > > > >
> > > > > Hi Tom,
> > > > >
> > > > > The broker will respond with a retryable error in that case.  Once
> > > > > the client has re-fetched its metadata, it will no longer see the
> > > > > fenced broker as part of the cluster.  I added a note to the KIP.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Tom
> > > > > >
> > > > >
> > > >
> >
>
