Hi Jun,

Thanks for the comments.

10. My intent was to rely only on the LeaderAndIsr request. It seemed
simpler if there is just one way that ISR state is updated on leaders (e.g.
the request log is all you need to see the changes), but I'd be interested
if you can think of a reason to prefer giving AlterIsr synchronous
semantics. The zkVersion field in the AlterIsr response seems to be causing
some confusion, so I will probably take it out. I had thought it might be
useful for debugging, but it is not strictly needed since we have an error
code to indicate when the expected version does not match.

11. Yes, I was planning to introduce a dedicated thread for controller
requests. I think the key is that the leader only knows what it thinks to
be the latest version. It doesn't make much sense to try and reason about
multiple in-flight versions. So really this comes down to whether, in the
case of retries, we want to insist on always sending the same AlterIsr
request or if we allow it to be changed. It may be simplest to do the
former, but I thought it would be awkward to keep retrying a state which
had become obsolete. Maybe the simple approach will be a little easier to
explain in the end? In that case, if there is a pending ISR change, we can
ignore further changes until it is resolved. As documented in the KIP, when
the leader receives the updated state, it must check whether there are any
additional changes that are required.

-Jason


On Mon, Jul 29, 2019 at 3:53 PM Jun Rao <j...@confluent.io> wrote:

> Hi, Jason,
>
> Thanks for the KIP. Looks good overall. A couple of comments below.
>
> 10. It's not very clear to me when the partition state (i.e., leader, isr,
> leaderEpoch and zkVersion) is updated on the leader. My initial
> understanding is that the leader updates the partition state on receiving a
> successful AlterIsr response (which will container a higher zkVersion).
> However, your response to Colin seems to indicate that the leader only
> updates the partition state on LeaderAndIsr request. Both approaches are
> possible. If we do the former, the partition state can be updated through
> both AlterIsr response and LeaderAndIsr request. So, we need a way to
> serialize the ordering. We can potentially use zkVersion to do that. If we
> do the latter, we need to have the controller additionally send a
> LeaderAndIsr request on completing the AlterIsrRequest. We also need some
> way to make those temporary ISR changes (due to AlterIsr) permanent based
> on the LeaderAndIsr request.
>
> 11. A question on an implementation detail. I assume that if a follower
> fetchRequest causes an ISR expansion, the completion of the fetchRequest
> doesn't block on the AlterIsr request? If so, are we going to introduce a
> separate thread to handle the sending/receiving of AlterIsr request?
> Related to do this, do we guarantee that all the AlterIsr requests from a
> broker will be sent to the controller in order? Do we allow more than one
> pending AlterIsr request or not?
>
> Jun
>
>
>
> On Mon, Jul 29, 2019 at 12:59 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > 2. Sounds good. Thanks!
> >
> > Guozhang
> >
> > On Mon, Jul 29, 2019 at 11:43 AM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > @Colin
> > >
> > > Yeah, that's a good question. If the version is higher, I think it's
> not
> > > safe for the leader to use the new version until it has received the
> full
> > > LeaderAndIsr state. For example, there may be a reassignment in
> progress
> > > which could alter the leader's expected state change. At least knowing
> > > about the higher version saves the leader unnecessary retries and will
> be
> > > useful for debugging missed LeaderAndIsr updates. I considered letting
> > the
> > > AlterIsr response include the full leader and ISR state so that the
> > leader
> > > could continue without relying on the update. However, I think this
> would
> > > just tend to mask bugs in the controller's propagation logic. It seems
> > > simpler to have one mechanism for propagation of leader and ISR state
> > > rather than two.
> > >
> > > @Guozhang
> > >
> > > 1. Yes, it is the same version field used in LeaderAndIsr requests. I
> > will
> > > make this explicit.
> > > 2. I think both options can work. The main thing I'm trying to avoid is
> > > having the leader blocking on an ISR update. At some point, if the
> leader
> > > doesn't receive an expected LeaderAndIsr update, then it must retry the
> > > AlterIsr request. I thought it would be simpler if retries always
> > reflected
> > > the latest expectation on the ISR state rather than letting the leader
> be
> > > stuck retrying a state change which may no longer be relevant. This is
> > the
> > > approach that I modeled. That being said, if it's ok with you, I'd
> prefer
> > > to leave this decision to the implementation. I think the main point is
> > > that once the leader receives the latest update, it can discard any
> > pending
> > > state.
> > >
> > >
> > > Thanks,
> > > Jason
> > >
> > >
> > >
> > > On Mon, Jul 29, 2019 at 9:22 AM Guozhang Wang <wangg...@gmail.com>
> > wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > Thanks for the KIP. It looks good to me overall.
> > > >
> > > > 1. Just clarifying the "CurrentVersion" field in the newly proposed
> > > > protocol. Does it contains the same value as zkVersion that
> controller
> > > get
> > > > from ZK?
> > > >
> > > > 2. As for the comment in the KIP: "We can either await the update or
> we
> > > can
> > > > send a new update with the current version. If we did the latter,
> then
> > we
> > > > would have multiple pending inflights using the same version." My
> > > > understanding is that it is the controller who acts as the source of
> > > truth
> > > > for "currentVersion", in which case I think there's little latency
> > > benefits
> > > > to send multiple pending requests with the same version, since
> > which-ever
> > > > arrives controller first would cause the zkVersion to be bumped and
> > > > therefore the rest of the requests would be rejected with
> > > > "INVALID_ISR_VERSION". So I'd favor just wait the update response
> from
> > > the
> > > > current inflight request before sending out the next request --
> > > admittedly
> > > > this requires a bit more complicated implementation on the brokers,
> but
> > > > maybe we can generalize the request queue module on controller for
> this
> > > > purpose?
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Sun, Jul 28, 2019 at 10:32 AM Colin McCabe <cmcc...@apache.org>
> > > wrote:
> > > >
> > > > > Hi Jason,
> > > > >
> > > > > This looks good.
> > > > >
> > > > > If the AlterIsr request returns a higher ZK version than the one
> the
> > > > > broker currently has, will the broker use that as its new ZK
> version?
> > > I
> > > > > suppose this could happen if some of the updates the controller
> > pushed
> > > > out
> > > > > were not received or not received yet by the broker in question.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Fri, Jul 26, 2019, at 09:43, Jason Gustafson wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > I have written a proposal to change the way leaders make ISR
> > > > > > modifications:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-497%3A+Add+inter-broker+API+to+alter+ISR
> > > > > .
> > > > > > Have a look and let me know what you think.
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>

Reply via email to