Hi, Jason, Thanks for the KIP. Looks good overall. A couple of comments below.
10. It's not very clear to me when the partition state (i.e., leader, isr, leaderEpoch and zkVersion) is updated on the leader. My initial understanding is that the leader updates the partition state on receiving a successful AlterIsr response (which will container a higher zkVersion). However, your response to Colin seems to indicate that the leader only updates the partition state on LeaderAndIsr request. Both approaches are possible. If we do the former, the partition state can be updated through both AlterIsr response and LeaderAndIsr request. So, we need a way to serialize the ordering. We can potentially use zkVersion to do that. If we do the latter, we need to have the controller additionally send a LeaderAndIsr request on completing the AlterIsrRequest. We also need some way to make those temporary ISR changes (due to AlterIsr) permanent based on the LeaderAndIsr request. 11. A question on an implementation detail. I assume that if a follower fetchRequest causes an ISR expansion, the completion of the fetchRequest doesn't block on the AlterIsr request? If so, are we going to introduce a separate thread to handle the sending/receiving of AlterIsr request? Related to do this, do we guarantee that all the AlterIsr requests from a broker will be sent to the controller in order? Do we allow more than one pending AlterIsr request or not? Jun On Mon, Jul 29, 2019 at 12:59 PM Guozhang Wang <wangg...@gmail.com> wrote: > 2. Sounds good. Thanks! > > Guozhang > > On Mon, Jul 29, 2019 at 11:43 AM Jason Gustafson <ja...@confluent.io> > wrote: > > > @Colin > > > > Yeah, that's a good question. If the version is higher, I think it's not > > safe for the leader to use the new version until it has received the full > > LeaderAndIsr state. For example, there may be a reassignment in progress > > which could alter the leader's expected state change. At least knowing > > about the higher version saves the leader unnecessary retries and will be > > useful for debugging missed LeaderAndIsr updates. I considered letting > the > > AlterIsr response include the full leader and ISR state so that the > leader > > could continue without relying on the update. However, I think this would > > just tend to mask bugs in the controller's propagation logic. It seems > > simpler to have one mechanism for propagation of leader and ISR state > > rather than two. > > > > @Guozhang > > > > 1. Yes, it is the same version field used in LeaderAndIsr requests. I > will > > make this explicit. > > 2. I think both options can work. The main thing I'm trying to avoid is > > having the leader blocking on an ISR update. At some point, if the leader > > doesn't receive an expected LeaderAndIsr update, then it must retry the > > AlterIsr request. I thought it would be simpler if retries always > reflected > > the latest expectation on the ISR state rather than letting the leader be > > stuck retrying a state change which may no longer be relevant. This is > the > > approach that I modeled. That being said, if it's ok with you, I'd prefer > > to leave this decision to the implementation. I think the main point is > > that once the leader receives the latest update, it can discard any > pending > > state. > > > > > > Thanks, > > Jason > > > > > > > > On Mon, Jul 29, 2019 at 9:22 AM Guozhang Wang <wangg...@gmail.com> > wrote: > > > > > Hi Jason, > > > > > > Thanks for the KIP. It looks good to me overall. > > > > > > 1. Just clarifying the "CurrentVersion" field in the newly proposed > > > protocol. Does it contains the same value as zkVersion that controller > > get > > > from ZK? > > > > > > 2. As for the comment in the KIP: "We can either await the update or we > > can > > > send a new update with the current version. If we did the latter, then > we > > > would have multiple pending inflights using the same version." My > > > understanding is that it is the controller who acts as the source of > > truth > > > for "currentVersion", in which case I think there's little latency > > benefits > > > to send multiple pending requests with the same version, since > which-ever > > > arrives controller first would cause the zkVersion to be bumped and > > > therefore the rest of the requests would be rejected with > > > "INVALID_ISR_VERSION". So I'd favor just wait the update response from > > the > > > current inflight request before sending out the next request -- > > admittedly > > > this requires a bit more complicated implementation on the brokers, but > > > maybe we can generalize the request queue module on controller for this > > > purpose? > > > > > > > > > Guozhang > > > > > > > > > On Sun, Jul 28, 2019 at 10:32 AM Colin McCabe <cmcc...@apache.org> > > wrote: > > > > > > > Hi Jason, > > > > > > > > This looks good. > > > > > > > > If the AlterIsr request returns a higher ZK version than the one the > > > > broker currently has, will the broker use that as its new ZK version? > > I > > > > suppose this could happen if some of the updates the controller > pushed > > > out > > > > were not received or not received yet by the broker in question. > > > > > > > > best, > > > > Colin > > > > > > > > > > > > On Fri, Jul 26, 2019, at 09:43, Jason Gustafson wrote: > > > > > Hi All, > > > > > > > > > > I have written a proposal to change the way leaders make ISR > > > > > modifications: > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-497%3A+Add+inter-broker+API+to+alter+ISR > > > > . > > > > > Have a look and let me know what you think. > > > > > > > > > > Thanks, > > > > > Jason > > > > > > > > > > > > > > > > > > -- > > > -- Guozhang > > > > > > > > -- > -- Guozhang >