Hi Viktor, Thanks for pointing this out.
I forgot to make this clear in the KIP. I'll update it. ClusterAction on Cluster resource is exactly right, see `ControllerApis.handleAssignReplicasToDirs`. [1] -- Igor [1]: https://github.com/apache/kafka/pull/14863/files#diff-91060c918c99d25342f625c146f14425716eda9d8fcfe1126b2c45feff388362R1073 On Mon, Dec 4, 2023, at 12:28 PM, Viktor Somogyi-Vass wrote: > Hi Igor, > > I'm just reading through your KIP and noticed that the new protocol you > created doesn't say anything about ACLs of the new AssignReplicasToDirs > API. Would it make sense to authorize these requests as other inter-broker > protocol calls are usually authorized, that is ClusterAction on Cluster > resource? > > Thanks, > Viktor > > On Tue, Nov 28, 2023 at 4:18 PM Igor Soarez <i...@soarez.me> wrote: > > > Hi everyone, > > > > There have been a number of further changes. > > > > I have updated the KIP to reflect them, but for reference, > > I'd also like to update this thread with a summary. > > > > 1. The reserved Uuids and their names for directories have changed. > > The first 100 Uuids are reserved for future use. > > > > 2. During the ZooKeeper to KRaft migration, if a broker still > > configured in ZK mode has any log directory offline, it will > > shutdown and refuse to startup. The expectation is that this > > escalation from a log directory's unavailability to the entire > > broker's unavailability will be temporary, limited to the migration > > period. And that the simplification will help develop, test and > > support this feature. > > > > 3. The representation of replica directories in metadata records is > > no longer tightly coupled with the respective broker IDs. Instead of > > replacing the int[] replicas field in PartitionRecord and > > PartitionChangeRecord, we are instead introducing a new field Uuid[] > > named directories, that should be kept in the same size and order as > > the existing replicas field. See > > https://github.com/apache/kafka/pull/14290 for further details. > > > > 4. Assignments that are respective to failed log directories are no > > longer prioritized. Previously the KIP proposed prioritizing > > assignments that related to failed log directories, aiming to > > synchronize the necessary replica to directory mapping on the > > controller before handling the directory failure. Recently, we have > > decided to remove any prioritization of these assignments, as > > delaying the reporting of directory failures is considered > > detrimental for any reason > > > > 5. Uuids for log directories that failed after startup are always > > included in every broker heartbeat request. Previously the KIP > > proposed sending Uuids for failed directories in the broker > > heartbeat until a successful reply is received. However, due to the > > overload mode handling of broker heartbeats, because broker > > heartbeat requests may receive a successful response without being > > fully processed, it is preferable to always send the cumulative list > > of directory IDs that have failed since startup. In the future, this > > list can be trimmed to remove directory IDs that are seen to be > > removed from the broker registration, as the broker catches up with > > metadata. > > > > 6. The proposal to shutdown the broker log.dir.failure.timeout.ms > > after not being able to communicate that some log directory is > > offline is now more of an open question. It's unclear if this will > > actually be necessary. > > > > Please share if you have any thoughts. > > > > Best, > > > > -- > > Igor > > > > > > On Tue, Oct 10, 2023, at 5:28 AM, Igor Soarez wrote: > > > Hi Colin, > > > > > > Thanks for the renaming suggestions. UNASSIGNED is better then > > > UNKNOWN, MIGRATING is also better than SELECTED and I don't > > > expect it to be used outside of the migration phase. > > > LOST can also work instead of OFFLINE, but I think there will > > > be other uses for this value outside of the migration, like > > > in the intra-broker replica movement edge cases described in the KIP. > > > I've updated the KIP and also filed a tiny PR with your suggestion, > > > except I'm keeping the description of LOST more broad than just > > > scoped to the migration. > > > > > > https://github.com/apache/kafka/pull/14517 > > > > > > > > > The KIP already proposes that the broker does not want to unfence > > > until it has confirmed all the assignments are communicated > > > with the controller. And you're right about the interaction > > > with ReplicaManager, we definitely don't want RPCs coming > > > out of there. My intention is to introduce a new manager, as you > > > suggest, with its own event loop, that batches and prioritizes > > > assignment and dir failure events, called DirectoryEventManager. > > > There's already an open PR, perhaps you could have a look? > > > > > > KAFKA-15357: Aggregate and propagate assignments and logdir failures > > > https://github.com/apache/kafka/pull/14369 > > > > > > > > > > With regard to the failure detection "gap" during hybrid mode: the > > > > kraft controller sends a full LeaderAndIsrRequest to the brokers > > > > that are in hybrid mode, right? And there is a per-partition > > > > response as well. Right now, we don't pay attention to the error > > > > codes sent back in the response. But we could. Any replica with an > > > > error could be transitioned from MIGRATING -> LOST, right? That > > > > would close the failure detection gap. > > > > > > Almost. The missing bit is that the controller would also need to > > > watch the /log_dir_event_notification znode, and on any change > > > read the broker ID in the value and send a full LeaderAndIsrRequest > > > to the respective broker. Without this watch, it might be a very long > > > time between a dir failing and the controller sending > > > LeaderAndIsrRequests covering every partition hosted in the failed dir. > > > > > > Watching the znode is a read-only action, and it doesn't seem like > > > a big thing to add this watch plus the error handling. > > > Maybe extending the ZK Controller compatibility functionality in this > > > way isn't such a bad option after all? > > > > > > > > > Best, > > > > > > -- > > > Igor > > > > > >