Yes, this should be CLUSTER_ACTION on CLUSTER, to be consistent with our other internal RPCs.
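For reference, a minimal sketch of what such a check could look like against Kafka's public Authorizer interface; the helper class and method names here are assumptions for illustration, not part of the KIP:

    import java.util.List;
    import org.apache.kafka.common.acl.AclOperation;
    import org.apache.kafka.common.resource.PatternType;
    import org.apache.kafka.common.resource.ResourcePattern;
    import org.apache.kafka.common.resource.ResourceType;
    import org.apache.kafka.server.authorizer.Action;
    import org.apache.kafka.server.authorizer.AuthorizableRequestContext;
    import org.apache.kafka.server.authorizer.AuthorizationResult;
    import org.apache.kafka.server.authorizer.Authorizer;

    // Hypothetical helper: gate AssignReplicasToDirs the same way as other
    // inter-broker RPCs, i.e. CLUSTER_ACTION on the CLUSTER resource.
    public final class AssignReplicasToDirsAuth {
        public static boolean isAuthorized(Authorizer authorizer,
                                           AuthorizableRequestContext context) {
            Action clusterAction = new Action(
                AclOperation.CLUSTER_ACTION,
                // "kafka-cluster" is the well-known name of the cluster resource.
                new ResourcePattern(ResourceType.CLUSTER, "kafka-cluster",
                                    PatternType.LITERAL),
                1,      // resourceReferenceCount
                true,   // logIfAllowed
                true);  // logIfDenied
            return authorizer.authorize(context, List.of(clusterAction))
                             .get(0) == AuthorizationResult.ALLOWED;
        }
    }

A denied request would then presumably be answered with CLUSTER_AUTHORIZATION_FAILED, as with the other internal RPCs.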
best,
Colin

On Mon, Dec 4, 2023, at 04:28, Viktor Somogyi-Vass wrote:
> Hi Igor,
>
> I'm just reading through your KIP and noticed that the new protocol you
> created doesn't say anything about ACLs for the new AssignReplicasToDirs
> API. Would it make sense to authorize these requests as other inter-broker
> protocol calls are usually authorized, that is, ClusterAction on the
> Cluster resource?
>
> Thanks,
> Viktor
>
> On Tue, Nov 28, 2023 at 4:18 PM Igor Soarez <i...@soarez.me> wrote:
>
>> Hi everyone,
>>
>> There have been a number of further changes.
>>
>> I have updated the KIP to reflect them, but for reference,
>> I'd also like to update this thread with a summary.
>>
>> 1. The reserved Uuids and their names for directories have changed.
>> The first 100 Uuids are reserved for future use.
>>
>> 2. During the ZooKeeper to KRaft migration, if a broker still
>> configured in ZK mode has any log directory offline, it will shut
>> down and refuse to start up. The expectation is that this escalation
>> from a log directory's unavailability to the entire broker's
>> unavailability will be temporary, limited to the migration period,
>> and that the simplification will help develop, test and support
>> this feature.
>>
>> 3. The representation of replica directories in metadata records is
>> no longer tightly coupled with the respective broker IDs. Instead of
>> replacing the int[] replicas field in PartitionRecord and
>> PartitionChangeRecord, we are introducing a new Uuid[] field named
>> directories, which should be kept the same length, and in the same
>> order, as the existing replicas field. See
>> https://github.com/apache/kafka/pull/14290 for further details.
>>
>> 4. Assignments relating to failed log directories are no longer
>> prioritized. Previously the KIP proposed prioritizing assignments
>> that related to failed log directories, aiming to synchronize the
>> necessary replica-to-directory mapping on the controller before
>> handling the directory failure. We have since decided to remove any
>> prioritization of these assignments, as delaying the reporting of
>> directory failures for any reason is considered detrimental.
>>
>> 5. Uuids for log directories that failed after startup are always
>> included in every broker heartbeat request. Previously the KIP
>> proposed sending Uuids for failed directories in the broker
>> heartbeat only until a successful reply was received. However,
>> because the overload mode handling of broker heartbeats means a
>> heartbeat request may receive a successful response without being
>> fully processed, it is preferable to always send the cumulative
>> list of directory IDs that have failed since startup. In the future,
>> this list can be trimmed to remove directory IDs that are seen to be
>> removed from the broker registration, as the broker catches up with
>> metadata.
>>
>> 6. The proposal to shut down the broker after
>> log.dir.failure.timeout.ms if it has not been able to communicate
>> that some log directory is offline is now more of an open question.
>> It's unclear whether this will actually be necessary.
>>
>> Please share if you have any thoughts.
>>
>> Best,
>>
>> --
>> Igor
>>
>>
>> On Tue, Oct 10, 2023, at 5:28 AM, Igor Soarez wrote:
>> > Hi Colin,
>> >
>> > Thanks for the renaming suggestions. UNASSIGNED is better than
>> > UNKNOWN, MIGRATING is also better than SELECTED, and I don't
>> > expect it to be used outside of the migration phase.
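As an illustration of these renamed reserved IDs, a sketch of how the sentinels might be declared; the concrete Uuid values below are assumptions, as only the names and the reservation of the first 100 Uuids come from the thread:

    import org.apache.kafka.common.Uuid;

    // Sketch of the reserved sentinel directory IDs (values illustrative;
    // the KIP reserves the first 100 Uuids for these and future sentinels).
    public final class DirectoryIds {
        // No directory assignment is known for the replica yet.
        public static final Uuid UNASSIGNED = new Uuid(0L, 0L);
        // The directory is not yet known because the broker registered
        // during the ZooKeeper-to-KRaft migration.
        public static final Uuid MIGRATING = new Uuid(0L, 1L);
        // The directory hosting the replica failed or cannot be determined.
        public static final Uuid LOST = new Uuid(0L, 2L);

        private DirectoryIds() { }
    }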
>> > LOST can also work instead of OFFLINE, but I think there will
>> > be other uses for this value outside of the migration, like
>> > in the intra-broker replica movement edge cases described in the KIP.
>> > I've updated the KIP and also filed a tiny PR with your suggestion,
>> > except I'm keeping the description of LOST broader than just
>> > scoped to the migration.
>> >
>> > https://github.com/apache/kafka/pull/14517
>> >
>> >
>> > The KIP already proposes that the broker does not unfence until it
>> > has confirmed that all the assignments have been communicated to
>> > the controller. And you're right about the interaction with
>> > ReplicaManager: we definitely don't want RPCs coming out of there.
>> > My intention is to introduce a new manager, as you suggest, with
>> > its own event loop, that batches and prioritizes assignment and
>> > dir failure events, called DirectoryEventManager.
>> > There's already an open PR, perhaps you could have a look?
>> >
>> > KAFKA-15357: Aggregate and propagate assignments and logdir failures
>> > https://github.com/apache/kafka/pull/14369
>> >
>> >
>> > > With regard to the failure detection "gap" during hybrid mode: the
>> > > kraft controller sends a full LeaderAndIsrRequest to the brokers
>> > > that are in hybrid mode, right? And there is a per-partition
>> > > response as well. Right now, we don't pay attention to the error
>> > > codes sent back in the response. But we could. Any replica with an
>> > > error could be transitioned from MIGRATING -> LOST, right? That
>> > > would close the failure detection gap.
>> >
>> > Almost. The missing bit is that the controller would also need to
>> > watch the /log_dir_event_notification znode, and on any change,
>> > read the broker ID in the value and send a full LeaderAndIsrRequest
>> > to the respective broker. Without this watch, a very long time
>> > could pass between a dir failing and the controller sending
>> > LeaderAndIsrRequests covering every partition hosted in the
>> > failed dir.
>> >
>> > Watching the znode is a read-only action, and adding this watch
>> > plus the error handling doesn't seem like a big change.
>> > Maybe extending the ZK Controller compatibility functionality in
>> > this way isn't such a bad option after all?
>> >
>> >
>> > Best,
>> >
>> > --
>> > Igor
>> >
>>
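To make the watch idea above concrete, a rough sketch using the plain ZooKeeper client; the class name and the onLogDirEvent hook are made up for illustration, and retry/parsing logic is elided:

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical controller-side watch: on any change under
    // /log_dir_event_notification, read each event znode, extract the
    // broker ID from its payload, and prompt a full LeaderAndIsrRequest
    // to that broker so per-partition errors can move replicas
    // MIGRATING -> LOST.
    public class LogDirEventWatcher implements Watcher {
        private static final String PATH = "/log_dir_event_notification";
        private final ZooKeeper zk;

        public LogDirEventWatcher(ZooKeeper zk) {
            this.zk = zk;
        }

        // Reads all pending event znodes and re-arms the child watch.
        public void register() throws KeeperException, InterruptedException {
            for (String child : zk.getChildren(PATH, this)) {
                byte[] data = zk.getData(PATH + "/" + child, false, null);
                // The payload is a small JSON blob naming the broker,
                // e.g. {"version":1,"broker":1,"event":1}; parsing elided.
                onLogDirEvent(new String(data, StandardCharsets.UTF_8));
            }
        }

        // Hypothetical hook: the controller would send a full
        // LeaderAndIsrRequest to the broker named in the payload.
        void onLogDirEvent(String payload) { }

        @Override
        public void process(WatchedEvent event) {
            try {
                register();  // child list changed: re-read and re-watch
            } catch (KeeperException | InterruptedException e) {
                Thread.currentThread().interrupt();  // sketch: no retries
            }
        }
    }

Since the watch only reads the existing notification znodes that ZK-mode brokers already write, it would not change what brokers publish; the only new behavior is on the controller side.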