[ https://issues.apache.org/jira/browse/KAFKA-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Harris updated KAFKA-14426:
--------------------------------
    Description:
Currently there are a number of limitations for KRaft, which are described as the motivation for the following open KIPs:
 * [https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes]
 * [https://cwiki.apache.org/confluence/display/KAFKA/KIP-856%3A+KRaft+Disk+Failure+Recovery]
 * [https://cwiki.apache.org/confluence/display/KAFKA/KIP-650%3A+Enhance+Kafkaesque+Raft+semantics#KIP650:EnhanceKafkaesqueRaftsemantics-Pre-vote]

These limitations are:
 * No online method of resizing the controller quorum
 * No online method of recovering from controller disk loss
 * No support for heterogeneous voter lists in running controller nodes
 * When using a quorum size of 3, there is no live-upgrade roll which is tolerant of a single unplanned machine failure.
 * When using a quorum size >3, there is a risk of zombie leaders causing extended outages without the pre-vote feature.

These are significant enough concerns for operations of a KRaft-enabled cluster that they should be documented as official limitations in the ops documentation.

Optionally, we may wish to provide or link to more detailed operations documentation about performing the offline-resize or offline-recovery stages, in addition to describing that such offline procedures are necessary.

  was:
Currently there are a number of limitations for KRaft, which are described as the motivation for the following open KIPs:
 * [https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes]
 * [https://cwiki.apache.org/confluence/display/KAFKA/KIP-856%3A+KRaft+Disk+Failure+Recovery]
 * [https://cwiki.apache.org/confluence/display/KAFKA/KIP-650%3A+Enhance+Kafkaesque+Raft+semantics#KIP650:EnhanceKafkaesqueRaftsemantics-Pre-vote]

These limitations are:
 * No online method of resizing the controller quorum
 * No online method of recovering from controller disk loss
 * No support for heterogeneous voter lists in running controller nodes
 * When using a quorum size of 3, there is no live-upgrade roll which is tolerant of a single unplanned machine failure.
 * When using a quorum size >3, there is a risk of non-linearizable reads.

These are significant enough concerns for operations of a KRaft-enabled cluster that they should be documented as official limitations in the ops documentation.

Optionally, we may wish to provide or link to more detailed operations documentation about performing the offline-resize or offline-recovery stages, in addition to describing that such offline procedures are necessary.

> Add documentation for KRaft limitations that have open KIPs
> -----------------------------------------------------------
>
>                 Key: KAFKA-14426
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14426
>             Project: Kafka
>          Issue Type: Task
>          Components: documentation, kraft
>            Reporter: Greg Harris
>            Priority: Major
>
> Currently there are a number of limitations for KRaft, which are described as
> the motivation for the following open KIPs:
>  * [https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes]
>  * [https://cwiki.apache.org/confluence/display/KAFKA/KIP-856%3A+KRaft+Disk+Failure+Recovery]
>  * [https://cwiki.apache.org/confluence/display/KAFKA/KIP-650%3A+Enhance+Kafkaesque+Raft+semantics#KIP650:EnhanceKafkaesqueRaftsemantics-Pre-vote]
>
> These limitations are:
>  * No online method of resizing the controller quorum
>  * No online method of recovering from controller disk loss
>  * No support for heterogeneous voter lists in running controller nodes
>  * When using a quorum size of 3, there is no live-upgrade roll which is
> tolerant of a single unplanned machine failure.
>  * When using a quorum size >3, there is a risk of zombie leaders causing
> extended outages without the pre-vote feature.
>
> These are significant enough concerns for operations of a KRaft-enabled
> cluster that they should be documented as official limitations in the ops
> documentation.
>
> Optionally, we may wish to provide or link to more detailed operations
> documentation about performing the offline-resize or offline-recovery stages,
> in addition to describing that such offline procedures are necessary.
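
The quorum-size-3 limitation listed in the description comes down to majority arithmetic. The sketch below assumes the usual Raft rule that a quorum needs a strict majority of voters. The helper names `majority` and `has_quorum` are hypothetical and are not part of any Kafka API. It only illustrates availability during a roll and does not model the zombie-leader case.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def has_quorum(voters: int, unavailable: int) -> bool:
    """True if the voters that are still up can form a majority."""
    return voters - unavailable >= majority(voters)


for voters in (3, 5):
    # One controller is down on purpose (the rolling upgrade),
    # and one more fails unexpectedly while the roll is in progress.
    planned, unplanned = 1, 1
    ok = has_quorum(voters, planned + unplanned)
    print(f"{voters} voters, {planned} restarting + {unplanned} failed: quorum={ok}")
```

With 3 voters, a restart plus one unplanned failure leaves 1 voter against a required majority of 2, so the quorum is lost. With 5 voters, 3 remain and the majority of 3 still holds.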