[jira] [Updated] (KAFKA-16667) KRaftMigrationDriver gets stuck after successive failovers
[ https://issues.apache.org/jira/browse/KAFKA-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josep Prat updated KAFKA-16667:
-------------------------------
    Fix Version/s: 3.9.0
                       (was: 3.8.0)

> KRaftMigrationDriver gets stuck after successive failovers
> -----------------------------------------------------------
>
>                 Key: KAFKA-16667
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16667
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, migration
>            Reporter: David Arthur
>            Assignee: David Arthur
>            Priority: Major
>             Fix For: 3.9.0
>
> This is a continuation of KAFKA-16171.
> It turns out that the active KRaftMigrationDriver can get a stale read from ZK after becoming the active controller in ZK (i.e., writing to "/controller"). Because ZooKeeper only offers linearizability for writes (reads are not quorum operations), it is possible to get a stale read on the "/migration" ZNode after writing to "/controller" (and "/controller_epoch") when becoming active.
>
> The history looks like this:
> # Node B becomes leader in the Raft layer. KRaftLeaderEvents are enqueued on every KRaftMigrationDriver.
> # Node A writes some state to ZK, updating "/migration" and checking "/controller_epoch" in one transaction. This happens before B claims controller leadership in ZK. The "/migration" state is updated from X to Y.
> # Node B claims leadership by updating "/controller" and "/controller_epoch". Leader B reads "/migration" state X.
> # Node A tries to write some state and fails on the "/controller_epoch" check op (see the sketch after this message).
> # Node A processes the new leader event and becomes inactive.
>
> This does not violate the consistency guarantees made by ZooKeeper:
> > Write operations in ZooKeeper are _linearizable_. In other words, each {{write}} will appear to take effect atomically at some point between when the client issues the request and receives the corresponding response.
> and
> > Read operations in ZooKeeper are _not linearizable_ since they can return potentially stale data. This is because a {{read}} in ZooKeeper is not a quorum operation and a server will respond immediately to a client that is performing a {{read}}.
>
> The impact of this stale read is the same as in KAFKA-16171: the KRaftMigrationDriver never gets past SYNC_KRAFT_TO_ZK because it has a stale zkVersion for the "/migration" ZNode. As a result, brokers never learn about the new controller and cannot update any partition state.
> The workaround for this bug is to re-elect the controller by shutting down the active KRaft controller.
> This bug was found during a migration where the KRaft controller was rapidly failing over due to an excess of metadata.
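For readers unfamiliar with the check op referenced in steps 2 and 4: ZooKeeper's multi() bundles several operations into one atomic transaction, so a setData on "/migration" can be made conditional on "/controller_epoch" still having the expected zkVersion. A minimal sketch of that pattern with the plain org.apache.zookeeper.ZooKeeper client follows; the method and variable names are illustrative, not the actual KRaftMigrationDriver code.

{code:java}
import java.util.Arrays;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public class MigrationWriteSketch {
    /**
     * Atomically update /migration, but only if /controller_epoch still has
     * the zkVersion this controller observed when it claimed leadership.
     * If another node has since bumped /controller_epoch, the whole
     * transaction fails (KeeperException.BadVersionException on the check
     * op) and nothing is written -- the failure Node A hits in step 4.
     */
    static void writeMigrationState(ZooKeeper zk, byte[] newState,
                                    int migrationZkVersion,
                                    int controllerEpochZkVersion)
            throws KeeperException, InterruptedException {
        zk.multi(Arrays.asList(
                Op.setData("/migration", newState, migrationZkVersion),
                Op.check("/controller_epoch", controllerEpochZkVersion)));
    }
}
{code}

Note that the check op protects the writer (Node A) but does nothing for the reader (Node B), whose problem is the stale zkVersion it read in step 3.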
[jira] [Updated] (KAFKA-16667) KRaftMigrationDriver gets stuck after successive failovers
[ https://issues.apache.org/jira/browse/KAFKA-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Soarez updated KAFKA-16667:
--------------------------------
    Fix Version/s:     (was: 3.7.1)
[jira] [Updated] (KAFKA-16667) KRaftMigrationDriver gets stuck after successive failovers
[ https://issues.apache.org/jira/browse/KAFKA-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Arthur updated KAFKA-16667:
---------------------------------
    Fix Version/s: 3.8.0
                   3.7.1
[jira] [Updated] (KAFKA-16667) KRaftMigrationDriver gets stuck after successive failovers
[ https://issues.apache.org/jira/browse/KAFKA-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Arthur updated KAFKA-16667:
---------------------------------
    Description: edited (the only change from the previous revision: "all KRaftMigrationDriver-s" was corrected to "all KRaftMigrationDriver" in the failover history)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
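A closing note on the stale read at the heart of this issue: ZooKeeper clients can opt into a linearizable read by issuing a sync() barrier before getData(), which forces the connected server to catch up with the ZooKeeper leader first. A minimal sketch of that general technique with the plain org.apache.zookeeper.ZooKeeper client follows; it illustrates the mechanism only and is not the fix that shipped for this issue.

{code:java}
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class LinearizableReadSketch {
    /**
     * Read /migration through a sync() barrier. sync() makes the server this
     * client is connected to catch up with the ZooKeeper leader, so the
     * getData() that follows observes every write committed before the sync,
     * including a newer "/migration" zkVersion written by the previous
     * controller (state Y in the failover history above).
     */
    static byte[] readMigrationState(ZooKeeper zk, Stat statOut) throws Exception {
        CountDownLatch synced = new CountDownLatch(1);
        // sync() is async-only in the ZooKeeper API; block until it completes.
        zk.sync("/migration", (rc, path, ctx) -> synced.countDown(), null);
        synced.await();
        return zk.getData("/migration", false, statOut);
    }
}
{code}

With a read like this, statOut.getVersion() would be current, and the conditional write sketched earlier in the thread would no longer carry a stale expected zkVersion.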