[ 
https://issues.apache.org/jira/browse/KAFKA-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roland Sommer updated KAFKA-20295:
----------------------------------
    Description: 
While upgrading our Kafka clusters to new operating systems, I switched to 
dynamic voter configuration and removed controller instances with 
{{/opt/kafka/bin/kafka-metadata-quorum.sh}} and the {{remove-controller}} 
subcommand. Inspecting the cluster with {{describe}} shows only the nodes that 
are actually running.
Now, during the upgrade to 4.2.0, the final metadata upgrade step fails with:
{code:java}
Could not upgrade eligible.leader.replicas.version to 1. The update failed for 
all features since the following feature had an error: Invalid update version 
29 for feature metadata.version. Controller 351 only supports versions 
7-27{code}
where 351 is the ID of an already removed controller. Inspecting a snapshot 
with {{/opt/kafka/bin/kafka-metadata-shell.sh}} shows that the IDs of the 
removed controllers (351, 584, 686) are still registered alongside the three 
active ones:
{code:java}
>> ls image/cluster/controllers/
158 206 351 584 611 686 {code}
while other tools only show the expected nodes:
{code:java}
~$ /opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:9093 describe --replication --human-readable
NodeId  DirectoryId             LogEndOffset  Lag  LastFetchTimestamp  LastCaughtUpTimestamp  Status
158     2gsvOvnT7urpZcA_-LUy5w  196823524     0    7 ms ago            8 ms ago               Leader
611     27Ii-xdAZ7ReQBLsvvJb0A  196823524     0    348 ms ago          348 ms ago             Follower
206     Q7X9o3XbKxk_3tz4T8torg  196823524     0    348 ms ago          348 ms ago             Follower
226     7n6aedUEuytkqhBnbe7ESw  196823524     0    348 ms ago          348 ms ago             Observer
181     tZ17VQ8cYpf7R-LyAQWf2w  196823524     0    349 ms ago          349 ms ago             Observer
299     P4qXt3K0G5Qg_7w_UdvaNA  196823524     0    348 ms ago          348 ms ago             Observer
290     bA0pqZFsUa45lRTB6bS4bg  196823524     0    348 ms ago          348 ms ago             Observer
293     Av_12222lURKVYVt-aNKOQ  196823524     0    348 ms ago          348 ms ago             Observer
485     glENIgkIng1MYDF8HxxoDQ  196823524     0    349 ms ago          350 ms ago             Observer
{code}
Grepping through the output of {{bin/kafka-dump-log.sh 
--cluster-metadata-decoder}} shows only the expected three 
{{REGISTER_CONTROLLER_RECORD}} entries.
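
To make the mismatch explicit, the two listings can be compared mechanically. The following is a minimal sketch and not part of any Kafka tooling: the function name {{stale_controller_ids}} is hypothetical, and it assumes the ID list from the metadata shell and the rows of the {{describe}} output have been copied in by hand, with each row on a single line.

```python
def stale_controller_ids(registered, describe_output):
    """Return registered controller IDs that are not current quorum voters."""
    voters = set()
    for line in describe_output.splitlines():
        fields = line.split()
        # Voter rows end in "Leader" or "Follower"; "Observer" rows and the
        # header line are skipped.
        if fields and fields[0].isdigit() and fields[-1] in ("Leader", "Follower"):
            voters.add(int(fields[0]))
    return sorted(set(registered) - voters)


# IDs listed by `ls image/cluster/controllers/` in kafka-metadata-shell.sh
registered = [158, 206, 351, 584, 611, 686]

# Rows copied from `kafka-metadata-quorum.sh ... describe --replication`
describe_output = """\
NodeId DirectoryId LogEndOffset Lag LastFetchTimestamp LastCaughtUpTimestamp Status
158 2gsvOvnT7urpZcA_-LUy5w 196823524 0 7 ms ago 8 ms ago Leader
611 27Ii-xdAZ7ReQBLsvvJb0A 196823524 0 348 ms ago 348 ms ago Follower
206 Q7X9o3XbKxk_3tz4T8torg 196823524 0 348 ms ago 348 ms ago Follower
226 7n6aedUEuytkqhBnbe7ESw 196823524 0 348 ms ago 348 ms ago Observer
"""

print(stale_controller_ids(registered, describe_output))  # prints [351, 584, 686]
```

With the values from this report, the stale registrations are 351, 584 and 686; 351 is the one named in the error message.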

Is there any clear path for removing those stale nodes?


> Removed controllers still in metadata, blocking finalizing upgrade to 4.2.0
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-20295
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20295
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>         Environment: Kafka 4.2.0 (Scala 2.13) running on Debian Trixie 13.3
>            Reporter: Roland Sommer
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
