Elena,

Did you provision a KRaft controller quorum before restarting the brokers?
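In case it helps while you check: for the ZooKeeper-to-KRaft migration, the new controller nodes have to be formatted with the cluster ID that already exists in ZooKeeper before they are started, and the controllers should be up before the brokers are rolled with the migration properties. A rough sketch of those steps (the paths, the ZooKeeper address, and the `controller.properties` filename are assumptions based on a standard install, not taken from your setup):

```shell
# Read the existing cluster ID that the brokers registered in ZooKeeper.
# The KRaft controllers must be formatted with this same ID, not a newly
# generated one, or the brokers will refuse to talk to them.
bin/zookeeper-shell.sh localhost:2181 get /cluster/id

# On each controller node, format the controller's metadata log directory
# with that cluster ID before starting the controller for the first time.
# (Replace <cluster-id> with the "id" field from the output above.)
bin/kafka-storage.sh format -t <cluster-id> -c config/kraft/controller.properties

# Start the controller quorum first; only then restart the brokers one by
# one with zookeeper.metadata.migration.enable=true set.
bin/kafka-server-start.sh config/kraft/controller.properties
```

If the controllers were never formatted and started, the brokers' `BrokerToControllerChannelManager` threads will keep retrying exactly as in the logs you pasted.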
If you don't mind, could you create a JIRA and attach the config files used for the brokers before/after the migration, along with the controller configs? Please include the sequence of steps you took in the JIRA as well. Here is our JIRA project: https://issues.apache.org/jira/projects/KAFKA/issues, and here is general info on filing issues: https://cwiki.apache.org/confluence/display/KAFKA/Reporting+Issues+in+Apache+Kafka

Thanks!
David

On Tue, May 16, 2023 at 2:54 AM Elena Batranu <[email protected]> wrote:

> Hello! I have a problem with my Kafka configuration (Kafka 3.4). I'm trying to
> migrate from ZooKeeper to KRaft. I have 3 brokers; one of them also hosts
> ZooKeeper. I want to restart my brokers one by one, without downtime. I started
> by deploying a configuration with both KRaft and ZooKeeper enabled, to do the
> migration gradually. At this step my nodes are up, but I have the following
> errors in the KRaft logs:
>
> [2023-05-16 06:35:19,485] DEBUG [BrokerToControllerChannelManager broker=0 name=quorum]: No controller provided, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
> [2023-05-16 06:35:19,585] DEBUG [BrokerToControllerChannelManager broker=0 name=quorum]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
> [2023-05-16 06:35:19,586] DEBUG [BrokerToControllerChannelManager broker=0 name=quorum]: No controller provided, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
> [2023-05-16 06:35:19,624] INFO [RaftManager nodeId=0] Node 3002 disconnected. (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,624] WARN [RaftManager nodeId=0] Connection to node 3002 (/192.168.25.172:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,642] INFO [RaftManager nodeId=0] Node 3001 disconnected.
> (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,642] WARN [RaftManager nodeId=0] Connection to node 3001 (/192.168.25.232:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,643] INFO [RaftManager nodeId=0] Node 3000 disconnected. (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,643] WARN [RaftManager nodeId=0] Connection to node 3000 (/192.168.25.146:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
>
> I configured the controller on each broker; the file looks like this:
>
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements. See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License. You may obtain a copy of the License at
> #
> #    http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
> # This configuration file is intended for use in KRaft mode, where
> # Apache ZooKeeper is not present. See config/kraft/README.md for details.
>
> ############################# Server Basics #############################
>
> # The role of this server. Setting this puts us in KRaft mode
> process.roles=controller
>
> # The node id associated with this instance's roles
> node.id=3000
>
> # The connect string for the controller quorum
> #controller.quorum.voters=3000@localhost:9092
> [email protected]:9093,[email protected]:9093,[email protected]:9093
>
> ############################# Socket Server Settings #############################
>
> # The address the socket server listens on.
> # Note that only the controller listeners are allowed here when `process.roles=controller`,
> # and this listener should be consistent with `controller.quorum.voters` value.
> # FORMAT:   listeners = listener_name://host_name:port
> # EXAMPLE:  listeners = PLAINTEXT://your.host.name:9092
> listeners=CONTROLLER://:9093
>
> # A comma-separated list of the names of the listeners used by the controller.
> # This is required if running in KRaft mode.
> controller.listener.names=CONTROLLER
>
> # Maps listener names to security protocols; the default is for them to be the same.
> #listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
>
> # The number of threads that the server uses for receiving requests from the network and sending responses to the network
> num.network.threads=3
>
> # The number of threads that the server uses for processing requests, which may include disk I/O
> num.io.threads=8
>
> # The send buffer (SO_SNDBUF) used by the socket server
> socket.send.buffer.bytes=102400
>
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=102400
>
> # The maximum size of a request that the socket server will accept (protection against OOM)
> socket.request.max.bytes=104857600
>
> ############################# Log Basics #############################
>
> # A comma separated list of directories under which to store log files
> log.dirs=/data/kraft-controller-logs
>
> # The default number of log partitions per topic. More partitions allow greater
> # parallelism for consumption, but this will also result in more files across the brokers.
> num.partitions=1
>
> # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
> # This value is recommended to be increased for installations with data dirs located in RAID array.
> num.recovery.threads.per.data.dir=1
>
> ############################# Internal Topic Settings #############################
> # The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state".
> # For anything other than development testing, a value greater than 1 is recommended to ensure availability, such as 3.
> offsets.topic.replication.factor=1
> transaction.state.log.replication.factor=1
> transaction.state.log.min.isr=1
>
> ############################# Log Flush Policy #############################
> # Messages are immediately written to the filesystem, but by default we only fsync() to sync the OS cache lazily.
>
> # The number of messages to accept before forcing a flush of data to disk
> #log.flush.interval.messages=10000
>
> # The maximum amount of time a message can sit in a log before we force a flush
> #log.flush.interval.ms=1000
>
> ############################# Log Retention Policy #############################
>
> # The minimum age of a log file to be eligible for deletion due to age
> log.retention.hours=168
>
> # A size-based retention policy for logs; functions independently of log.retention.hours.
> #log.retention.bytes=1073741824
>
> # The maximum size of a log segment file. When this size is reached a new log segment will be created.
> log.segment.bytes=1073741824
>
> # The interval at which log segments are checked to see if they can be deleted according to the retention policies
> log.retention.check.interval.ms=300000
>
> # Enable the migration
> zookeeper.metadata.migration.enable=true
>
> # ZooKeeper client configuration
> zookeeper.connect=localhost:2181
>
> Also, this is my setup for the server.properties file:
>
> ############################# Server Basics #############################
>
> # The role of this server. Setting this puts us in KRaft mode
> #process.roles=broker,controller
> #process.roles=broker
>
> # The node id associated with this instance's roles
> #node.id=0
> broker.id=0
>
> # The connect string for the controller quorum
> #[email protected]:9092
> [email protected]:9093,[email protected]:9093,[email protected]:9093
>
> ############################# Socket Server Settings #############################
>
> # The address the socket server listens on.
> # Combined nodes (i.e. those with `process.roles=broker,controller`) must list the controller listener here at a minimum.
> # If the broker listener is not defined, the default listener will use a host name that is equal to the value of
> # java.net.InetAddress.getCanonicalHostName(), with PLAINTEXT listener name, and port 9092.
> listeners=PLAINTEXT://192.168.25.146:9092
>
> # Name of listener used for communication between brokers.
> #inter.broker.listener.name=PLAINTEXT
>
> # Listener name, hostname and port the broker will advertise to clients.
> # If not set, it uses the value for "listeners".
> advertised.listeners=PLAINTEXT://192.168.25.146:9092
>
> # A comma-separated list of the names of the listeners used by the controller.
> # If no explicit mapping is set in `listener.security.protocol.map`, the default will be the PLAINTEXT protocol.
> # This is required if running in KRaft mode.
> controller.listener.names=CONTROLLER
>
> # Maps listener names to security protocols; the default is for them to be the same.
> #listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
> listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
>
> # Network/IO threads and socket buffers (same as the controller config)
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
>
> ############################# Log Basics #############################
>
> # A comma separated list of directories under which to store log files
> log.dirs=/data/kraft
>
> num.partitions=1
> num.recovery.threads.per.data.dir=1
>
> ############################# Internal Topic Settings #############################
> offsets.topic.replication.factor=1
> transaction.state.log.replication.factor=1
> transaction.state.log.min.isr=1
>
> ############################# Log Flush Policy #############################
> # Messages are immediately written to the filesystem, but by default we only fsync() to sync the OS cache lazily.
> # The number of messages to accept before forcing a flush of data to disk
> #log.flush.interval.messages=10000
>
> # The maximum amount of time a message can sit in a log before we force a flush
> #log.flush.interval.ms=1000
>
> ############################# Log Retention Policy #############################
>
> # The minimum age of a log file to be eligible for deletion due to age
> log.retention.hours=168
>
> # A size-based retention policy for logs; functions independently of log.retention.hours.
> #log.retention.bytes=1073741824
>
> # The maximum size of a log segment file. When this size is reached a new log segment will be created.
> log.segment.bytes=1073741824
>
> # The interval at which log segments are checked to see if they can be deleted according to the retention policies
> log.retention.check.interval.ms=300000
>
> zookeeper.connect=192.168.25.146:2181
> zookeeper.metadata.migration.enable=true
> inter.broker.protocol.version=3.4
>
> As the next step, I tried to comment out the ZooKeeper-related lines (on one of my brokers):
>
> zookeeper.connect=192.168.25.146:2181
> zookeeper.metadata.migration.enable=true
> inter.broker.protocol.version=3.4
>
> and to put this in place of broker.id:
>
> process.roles=broker
> node.id=0
>
> but after this Kafka isn't working anymore. All my brokers are in the same cluster, so I
> don't think it is a problem with the connection between them. I think I omitted something
> in the configuration files. I want to fully migrate to KRaft.
> Please take a look and tell me if you have any suggestions.

-- 
-David
