Elena,

Did you provision a KRaft controller quorum before restarting the brokers?
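
In case it helps while you check: the rough provisioning flow (a sketch with
placeholder paths, adjust to your layout) is to read the existing cluster ID
out of ZooKeeper and format each controller's log dir with that same ID
before starting the controllers:

    # Read the current cluster ID from ZooKeeper
    bin/zookeeper-shell.sh 192.168.25.146:2181 get /cluster/id

    # Format each controller's storage with that same ID
    bin/kafka-storage.sh format --cluster-id <id-from-above> --config controller.properties

If the controllers were never formatted they won't start, and if they were
formatted with a fresh random ID they won't belong to the same cluster; in
both cases the brokers will just keep retrying.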

If you don't mind, could you create a JIRA and attach the config files used
for the brokers before/after the migration along with the controller
configs? Please include the sequence of steps you took in the JIRA as well.

Here is our JIRA project:
https://issues.apache.org/jira/projects/KAFKA/issues
and here is general info on filing issues:
https://cwiki.apache.org/confluence/display/KAFKA/Reporting+Issues+in+Apache+Kafka

Thanks!
David



On Tue, May 16, 2023 at 2:54 AM Elena Batranu
<batranuel...@yahoo.com.invalid> wrote:

> Hello! I have a problem with my Kafka configuration (Kafka 3.4). I'm
> trying to migrate from ZooKeeper to KRaft. I have 3 brokers; one of them
> also hosted ZooKeeper. I want to restart my brokers one by one, without
> downtime, so I started by putting both the KRaft and ZooKeeper settings
> into the configuration, to do the migration gradually. At this step my
> nodes are up, but I get the following errors in the KRaft logs:
> [2023-05-16 06:35:19,485] DEBUG [BrokerToControllerChannelManager broker=0 name=quorum]: No controller provided, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
> [2023-05-16 06:35:19,585] DEBUG [BrokerToControllerChannelManager broker=0 name=quorum]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
> [2023-05-16 06:35:19,586] DEBUG [BrokerToControllerChannelManager broker=0 name=quorum]: No controller provided, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
> [2023-05-16 06:35:19,624] INFO [RaftManager nodeId=0] Node 3002 disconnected. (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,624] WARN [RaftManager nodeId=0] Connection to node 3002 (/192.168.25.172:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,642] INFO [RaftManager nodeId=0] Node 3001 disconnected. (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,642] WARN [RaftManager nodeId=0] Connection to node 3001 (/192.168.25.232:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,643] INFO [RaftManager nodeId=0] Node 3000 disconnected. (org.apache.kafka.clients.NetworkClient)
> [2023-05-16 06:35:19,643] WARN [RaftManager nodeId=0] Connection to node 3000 (/192.168.25.146:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
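
(Assuming the controllers are supposed to be up already, a quick sanity
check from any box is to ask the quorum for its status; a rough sketch,
adjust the bootstrap address to your environment:

    bin/kafka-metadata-quorum.sh --bootstrap-server 192.168.25.146:9092 describe --status

If that can't report a leader, I'd next verify that something is actually
listening on the controller port, e.g. `nc -vz 192.168.25.146 9093`.)
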
> I configured the controller on each broker, the file looks like this:
> # (standard Apache license header omitted)
> #
> # This configuration file is intended for use in KRaft mode, where
> # Apache ZooKeeper is not present.  See config/kraft/README.md for details.
> #
>
> ############################# Server Basics #############################
>
> # The role of this server. Setting this puts us in KRaft mode
> process.roles=controller
>
> # The node id associated with this instance's roles
> node.id=3000
>
> # The connect string for the controller quorum
> #controller.quorum.voters=3000@localhost:9093
> controller.quorum.voters=3000@192.168.25.146:9093,3001@192.168.25.232:9093,3002@192.168.25.172:9093
>
> ############################# Socket Server Settings #############################
>
> # The address the socket server listens on.
> # Note that only the controller listeners are allowed here when `process.roles=controller`,
> # and this listener should be consistent with `controller.quorum.voters` value.
> #   FORMAT:
> #     listeners = listener_name://host_name:port
> #   EXAMPLE:
> #     listeners = PLAINTEXT://your.host.name:9092
> listeners=CONTROLLER://:9093
>
> # A comma-separated list of the names of the listeners used by the controller.
> # This is required if running in KRaft mode.
> controller.listener.names=CONTROLLER
>
> # Maps listener names to security protocols, the default is for them to be
> # the same. See the config documentation for more details
> #listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
>
> # The number of threads that the server uses for receiving requests from
> # the network and sending responses to the network
> num.network.threads=3
>
> # The number of threads that the server uses for processing requests,
> # which may include disk I/O
> num.io.threads=8
>
> # The send buffer (SO_SNDBUF) used by the socket server
> socket.send.buffer.bytes=102400
>
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=102400
>
> # The maximum size of a request that the socket server will accept (protection against OOM)
> socket.request.max.bytes=104857600
>
> ############################# Log Basics #############################
>
> # A comma separated list of directories under which to store log files
> log.dirs=/data/kraft-controller-logs
>
> # The default number of log partitions per topic. More partitions allow greater
> # parallelism for consumption, but this will also result in more files across
> # the brokers.
> num.partitions=1
>
> # The number of threads per data directory to be used for log recovery at
> # startup and flushing at shutdown. This value is recommended to be
> # increased for installations with data dirs located in RAID array.
> num.recovery.threads.per.data.dir=1
>
> ############################# Internal Topic Settings #############################
>
> # The replication factor for the group metadata internal topics
> # "__consumer_offsets" and "__transaction_state"
> # For anything other than development testing, a value greater than 1 is
> # recommended to ensure availability such as 3.
> offsets.topic.replication.factor=1
> transaction.state.log.replication.factor=1
> transaction.state.log.min.isr=1
>
> ############################# Log Flush Policy #############################
>
> # Messages are immediately written to the filesystem but by default we only fsync() to sync
> # the OS cache lazily. The following configurations control the flush of data to disk.
> # There are a few important trade-offs here:
> #    1. Durability: Unflushed data may be lost if you are not using replication.
> #    2. Latency: Very large flush intervals may lead to latency spikes when the flush
> #       does occur as there will be a lot of data to flush.
> #    3. Throughput: The flush is generally the most expensive operation, and a small
> #       flush interval may lead to excessive seeks.
> # The settings below allow one to configure the flush policy to flush data after a
> # period of time or every N messages (or both). This can be done globally and
> # overridden on a per-topic basis.
>
> # The number of messages to accept before forcing a flush of data to disk
> #log.flush.interval.messages=10000
>
> # The maximum amount of time a message can sit in a log before we force a flush
> #log.flush.interval.ms=1000
>
> ############################# Log Retention Policy #############################
>
> # The following configurations control the disposal of log segments. The policy can
> # be set to delete segments after a period of time, or after a given size has
> # accumulated. A segment will be deleted whenever *either* of these criteria are met.
> # Deletion always happens from the end of the log.
>
> # The minimum age of a log file to be eligible for deletion due to age
> log.retention.hours=168
>
> # A size-based retention policy for logs. Segments are pruned from the log unless
> # the remaining segments drop below log.retention.bytes. Functions independently
> # of log.retention.hours.
> #log.retention.bytes=1073741824
>
> # The maximum size of a log segment file. When this size is reached a new log
> # segment will be created.
> log.segment.bytes=1073741824
>
> # The interval at which log segments are checked to see if they can be deleted
> # according to the retention policies
> log.retention.check.interval.ms=300000
>
> # Enable the migration
> zookeeper.metadata.migration.enable=true
>
> # ZooKeeper client configuration
> zookeeper.connect=localhost:2181
>
>
> ###############################################################################################
> Also, this is my setup for the server.properties file:
> # (standard Apache license header omitted)
> #
> # This configuration file is intended for use in KRaft mode, where
> # Apache ZooKeeper is not present.  See config/kraft/README.md for details.
> #
>
> ############################# Server Basics #############################
>
> # The role of this server. Setting this puts us in KRaft mode
> #process.roles=broker,controller
> #process.roles=broker
>
> # The node id associated with this instance's roles
> #node.id=0
> broker.id=0
>
> # The connect string for the controller quorum
> #controller.quorum.voters=3000@192.168.25.146:9093
> controller.quorum.voters=3000@192.168.25.146:9093,3001@192.168.25.232:9093,3002@192.168.25.172:9093
>
> ############################# Socket Server Settings #############################
>
> # The address the socket server listens on. Combined nodes (i.e. those with
> # `process.roles=broker,controller`) must list the controller listener here at
> # a minimum. If the broker listener is not defined, the default listener will
> # use a host name that is equal to the value of
> # java.net.InetAddress.getCanonicalHostName(), with PLAINTEXT listener name,
> # and port 9092.
> #   FORMAT:
> #     listeners = listener_name://host_name:port
> #   EXAMPLE:
> #     listeners = PLAINTEXT://your.host.name:9092
> listeners=PLAINTEXT://192.168.25.146:9092
>
> # Name of listener used for communication between brokers.
> #inter.broker.listener.name=PLAINTEXT
>
> # Listener name, hostname and port the broker will advertise to clients.
> # If not set, it uses the value for "listeners".
> advertised.listeners=PLAINTEXT://192.168.25.146:9092
>
> # A comma-separated list of the names of the listeners used by the controller.
> # If no explicit mapping set in `listener.security.protocol.map`, default will
> # be using PLAINTEXT protocol
> # This is required if running in KRaft mode.
> controller.listener.names=CONTROLLER
>
> # Maps listener names to security protocols, the default is for them to be
> # the same. See the config documentation for more details
> #listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
> listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
>
> # The number of threads that the server uses for receiving requests from
> # the network and sending responses to the network
> num.network.threads=3
>
> # The number of threads that the server uses for processing requests,
> # which may include disk I/O
> num.io.threads=8
>
> # The send buffer (SO_SNDBUF) used by the socket server
> socket.send.buffer.bytes=102400
>
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=102400
>
> # The maximum size of a request that the socket server will accept (protection against OOM)
> socket.request.max.bytes=104857600
>
> ############################# Log Basics #############################
>
> # A comma separated list of directories under which to store log files
> log.dirs=/data/kraft
>
> # The default number of log partitions per topic. More partitions allow greater
> # parallelism for consumption, but this will also result in more files across
> # the brokers.
> num.partitions=1
>
> # The number of threads per data directory to be used for log recovery at
> # startup and flushing at shutdown. This value is recommended to be
> # increased for installations with data dirs located in RAID array.
> num.recovery.threads.per.data.dir=1
>
> ############################# Internal Topic Settings #############################
>
> # The replication factor for the group metadata internal topics
> # "__consumer_offsets" and "__transaction_state"
> # For anything other than development testing, a value greater than 1 is
> # recommended to ensure availability such as 3.
> offsets.topic.replication.factor=1
> transaction.state.log.replication.factor=1
> transaction.state.log.min.isr=1
>
> ############################# Log Flush Policy #############################
>
> # Messages are immediately written to the filesystem but by default we only fsync() to sync
> # the OS cache lazily. The following configurations control the flush of data to disk.
> # There are a few important trade-offs here:
> #    1. Durability: Unflushed data may be lost if you are not using replication.
> #    2. Latency: Very large flush intervals may lead to latency spikes when the flush
> #       does occur as there will be a lot of data to flush.
> #    3. Throughput: The flush is generally the most expensive operation, and a small
> #       flush interval may lead to excessive seeks.
> # The settings below allow one to configure the flush policy to flush data after a
> # period of time or every N messages (or both). This can be done globally and
> # overridden on a per-topic basis.
>
> # The number of messages to accept before forcing a flush of data to disk
> #log.flush.interval.messages=10000
>
> # The maximum amount of time a message can sit in a log before we force a flush
> #log.flush.interval.ms=1000
>
> ############################# Log Retention Policy #############################
>
> # The following configurations control the disposal of log segments. The policy can
> # be set to delete segments after a period of time, or after a given size has
> # accumulated. A segment will be deleted whenever *either* of these criteria are met.
> # Deletion always happens from the end of the log.
>
> # The minimum age of a log file to be eligible for deletion due to age
> log.retention.hours=168
>
> # A size-based retention policy for logs. Segments are pruned from the log unless
> # the remaining segments drop below log.retention.bytes. Functions independently
> # of log.retention.hours.
> #log.retention.bytes=1073741824
>
> # The maximum size of a log segment file. When this size is reached a new log
> # segment will be created.
> log.segment.bytes=1073741824
>
> # The interval at which log segments are checked to see if they can be deleted
> # according to the retention policies
> log.retention.check.interval.ms=300000
>
> zookeeper.connect=192.168.25.146:2181
> zookeeper.metadata.migration.enable=true
> inter.broker.protocol.version=3.4
> ##########################################################################
> As a next step, I tried commenting out the ZooKeeper-related lines (on one
> of my brokers):
>
> zookeeper.connect=192.168.25.146:2181
> zookeeper.metadata.migration.enable=true
> inter.broker.protocol.version=3.4
>
> and putting this in place of broker.id:
>
> process.roles=broker
> node.id=0
>
> But after this Kafka isn't working anymore. All my brokers are in the same
> cluster, so I don't think it is a connection problem between them; I think
> I omitted something in the configuration files.
> I want to fully migrate to KRaft. Please take a look and tell me if you
> have any suggestions.
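
One more note, hedged since I can't see your full sequence yet: in the 3.4
migration flow the brokers are supposed to stay in ZK mode (keeping
broker.id, zookeeper.connect, and the migration flag) until the controller
quorum has finished migrating the metadata; only after that do you roll the
brokers one at a time into KRaft mode (node.id plus process.roles=broker,
dropping the zookeeper.* settings). During the migration phase a broker
config would look roughly like this sketch (not your literal file):

    broker.id=0
    inter.broker.protocol.version=3.4
    zookeeper.connect=192.168.25.146:2181
    zookeeper.metadata.migration.enable=true
    controller.quorum.voters=3000@192.168.25.146:9093,3001@192.168.25.232:9093,3002@192.168.25.172:9093
    controller.listener.names=CONTROLLER
    listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT

Switching a broker to process.roles=broker before that point could explain
it dropping out of the cluster, so the exact ordering of your steps will
help in the JIRA.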



-- 
-David
