UnwritableMetadataException on Startup

2024-04-16 Thread tobias.bohn
Hello,

we are currently getting an "UnwritableMetadataException" when starting Kafka
in our test environment. Unfortunately, I don't know whether it is caused by
an incorrect configuration or something similar. Kafka is running in a Docker
container, but I don't think that's the cause. When I google this error, I
unfortunately don't find any results.

We are currently using Kafka version 3.7.0.

Does anyone have any tips, or has anyone seen this error before? The
following is a log excerpt:

kafka  | kafka 06:17:49.01 INFO  ==>
kafka  | kafka 06:17:49.02 INFO  ==> Welcome to the Bitnami kafka container
kafka  | kafka 06:17:49.02 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
kafka  | kafka 06:17:49.02 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
kafka  | kafka 06:17:49.02 INFO  ==> Upgrade to Tanzu Application Catalog for production environments to access custom-configured and pre-packaged software components. Gain enhanced features, including Software Bill of Materials (SBOM), CVE scan result reports, and VEX documents. To learn more, visit https://bitnami.com/enterprise
kafka  | kafka 06:17:49.02 INFO  ==>
kafka  | kafka 06:17:49.03 INFO  ==> ** Starting Kafka setup **
kafka  | kafka 06:17:50.68 INFO  ==> Initializing Kafka...
kafka  | kafka 06:17:50.69 INFO  ==> No injected configuration files found, creating default config files
kafka  | kafka 06:17:51.24 INFO  ==> Initializing KRaft storage metadata
kafka  | kafka 06:17:51.25 INFO  ==> Formatting storage directories to add metadata...
kafka  | metaPropertiesEnsemble=MetaPropertiesEnsemble(metadataLogDir=Optional.empty, dirs={/bitnami/kafka/data: MetaProperties(version=1, clusterId=, nodeId=2, directoryId=85zdFJUTcbfYz0NwP8U3iA)})
kafka  | All of the log directories are already formatted.
kafka  | kafka 06:17:53.21 INFO  ==> ** Kafka setup finished! **
kafka  |
kafka  | kafka 06:17:53.23 INFO  ==> ** Starting Kafka **
kafka  | [2024-04-16 06:17:54,210] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
kafka  | [2024-04-16 06:17:54,607] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
kafka  | [2024-04-16 06:17:54,815] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
kafka  | [2024-04-16 06:17:54,819] INFO [ControllerServer id=2] Starting controller (kafka.server.ControllerServer)
kafka  | [2024-04-16 06:17:55,465] INFO Updated connection-accept-rate max connection creation rate to 2147483647 (kafka.network.ConnectionQuotas)
kafka  | [2024-04-16 06:17:56,298] INFO [SocketServer listenerType=CONTROLLER, nodeId=2] Created data-plane acceptor and processors for endpoint : ListenerName(CONTROLLER) (kafka.network.SocketServer)
kafka  | [2024-04-16 06:17:56,302] INFO CONTROLLER: resolved wildcard host to kafka (org.apache.kafka.metadata.ListenerInfo)
kafka  | [2024-04-16 06:17:56,312] INFO authorizerStart completed for endpoint CONTROLLER. Endpoint is now READY. (org.apache.kafka.server.network.EndpointReadyFutures)
kafka  | [2024-04-16 06:17:56,312] INFO [SharedServer id=2] Starting SharedServer (kafka.server.SharedServer)
kafka  | [2024-04-16 06:17:56,405] INFO [LogLoader partition=__cluster_metadata-0, dir=/bitnami/kafka/data] Recovering unflushed segment 2639301. 0/3 recovered for __cluster_metadata-0. (kafka.log.LogLoader)
kafka  | [2024-04-16 06:17:56,408] INFO [LogLoader partition=__cluster_metadata-0, dir=/bitnami/kafka/data] Loading producer state till offset 2639301 with message format version 2 (kafka.log.UnifiedLog$)
kafka  | [2024-04-16 06:17:56,409] INFO [LogLoader partition=__cluster_metadata-0, dir=/bitnami/kafka/data] Reloading from producer snapshot and rebuilding producer state from offset 2639301 (kafka.log.UnifiedLog$)
kafka  | [2024-04-16 06:17:56,410] INFO Deleted producer state snapshot /bitnami/kafka/data/__cluster_metadata-0/03954853.snapshot (org.apache.kafka.storage.internals.log.SnapshotFile)
kafka  | [2024-04-16 06:17:56,410] INFO Deleted producer state snapshot /bitnami/kafka/data/__cluster_metadata-0/05164322.snapshot (org.apache.kafka.storage.internals.log.SnapshotFile)
kafka  | [2024-04-16 06:17:56,411] INFO Deleted producer state snapshot /bitnami/kafka/data/__cluster_metadata-0/06183812.snapshot (org.apache.kafka.storage.internals.log.SnapshotFile)
kafka  | [2024-04-16 06:17:56,420] INFO [ProducerStateManager partition=__cluster_metadata-0] Wrote producer snapshot at offset 2639301 with 0 producer ids in 7 ms. (org.apache.kafka.storage.internals.log.ProducerStateManager)
kafka  | [2024-04-16 06:17:56,420] INFO [LogLoader partition=__cluster_metadata-0, dir=/bitnami/kafka/data] Producer state recovery took 1ms for snapshot load and 9ms for segment recovery from offse

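If it helps with diagnosis: the KRaft storage state the log refers to can be
inspected inside the container with the stock storage tool (the config path
below is the Bitnami default and is an assumption; adjust it to your layout):

  kafka-storage.sh info -c /opt/bitnami/kafka/config/server.properties
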
Re: How do I resolve an UnwritableMetadataException

2024-06-04 Thread tobias.bohn
Had the same issue some weeks ago; we had also upgraded to 3.7.
I did not receive any hints on this topic back then.

There is an open StackOverflow question:
https://stackoverflow.com/questions/78334479/unwritablemetadataexception-on-startup-in-apache-kafka
There is also an open Kafka Jira ticket:
https://issues.apache.org/jira/browse/KAFKA-16662
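
Judging from the error text quoted below, the finalized metadata.version
(3.4-IV0 in Sejal's log) cannot represent the directory assignment state
that 3.7 brokers write. So besides downgrading, one thing that may be worth
trying (we did not test it ourselves; the bootstrap address is a placeholder)
is to check and, if appropriate, bump the finalized metadata version with the
stock feature tool:

  kafka-features.sh --bootstrap-server localhost:9092 describe
  kafka-features.sh --bootstrap-server localhost:9092 upgrade --metadata 3.7
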
We fixed it by downgrading to 3.6.
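
For the Bitnami container, the downgrade itself is just a matter of pinning
the image tag, e.g. (the exact 3.6.x patch release here is only an example):

  image: docker.io/bitnami/kafka:3.6.2

Presumably this only worked for us because the finalized metadata.version had
never been bumped to a 3.7 level; once it has been, a plain binary downgrade
is no longer supported.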

Hope this helps.

-----Original Message-----
From: Sejal Patel
Sent: Monday, June 3, 2024 08:26
To: users@kafka.apache.org
Subject: How do I resolve an UnwritableMetadataException

I was expanding my Kafka cluster from 16 to 24 nodes and rebalancing the
topics. One of the topic's partitions did not get rebalanced as expected (it
was taking a million years, so I decided to look and see what was happening).
It turned out that the script for mounting the second disk partition for use
as /data did not kick in, and thus there simply wasn't enough disk space
available at the time of the rebalance. The system was left with about 5 MB
of disk space, and the Kafka brokers were essentially borked at that point.
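
For reference, reassignment progress can be checked with the stock tool while
a rebalance is running (the plan file name below is illustrative, and the
bootstrap address is a placeholder):

  kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
    --reassignment-json-file reassignment.json --verify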

So I had to kill the Kafka processes, move the original Kafka data folder to
a /tmp location, mount the data partition, and move the /tmp Kafka folder
back to the original spot. But when I went to start the Kafka instance back
up, I got this message over and over again, every few milliseconds.
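
Roughly, in shell terms, the recovery dance was the following (device name
and paths are illustrative, not the exact ones):

  mv /data /tmp/kafka-data      # park the existing data folder
  mkdir /data                   # recreate the mount point
  mount /dev/sdb1 /data         # this time, mount the real data partition
  mv /tmp/kafka-data/* /data/   # move the Kafka data back onto it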

[2024-06-03 06:14:01,503] ERROR Encountered metadata loading fault: Unhandled error initializing new publishers (org.apache.kafka.server.fault.LoggingFaultHandler)
org.apache.kafka.image.writer.UnwritableMetadataException: Metadata has been lost because the following could not be represented in metadata version 3.4-IV0: the directory assignment state of one or more replicas
        at org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
        at org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
        at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
        at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
        at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
        at org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
        at org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
        at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
        at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
        at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
        at java.base/java.lang.Thread.run(Thread.java:1583)
[2024-06-03 06:14:01,556] INFO [BrokerLifecycleManager id=28] The broker is in RECOVERY. (kafka.server.BrokerLifecycleManager)

Scarier still: if any node that is currently working gets restarted, it too
starts emitting that message.

I am using a KRaft setup and have, within the past month, upgraded to Kafka
3.7 (the original setup, over a year ago, was Kafka 3.4). How do I resolve
this issue? I'm not sure what the problem is or how to fix it.

If I restart a KRaft server, it dies with the same error message and can
never be spun up again.

Is it possible to recover from this, or do I need to start from scratch? If I
start from scratch, how do I keep the topics? What is the best way to proceed
from here? I'm unable to find anything related to this problem via a Google
search.
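
(I could presumably dump the existing topic definitions first, e.g. with
"kafka-topics.sh --bootstrap-server localhost:9092 --describe", and recreate
them on a fresh cluster, but that does nothing for the data in them.)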

I'm at a loss and would appreciate any help you can provide.

Thank you.