[https://issues.apache.org/jira/browse/KAFKA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864656#comment-17864656]

Rajini Sivaram commented on KAFKA-16986:
----------------------------------------

In the 3.6 branch, there is a bug when merging topic ids from a partial 
metadata update: 
[https://github.com/apache/kafka/blob/35f0eeb2b07cabba0287b1a83a124abfb15e6745/clients/src/main/java/org/apache/kafka/clients/MetadataSnapshot.java#L172]
We only retain the topic ids from the current partial response. Hence the next 
full metadata update thinks the topic id was previously null and is being 
updated. This has been fixed in 3.7 and trunk: 
[https://github.com/apache/kafka/blob/0b11971f2c94f7aadc3fab2c51d94642065a72e5/clients/src/main/java/org/apache/kafka/clients/MetadataSnapshot.java#L179]
The fix seems to be from this commit: 
[https://github.com/apache/kafka/commit/a9565a799f3f87eb1d932b114dde7373748551ac]
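
To illustrate the difference, here is a minimal sketch (not the actual MetadataSnapshot code; the class and method names below are hypothetical and only mirror the merge behavior described above):

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.Uuid;

final class TopicIdMergeSketch {

    // 3.6-style merge: the rebuilt snapshot keeps only the topic ids carried
    // by the current partial response, so ids learned earlier for topics not
    // present in this response are dropped. The next full update then sees
    // "null -> <id>" and logs "Resetting the last seen epoch ... since the
    // associated topicId changed from null to ...".
    static Map<String, Uuid> mergeBuggy(Map<String, Uuid> previousIds,
                                        Map<String, Uuid> idsFromPartialResponse) {
        return new HashMap<>(idsFromPartialResponse);
    }

    // Fixed merge (3.7/trunk behavior): start from the ids already known for
    // retained topics and overlay the ids from the partial response, so
    // previously learned topic ids survive a partial metadata update.
    static Map<String, Uuid> mergeFixed(Map<String, Uuid> previousIds,
                                        Map<String, Uuid> idsFromPartialResponse) {
        Map<String, Uuid> merged = new HashMap<>(previousIds);
        merged.putAll(idsFromPartialResponse);
        return merged;
    }
}
{code}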

> After upgrading to Kafka 3.4.1, the producer constantly produces logs related 
> to topicId changes
> ------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-16986
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16986
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, producer 
>    Affects Versions: 3.0.1, 3.6.1
>            Reporter: Vinicius Vieira dos Santos
>            Priority: Minor
>         Attachments: image-2024-07-01-09-05-17-147.png, image.png
>
>
> When updating the Kafka broker from version 2.7.0 to 3.4.1, we noticed that 
> the applications began to log the message "{*}Resetting the last seen epoch 
> of partition PAYMENTS-0 to 0 since the associated topicId changed from null 
> to szRLmiAiTs8Y0nI8b3Wz1Q{*}" very constantly. From what I understand, this 
> behavior is not expected because the topic was not deleted and recreated, so 
> the client should simply use the cached data and not emit this log line.
> We have some applications with around 15 topics and 40 partitions, which 
> means around 600 log lines whenever metadata updates occur.
> The main thing for me is to know whether this could indicate a problem, or 
> whether I can simply change the log level of the 
> org.apache.kafka.clients.Metadata class to warn without worries (see the 
> sketch after this report).
>  
> There are other reports of the same behavior like this:  
> [https://stackoverflow.com/questions/74652231/apache-kafka-resetting-the-last-seen-epoch-of-partition-why]
>  
> *Some log occurrences over an interval of about 7 hours; each block refers 
> to an instance of the application in Kubernetes*
>  
> !image.png!
> *My scenario:*
> *Application:*
>  - Java: 21
>  - Client: 3.6.1, also tested on 3.0.1 with the same behavior
> *Broker:*
>  - Cluster running on Kubernetes with the bitnami/kafka:3.4.1-debian-11-r52 
> image
>  
> *Producer Config*
>  
>     acks = -1
>     auto.include.jmx.reporter = true
>     batch.size = 16384
>     bootstrap.servers = [server:9092]
>     buffer.memory = 33554432
>     client.dns.lookup = use_all_dns_ips
>     client.id = producer-1
>     compression.type = gzip
>     connections.max.idle.ms = 540000
>     delivery.timeout.ms = 30000
>     enable.idempotence = true
>     interceptor.classes = []
>     key.serializer = class 
> org.apache.kafka.common.serialization.ByteArraySerializer
>     linger.ms = 0
>     max.block.ms = 60000
>     max.in.flight.requests.per.connection = 1
>     max.request.size = 1048576
>     metadata.max.age.ms = 300000
>     metadata.max.idle.ms = 300000
>     metric.reporters = []
>     metrics.num.samples = 2
>     metrics.recording.level = INFO
>     metrics.sample.window.ms = 30000
>     partitioner.adaptive.partitioning.enable = true
>     partitioner.availability.timeout.ms = 0
>     partitioner.class = null
>     partitioner.ignore.keys = false
>     receive.buffer.bytes = 32768
>     reconnect.backoff.max.ms = 1000
>     reconnect.backoff.ms = 50
>     request.timeout.ms = 30000
>     retries = 3
>     retry.backoff.ms = 100
>     sasl.client.callback.handler.class = null
>     sasl.jaas.config = [hidden]
>     sasl.kerberos.kinit.cmd = /usr/bin/kinit
>     sasl.kerberos.min.time.before.relogin = 60000
>     sasl.kerberos.service.name = null
>     sasl.kerberos.ticket.renew.jitter = 0.05
>     sasl.kerberos.ticket.renew.window.factor = 0.8
>     sasl.login.callback.handler.class = null
>     sasl.login.class = null
>     sasl.login.connect.timeout.ms = null
>     sasl.login.read.timeout.ms = null
>     sasl.login.refresh.buffer.seconds = 300
>     sasl.login.refresh.min.period.seconds = 60
>     sasl.login.refresh.window.factor = 0.8
>     sasl.login.refresh.window.jitter = 0.05
>     sasl.login.retry.backoff.max.ms = 10000
>     sasl.login.retry.backoff.ms = 100
>     sasl.mechanism = PLAIN
>     sasl.oauthbearer.clock.skew.seconds = 30
>     sasl.oauthbearer.expected.audience = null
>     sasl.oauthbearer.expected.issuer = null
>     sasl.oauthbearer.jwks.endpoint.refresh.ms = 3600000
>     sasl.oauthbearer.jwks.endpoint.retry.backoff.max.ms = 10000
>     sasl.oauthbearer.jwks.endpoint.retry.backoff.ms = 100
>     sasl.oauthbearer.jwks.endpoint.url = null
>     sasl.oauthbearer.scope.claim.name = scope
>     sasl.oauthbearer.sub.claim.name = sub
>     sasl.oauthbearer.token.endpoint.url = null
>     security.protocol = SASL_PLAINTEXT
>     security.providers = null
>     send.buffer.bytes = 131072
>     socket.connection.setup.timeout.max.ms = 30000
>     socket.connection.setup.timeout.ms = 10000
>     ssl.cipher.suites = null
>     ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
>     ssl.endpoint.identification.algorithm = https
>     ssl.engine.factory.class = null
>     ssl.key.password = null
>     ssl.keymanager.algorithm = SunX509
>     ssl.keystore.certificate.chain = null
>     ssl.keystore.key = null
>     ssl.keystore.location = null
>     ssl.keystore.password = null
>     ssl.keystore.type = JKS
>     ssl.protocol = TLSv1.3
>     ssl.provider = null
>     ssl.secure.random.implementation = null
>     ssl.trustmanager.algorithm = PKIX
>     ssl.truststore.certificates = null
>     ssl.truststore.location = null
>     ssl.truststore.password = null
>     ssl.truststore.type = JKS
>     transaction.timeout.ms = 60000
>     transactional.id = null
>     value.serializer = class 
> org.apache.kafka.common.serialization.ByteArraySerializer
>  
> If you need any more details, please let me know.
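
Regarding the question in the report about silencing these messages: the "Resetting the last seen epoch ..." line is emitted at INFO by org.apache.kafka.clients.Metadata, so raising that logger to WARN hides it. A minimal sketch, assuming the application uses Logback as its SLF4J backend (the class name QuietMetadataLogs is just for illustration; the same can be done declaratively in logback.xml or a log4j configuration):

{code:java}
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public final class QuietMetadataLogs {
    public static void main(String[] args) {
        // Raise the Kafka client Metadata logger to WARN so the INFO-level
        // "Resetting the last seen epoch ..." lines are no longer emitted.
        Logger metadataLogger =
                (Logger) LoggerFactory.getLogger("org.apache.kafka.clients.Metadata");
        metadataLogger.setLevel(Level.WARN);
    }
}
{code}

Note that this only suppresses the log line; per the comment above, on a 3.6 client the repeated resets can come from the partial-metadata merge bug rather than from topics actually being recreated.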



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
