[ https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603789#comment-17603789 ]
Guozhang Wang commented on KAFKA-10635:
---------------------------------------

Hi [~nicktelford], regarding the broker's behavior:

> Irrespective of the behaviour on the Streams side, I'm confident that the real issue is that brokers should not be producing an OutOfOrderSequenceException just because partition leadership changed while a producer was writing to that partition. As I mentioned in my earlier comment, I believe this is caused by the producerEpoch not being properly tracked when partition leadership changes.

I think it is related to KIP-360, but when I looked through the history I could not find anything obviously relevant to what you've observed, and on the current trunk the behavior does not appear to match what you observed either. Would you mind upgrading the brokers to a version newer than 2.5.1, e.g. 3.0+, and checking whether the issue persists?

> Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers
> --------------------------------------------------------------------------------------------
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
> Issue Type: Bug
> Components: core, producer
> Affects Versions: 2.5.1
> Reporter: Peeraya Maetasatidsuk
> Priority: Blocker
>
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a rolling restart of the brokers after installing the new version. After the restarts, we noticed that one of our Streams apps (client version 2.4.1) fails with an OutOfOrderSequenceException:
>
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected error. Record: a_record, destination topic: topic-name-Aggregation-repartition
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the following error:
> org.apache.kafka.streams.errors.StreamsException: task [1_39] Abort sending since an error caught with a previous record (timestamp 1602654659000) to topic topic-name-Aggregation-repartition due to org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
>     at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
>     at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
>     at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
>     at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
>     at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730)
>     at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716)
>     at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
>     at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
>     at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
>     at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
>     at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
>     at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)
>     at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error processing append operation on partition topic-name-Aggregation-repartition-52 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producerId 2819098 at offset 1156041 in partition topic-name-Aggregation-repartition-52: 29 (incoming seq. number), -1 (current end sequence number)
> {code}
> We are able to reproduce this many times, and it happens regardless of whether the broker shutdown (at restart) is clean or unclean. However, when we roll the broker version back from 2.5.1 to 2.3.1 and perform similar rolling restarts, we don't see this error in the Streams application at all. This is blocking us from upgrading our broker version.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
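The broker-side log in the issue description reports 29 as the incoming sequence number and -1 as the current end sequence number for producerId 2819098. As a rough illustration of why that pair is rejected, the sketch below models the per-producer rule an idempotent producer relies on: a batch is accepted only if its first sequence number directly follows the last one the partition leader has recorded for that producerId, and -1 means nothing is recorded, so only sequence 0 would be in order. This is a hypothetical, simplified model and not Kafka's broker code. The class `SequenceCheckSketch`, its methods, and the use of `IllegalStateException` in place of `OutOfOrderSequenceException` are assumptions for illustration, and producer epochs (the KIP-360 territory discussed above) are ignored entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a partition leader's per-producer sequence check.
// This is NOT Kafka's broker code: producer epochs, transactions, and the rest of the
// idempotence machinery are deliberately left out.
final class SequenceCheckSketch {
    // Sentinel meaning "no sequence recorded for this producer", like the -1 in the broker log.
    static final int NO_SEQUENCE = -1;

    // Last accepted sequence number per producerId on this (hypothetical) leader.
    private final Map<Long, Integer> lastSeqByProducer = new HashMap<>();

    // A batch is in sequence only if its first sequence number directly follows the
    // last recorded one; sequence numbers wrap from Integer.MAX_VALUE back to 0.
    static boolean isInSequence(int lastSeq, int firstSeq) {
        return firstSeq == lastSeq + 1L || (firstSeq == 0 && lastSeq == Integer.MAX_VALUE);
    }

    // Accepts a batch of recordCount records starting at firstSeq, or rejects it.
    // IllegalStateException stands in for Kafka's OutOfOrderSequenceException.
    void append(long producerId, int firstSeq, int recordCount) {
        if (recordCount <= 0) {
            throw new IllegalArgumentException("recordCount must be positive");
        }
        int lastSeq = lastSeqByProducer.getOrDefault(producerId, NO_SEQUENCE);
        if (!isInSequence(lastSeq, firstSeq)) {
            throw new IllegalStateException("Out of order sequence number for producerId " + producerId
                    + ": " + firstSeq + " (incoming seq. number), "
                    + lastSeq + " (current end sequence number)");
        }
        // Remember the sequence of the batch's last record, wrapping past Integer.MAX_VALUE.
        long end = ((long) firstSeq + recordCount - 1) % (Integer.MAX_VALUE + 1L);
        lastSeqByProducer.put(producerId, (int) end);
    }

    public static void main(String[] args) {
        long producerId = 2819098L; // producerId taken from the broker log in the issue

        // A leader that has seen this producer's earlier batches accepts sequence 29.
        SequenceCheckSketch leaderWithState = new SequenceCheckSketch();
        leaderWithState.append(producerId, 0, 29);  // sequences 0..28
        leaderWithState.append(producerId, 29, 10); // 29 follows 28: accepted

        // Hypothetical leader with no state for this producer: its end sequence is -1,
        // so only sequence 0 would be in order and a batch starting at 29 is rejected.
        SequenceCheckSketch leaderWithoutState = new SequenceCheckSketch();
        try {
            leaderWithoutState.append(producerId, 29, 10);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `Out of order sequence number for producerId 2819098: 29 (incoming seq. number), -1 (current end sequence number)`, which has the same shape as the broker log. The "leader without state" scenario is only an assumption used to show how a leader that has no record of a producerId would treat a mid-stream sequence number; it does not establish what the 2.5.1 broker actually did after the rolling restart.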