[jira] [Comment Edited] (KAFKA-8608) Broker shows WARN on reassignment partitions on new brokers: Replica LEO, follower position & Cache truncation
[ https://issues.apache.org/jira/browse/KAFKA-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304601#comment-17304601 ]

Zhang Jianguo edited comment on KAFKA-8608 at 3/19/21, 2:54 AM:
----------------------------------------------------------------

[~LillianY] [~xmar] I am hitting the same issue:

{code}
[2021-03-18 17:37:30,799] WARN [ReplicaManager broker=15] Leader 15 failed to record follower 18's position 308399, and last sent HW since the replica is not recognized to be one of the assigned replicas 15,16 for partition Collect-gnodeb-24. Empty records will be returned for this partition. (kafka.server.ReplicaManager)
{code}

After the controller switched from broker 14 to broker 15, Kafka became abnormal, and restarting the brokers did not help.

*Logs of Broker 14*

!image-2021-03-19-10-36-04-328.png!
!image-2021-03-19-10-41-44-952.png!
!image-2021-03-19-10-42-16-296.png!
!image-2021-03-19-10-42-32-759.png!

*Producer log*

!image-2021-03-19-10-41-03-203.png!
*Consumer got a timeout exception:*

!image-2021-03-19-10-39-24-728.png!

> Broker shows WARN on reassignment partitions on new brokers: Replica LEO, follower position & Cache truncation
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8608
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8608
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.1.1
>         Environment: Kafka 2.1.1
>            Reporter: Di Campo
>            Priority: Minor
>              Labels: broker, reassign, repartition
>         Attachments: image-2021-03-19-10-36-04-328.png, image-2021-03-19-10-39-24-728.png, image-2021-03-19-10-41-03-203.png, image-2021-03-19-10-41-44-952.png, image-2021-03-19-10-42-16-296.png, image-2021-03-19-10-42-32-759.png
>
> I added two brokers (broker IDs 4, 5) to a 3-node (broker IDs 1, 2, 3) cluster holding 32 topics with 64 partitions each, replication factor 3, and ran a partition reassignment.
> On each run I see the WARN messages below, but once the reassignment finishes everything seems OK: the ISR count is 3 for every partition.
> I get the following message types, one per partition:
>
> {code:java}
> [2019-06-27 12:42:03,946] WARN [LeaderEpochCache visitors-0.0.1-10] New epoch entry EpochEntry(epoch=24, startOffset=51540) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=22, startOffset=51540)). Cache now contains 5 entries. (kafka.server.epoch.LeaderEpochFileCache)
> {code}
> -> This relates to the cache, so I suppose it's fairly safe.
>
> {code:java}
> [2019-06-27 12:42:04,250] WARN [ReplicaManager broker=1] Leader 1 failed to record follower 3's position 47981 since the replica is not recognized to be one of the assigned replicas 1,2,5 for partition visitors-0.0.1-28. Empty records will be returned for this partition. (kafka.server.ReplicaManager)
> {code}
> -> This one is scary. I'm not sure how severe it is, but it looks as if records may be missing?
>
> {code:java}
> [2019-06-27 12:42:03,709] WARN [ReplicaManager broker=1] While recording the replica LEO, the partition visitors-0.0.1-58 hasn't been created. (kafka.server.ReplicaManager)
> {code}
> -> Here, these partitions are in fact created.
>
> First of all: am I supposed to be losing data here? I assume not, and so far I see no trace of losing anything.
> If not, I'm not sure what these messages are trying to say. Should they really be at WARN level? And should the message explain the different risks involved more clearly?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
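For context, a partition reassignment like the one described above is driven by a JSON plan passed to the kafka-reassign-partitions.sh tool that ships with the broker. A minimal sketch of such a plan, assuming the replica lists shown in the logs; the second entry and all broker/partition choices here are purely illustrative:

{code:json}
{
  "version": 1,
  "partitions": [
    {"topic": "visitors-0.0.1", "partition": 28, "replicas": [1, 2, 5]},
    {"topic": "visitors-0.0.1", "partition": 58, "replicas": [2, 4, 5]}
  ]
}
{code}

A plan like this is produced with the tool's --generate option (or written by hand), applied with --execute, and checked with --verify, roughly: bin/kafka-reassign-partitions.sh --zookeeper <zk-host>:2181 --reassignment-json-file reassign.json --execute (the ZooKeeper address and file name are placeholders; on the 2.1.x line the tool still talks to ZooKeeper). The WARNs quoted above are reported during the window in which followers from the new assignment are still catching up.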
[jira] [Updated] (KAFKA-8608) Broker shows WARN on reassignment partitions on new brokers: Replica LEO, follower position & Cache truncation
[ https://issues.apache.org/jira/browse/KAFKA-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Jianguo updated KAFKA-8608:
---------------------------------
    Attachment: image-2021-03-19-10-39-24-728.png
[jira] [Updated] (KAFKA-8608) Broker shows WARN on reassignment partitions on new brokers: Replica LEO, follower position & Cache truncation
[ https://issues.apache.org/jira/browse/KAFKA-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Jianguo updated KAFKA-8608:
---------------------------------
    Attachment: image-2021-03-19-10-36-04-328.png