[ https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
kaushik srinivas resolved KAFKA-13177.
--------------------------------------
    Resolution: Not A Bug

> partition failures and fewer shrinks but a lot of ISR expansions with increased num.replica.fetchers in Kafka brokers
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13177
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13177
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: kaushik srinivas
>            Assignee: kaushik srinivas
>            Priority: Major
>
> Setup: a 3-node Kafka broker cluster (4 CPU cores and 4Gi memory per broker, on Kubernetes).
> Topics: 15, partitions per topic: 15, replication factor: 3, min.insync.replicas: 2.
> Producers run with acks=all.
> Initially num.replica.fetchers was set to 1 (the default) and we observed very frequent ISR shrinks and expansions, so the setups were tuned to a higher value of 4.
> After this change was made, we see the behavior and WARN messages below in the broker logs:
> # Over a period of 2 days, there are around 10 shrinks corresponding to 10 partitions, but around 700 ISR expansions corresponding to almost all partitions in the cluster (approx. 50 to 60 partitions).
> # We see frequent WARN messages about partitions being marked as failed in the same time span. Below is the trace:
> {"type":"log", "host":"wwwwww", "level":"WARN", "neid":"kafka-wwwwww", "system":"kafka", "time":"2021-08-03T20:09:15.340", "timezone":"UTC", "log":{"message":"ReplicaFetcherThread-2-1003 - kafka.server.ReplicaFetcherThread - [ReplicaFetcher replicaId=1001, leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}
> We see the above behavior continuously after increasing num.replica.fetchers from 1 to 4. We increased it to improve replication performance and hence reduce the ISR shrinks, but we see this strange behavior after the change.
> What would the above trace indicate? Is marking partitions as failed just a WARN message that Kafka handles itself, or is it something to worry about?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
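For readers trying to reproduce the reported setup, the broker-side settings described in the ticket correspond roughly to a `server.properties` fragment like the one below. Only the replication and fetcher values come from the ticket; the remaining lines and their placement are illustrative assumptions, not the reporter's actual configuration:

```properties
# Sketch of the broker settings described in KAFKA-13177 (illustrative, not the reporter's file).
num.replica.fetchers=4          # raised from the default of 1 to speed up follower replication
min.insync.replicas=2           # paired with replication factor 3 and acks=all producers
default.replication.factor=3    # topics in the report use replication factor 3
num.partitions=15               # topics in the report use 15 partitions each
```

Note that `acks=all` is a producer-side setting, not a broker property, so it does not appear in this fragment.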