Thanks to Matthias for the response. I've created a KIP and started a separate email thread to discuss it; see the following:

[DISCUSS] Include min.insync.replicas in MetadataResponse to make Producer smarter in partitioning events

Matthias J. Sax <mj...@apache.org> wrote on Mon, Jul 6, 2020 at 12:30 PM:

> Arvin,
>
> thanks for your email. This is definitely the right channel. I am
> personally not familiar enough with the producer code, but what you say
> makes sense to me from a high level.
>
> Maybe it would be best if you would file a Jira to improve the producer
> accordingly? I guess, this change would require a KIP.
>
> Of course, if you are interested, feel free to pick it up yourself.
>
>
> -Matthias
>
>
> On 6/28/20 8:53 AM, Arvin Zheng wrote:
> > Hi All,
> >
> > Not sure if this is the right channel and thread to ask, but would like
> > to discuss a potential improvement to Java Kafka Producer.
> >
> > ```
> > Currently the Kafka Producer is able to identify unavailable partitions
> > and avoid routing messages to them, but the definition of an unavailable
> > partition is: the leader of the partition is not available.
> > From Producer point of view, acks for sending messages can be
> > [all, -1, 0, 1]
> > 1. When acks is set to either 0 or 1, leader availability is good enough
> >    to determine whether we should route messages to that partition.
> > 2. When acks is set to -1 or all, leader available doesn't mean we are
> >    able to persist messages to that partition successfully, instead, we
> >    need to make sure
> >    a. leader is available.
> >    b. at least min.insync.replicas number of replicas are available
> > ```
> >
> > To achieve 2, what we need is to carry min.insync.replicas information
> > of a topic to the metadata response, so that Producer is able to
> > determine if it should route messages to that partition when there are
> > not enough replicas available and its acks is set to -1 or all.
> >
> > Advantages that I can think of
> > 1. Avoid exhausting the entire Producer cache when a partition is not
> >    available for a long time and
> >    a. retries is set to a large value
> >    b. acks is set to all
> > 2. Avoid unnecessary network tries
> >
> > Not sure if this is a valid case but would like to hear any opinions.
> >
> > Br,
> > Arvin
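
To make the availability rule from the quoted proposal concrete, here is a minimal sketch of the check. It does not use the Kafka client library and is not the actual producer code: `PartitionState`, `isWritable`, and `writablePartitions` are hypothetical names made up for illustration, and it assumes the producer somehow knows the topic's min.insync.replicas value (which is exactly what the proposal wants the metadata response to carry).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AckAwareAvailability {

    /** Hypothetical per-partition view; NOT a Kafka client class. */
    static final class PartitionState {
        final int partition;
        final boolean leaderAvailable;
        final int inSyncReplicas; // current ISR size, leader included

        PartitionState(int partition, boolean leaderAvailable, int inSyncReplicas) {
            this.partition = partition;
            this.leaderAvailable = leaderAvailable;
            this.inSyncReplicas = inSyncReplicas;
        }
    }

    /**
     * Rule from the proposal: the leader must always be available;
     * for acks=all / acks=-1 the ISR must also hold at least
     * min.insync.replicas members.
     */
    static boolean isWritable(PartitionState p, String acks, int minInSyncReplicas) {
        if (!p.leaderAvailable) {
            return false;
        }
        boolean needsFullIsr = "all".equals(acks) || "-1".equals(acks);
        return !needsFullIsr || p.inSyncReplicas >= minInSyncReplicas;
    }

    /** Returns the ids of partitions a partitioner could still route to. */
    static List<Integer> writablePartitions(List<PartitionState> partitions,
                                            String acks, int minInSyncReplicas) {
        List<Integer> result = new ArrayList<>();
        for (PartitionState p : partitions) {
            if (isWritable(p, acks, minInSyncReplicas)) {
                result.add(p.partition);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example state: min.insync.replicas = 2 for the topic.
        List<PartitionState> partitions = Arrays.asList(
                new PartitionState(0, true, 3),   // healthy
                new PartitionState(1, true, 1),   // leader up, ISR too small
                new PartitionState(2, false, 2)); // no leader

        System.out.println(writablePartitions(partitions, "1", 2));   // [0, 1]
        System.out.println(writablePartitions(partitions, "all", 2)); // [0]
    }
}
```

With acks=1 only the leaderless partition is skipped, while with acks=all the partition whose ISR has shrunk below min.insync.replicas is skipped as well, which is the behaviour change being proposed.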