Re: Potential Improvement for Kafka Producer

2020-07-06 Thread Arvin Zheng
Thanks, Matthias, for the response. I've created a KIP and started a separate
email thread to discuss it:
[DISCUSS] Include min.insync.replicas in MetadataResponse to make Producer smarter in partitioning events

Matthias J. Sax wrote on Mon, Jul 6, 2020 at 12:30 PM:

> Arvin,
>
> thanks for your email. This is definitely the right channel. I am
> personally not familiar enough with the producer code, but what you say
> makes sense to me from a high level.
>
> Maybe it would be best to file a Jira to improve the producer
> accordingly? I guess this change would require a KIP.
>
> Of course, if you are interested, feel free to pick it up yourself.
>
>
> -Matthias


Re: Potential Improvement for Kafka Producer

2020-07-06 Thread Matthias J. Sax
Arvin,

thanks for your email. This is definitely the right channel. I am
personally not familiar enough with the producer code, but what you say
makes sense to me from a high level.

Maybe it would be best to file a Jira to improve the producer
accordingly? I guess this change would require a KIP.

Of course, if you are interested, feel free to pick it up yourself.


-Matthias


On 6/28/20 8:53 AM, Arvin Zheng wrote:
> [...]





Potential Improvement for Kafka Producer

2020-06-28 Thread Arvin Zheng
Hi All,

Not sure if this is the right channel and thread to ask, but I'd like to
discuss a potential improvement to the Java Kafka Producer.

```
Currently the Kafka Producer is able to identify unavailable partitions and
avoid routing messages to them, but it considers a partition unavailable
only when the partition's leader is unavailable.
From the Producer's point of view, acks for sending messages can be one of
[all, -1, 0, 1]:
1. When acks is set to 0 or 1, leader availability is enough to determine
whether we should route messages to that partition.
2. When acks is set to -1 or all, an available leader doesn't mean we are
able to persist messages to that partition successfully; instead, we need
to make sure that
a. the leader is available, and
b. at least min.insync.replicas replicas (including the leader) are in sync.
```
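The two cases above can be sketched as a single predicate. This is a minimal standalone model, not Kafka client code: the `Acks` enum and `canAcceptWrites` are hypothetical names, and the ISR count is assumed to include the leader, as Kafka's ISR does. For acks=all, the broker rejects a write with NotEnoughReplicas when the ISR is smaller than min.insync.replicas; the sketch simply reproduces that check on the client side.

```java
import java.util.Locale;

/**
 * Hypothetical model of the availability rule; not part of the Kafka client.
 */
public class WritabilityRule {

    /** The producer's acks setting; "-1" and "all" mean the same thing. */
    enum Acks {
        NONE, LEADER, ALL;

        static Acks parse(String value) {
            switch (value.trim().toLowerCase(Locale.ROOT)) {
                case "0":
                    return NONE;
                case "1":
                    return LEADER;
                case "-1":
                case "all":
                    return ALL;
                default:
                    throw new IllegalArgumentException("Unknown acks: " + value);
            }
        }
    }

    /**
     * Returns true if a write to the partition can succeed under the given acks.
     *
     * @param leaderAvailable   whether the partition currently has a live leader
     * @param inSyncReplicas    size of the ISR (assumed to include the leader)
     * @param minInsyncReplicas the topic's min.insync.replicas setting
     */
    static boolean canAcceptWrites(Acks acks, boolean leaderAvailable,
                                   int inSyncReplicas, int minInsyncReplicas) {
        if (!leaderAvailable) {
            return false; // case 2a; also the only check for acks=0/1
        }
        if (acks != Acks.ALL) {
            return true; // case 1: a leader is enough
        }
        return inSyncReplicas >= minInsyncReplicas; // case 2b
    }

    public static void main(String[] args) {
        // Leader up, ISR shrunk to 1, min.insync.replicas = 2
        System.out.println(canAcceptWrites(Acks.parse("1"), true, 1, 2));   // true
        System.out.println(canAcceptWrites(Acks.parse("all"), true, 1, 2)); // false
    }
}
```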

To achieve 2, the topic's min.insync.replicas setting needs to be carried in
the metadata response, so that the Producer can decide not to route messages
to a partition that doesn't have enough in-sync replicas when its acks is set
to -1 or all.
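A sketch of how carrying min.insync.replicas in the metadata could change partition selection. In the Java client, `Cluster.availablePartitionsForTopic` returns the partitions that have a leader, and the metadata response already lists each partition's ISR, so the missing piece is the topic-level setting. `TopicMetadata`, `PartitionMetadata`, and the `minInsyncReplicas` field below are hypothetical stand-ins for the proposed change, not existing client classes.

```java
import java.util.ArrayList;
import java.util.List;

public class MinIsrAwarePartitioning {

    /** Hypothetical per-partition metadata; leaderId is null when there is no leader. */
    record PartitionMetadata(int partition, Integer leaderId, List<Integer> isr) {}

    /** Hypothetical topic metadata including the proposed minInsyncReplicas field. */
    record TopicMetadata(String topic, int minInsyncReplicas, List<PartitionMetadata> partitions) {}

    /** Current rule: a partition is available if it has a leader. */
    static List<Integer> availablePartitions(TopicMetadata md) {
        List<Integer> result = new ArrayList<>();
        for (PartitionMetadata p : md.partitions()) {
            if (p.leaderId() != null) {
                result.add(p.partition());
            }
        }
        return result;
    }

    /** Proposed rule for acks=all: also require ISR size >= min.insync.replicas. */
    static List<Integer> writablePartitionsForAcksAll(TopicMetadata md) {
        List<Integer> result = new ArrayList<>();
        for (PartitionMetadata p : md.partitions()) {
            if (p.leaderId() != null && p.isr().size() >= md.minInsyncReplicas()) {
                result.add(p.partition());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TopicMetadata md = new TopicMetadata("events", 2, List.of(
                new PartitionMetadata(0, 1, List.of(1, 2, 3)),
                new PartitionMetadata(1, 2, List.of(2)),      // leader up, ISR shrunk
                new PartitionMetadata(2, null, List.of())));  // no leader
        System.out.println(availablePartitions(md));          // [0, 1]
        System.out.println(writablePartitionsForAcksAll(md)); // [0]
    }
}
```

A partitioner could then pick from `writablePartitionsForAcksAll` when acks is -1/all and fall back to `availablePartitions` for acks 0 or 1.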

Advantages I can think of:
1. Avoid exhausting the Producer's entire buffer (buffer.memory) when a
partition is unavailable for a long time and
a. retries is set to a large value, and
b. acks is set to all.
2. Avoid unnecessary network requests and retries.
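A back-of-envelope estimate of advantage 1, under stated assumptions. Batches for a partition whose ISR is below min.insync.replicas keep being retried (NotEnoughReplicas is a retriable error) and hold space in the producer's shared buffer until they succeed or delivery.timeout.ms expires. The buffer therefore fills if enough data is routed to such partitions within that window, after which send() blocks for up to max.block.ms for every partition, healthy ones included. The defaults used below (buffer.memory = 33554432 bytes, delivery.timeout.ms = 120000) are the Java producer's; the 1 MiB/s inflow is purely a hypothetical figure, and the model ignores batch overhead and compression.

```java
public class BufferExhaustionEstimate {

    /**
     * Seconds until data parked for stuck partitions fills the buffer,
     * assuming a constant inflow and no acknowledgements in the meantime.
     */
    static double secondsToFill(long bufferMemoryBytes, long stuckBytesPerSecond) {
        return (double) bufferMemoryBytes / stuckBytesPerSecond;
    }

    /** True if the buffer fills before the oldest parked batches expire. */
    static boolean fillsBeforeExpiry(long bufferMemoryBytes, long stuckBytesPerSecond,
                                     long deliveryTimeoutMs) {
        return secondsToFill(bufferMemoryBytes, stuckBytesPerSecond) * 1000 < deliveryTimeoutMs;
    }

    public static void main(String[] args) {
        long bufferMemory = 32L * 1024 * 1024; // buffer.memory default (33554432 bytes)
        long deliveryTimeoutMs = 120_000;      // delivery.timeout.ms default
        long stuckRate = 1024 * 1024;          // hypothetical: 1 MiB/s routed to stuck partitions
        System.out.println(secondsToFill(bufferMemory, stuckRate));                         // 32.0
        System.out.println(fillsBeforeExpiry(bufferMemory, stuckRate, deliveryTimeoutMs));  // true
    }
}
```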

I'm not sure whether this is a valid case, but I'd like to hear any opinions.

Br,
Arvin