Hello George,

Thanks for the KIP. Just a few questions to help us understanding the
design details:

1) Why does the default implementation have a criterion of when it is
enabled -- i.e. after a certain number of messages have been successfully
sent -- instead of always enabled? Also how would this mechanism be
customizable in user's own implementations? The current interface APIs seem
not allow such configurations.

2) When does the "onError" function trigger? Is that triggered when the
response returned contains an error code (but maybe a retriable one, so
that the batch of records would be re-enqueued) or when the batch of
records are finally going to be dropped on the floor? If it is based on the
first case, then depending on the timeout configuration we may not trigger
the breaker in time, is that the case? Also I'm wondering if we can allow
this function to be triggered on other events, such as inflight.requests
exceeding the threshold as well.

3) How would users customize "how long would we mute a partition" and "how
many partitions of a topic in total could be muted at the same time" from
the interface?


Guozhang


On Mon, Dec 14, 2020 at 5:42 PM Guoqiang Shu <shuguoqi...@gmail.com> wrote:

>
> Hi Jun and Justin,
>
> Many thanks for taking a look at our proposal and for the pointer! We
> learned about the mechanism proposed to enhance StickyPartitioner. Both
> methods aim to exclude brokers with transient errors and prevent cluster
> wide failure. The difference lies in the criteria used to tell if a broker
> is problematic: our KIP uses the error condition for the operation; and the
> heuristic in StickyPartitioner relies on the internal state
> max.in.flight.requests.per.connection.
>
> IMHO, using final result of write operation makes the behavior simpler to
> reason about. It covers all error scenarios and potentially supports all
> implementations of Partitioner. In contrast, using intermediate state may
> trigger action prematurely, for example, when
> max.in.flight.requests.per.connection reaches the threshold (due to small
> linger.ms value)  but the buffer in producer side is still in healthy
> state. In addition, in sequential mode
> max.in.flight.requests.per.connection is set to 1 therefore cannot be
> leveraged.
>
> Finally, as Justine pointed out, having AvailablePartitions reflects
> broker status (as enabled in our KIP) will benefit optimizations of the
> Partitioner in general, so the two classes of enhancements can coexist.
>
> Cheers,
> //George//
>
> On 2020/12/08 18:19:43, Justine Olshan <jols...@confluent.io> wrote:
> > Hi George,
> > I've been looking at the discussion on improving the sticky partitioner,
> > and one of the potential issues we discussed is how we could get
> > information to the partitioner to tell it not to choose certain
> partitions.
> > Currently, the partitioner can only use availablePartitionsForTopic. I
> took
> > a quick look at your KIP and it seemed that your KIP would change what
> > partitions are returned with this method. This seems like a step in the
> > right direction for solving that issue too.
> >
> > I agree with Jun that looking at both of these issues and the proposed
> > solutions would be very helpful.
> > Justine
> >
> > On Tue, Dec 8, 2020 at 10:07 AM Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, George,
> > >
> > > Thanks for submitting the KIP. There was an earlier discussing on
> improving
> > > the sticky partitioner in the producer (
> > >
> > >
> https://lists.apache.org/thread.html/rae8d2d5587dae57ad9093a85181e0cb4256f10d1e57138ecdb3ef287%40%3Cdev.kafka.apache.org%3E
> > > ).
> > > It seems to be solving a very similar issue. It would be useful to
> analyze
> > > both approaches and see which one solves the problem better.
> > >
> > > Jun
> > >
> > > On Tue, Dec 8, 2020 at 8:05 AM georgeshu(舒国强) <george...@tencent.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > We write up a KIP based on a straightforward mechanism implemented
> and
> > > > tested in order to solve a practical issue in production.
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-693%3A+Client-side+Circuit+Breaker+for+Partition+Write+Errors
> > > > Look forward to hearing feedback and suggestions.
> > > >
> > > > Thanks!
> > > >
> > > >
> > >
> >
>


-- 
-- Guozhang

Reply via email to