[ 
https://issues.apache.org/jira/browse/KAFKA-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123377#comment-17123377
 ] 

Cheng Tan commented on KAFKA-9800:
----------------------------------

Recap the discussion in Github. We want to implement a per-request backoff for 
all types of clients.

 

Let me talk about two of my major concerns and thoughts about implementing the 
universal client exponential backoff.

*AdminClient logic redundant*

The main request flow difference btw AdminClient and normal clients (e.g. 
Producer and Consumer) would be that AdminClient wants to have a per request 
timeout while normal clients is okay with a static default timeout. Thus, 
AdminClient rewrote a quite amount of NetworkClient's functionality.

For example,
 # Wrapping the request builder into a new class {{Call}}, (the construction 
lambda adds tons of lines into the AdminClient.java, which should probably have 
been living in each AbstractRequest implementation classes files)
 # Re-writing the request queues for different request status, while normal 
clients are fully using the NetworkClient.

These logics will become redundant after we support exponential backoff in 
NetworkClient for all types of clients. Are we considering refactoring the 
AdminClient further and remove all the redundant logic which should have 
belonged to the networking layer and the AbstractRequest implementation classes?

*Flexible backoff modes*

Let's analyze the request backoff demands of all the types of clients at this 
point. In my opinion, there are simply two:
 # Requests do not need exponential backoff. These requests need to be sent 
ASAP to avoid dataflow performance degradation, such as the {{ProduceRequest}} 
and its related/preceding metadata requests.

 # Request do need exponential backoff. These requests are “second-class 
citizens” and can be throttled to avoid request storms on the broker side. Such 
as metadata related requests in AdminClient.

Now the question comes. Even when two requests are of the same request type, 
one may have to get sent ASAP while the other one may wait, depending on the 
use case. We need to think deeper about how to make a classification.

But the implementation would be simple. We can utilize the existing builder 
pattern AbstractRequest and build the request flexibly upon a given 
retry_backoff mode. For example,
 # AbstractRequest.Builder will interact with a new abstract class specifying 
the retry_backoff option, static or exponential.
 # AbstractRequest will have some new interfaces controlling the backoff.

Then, we can control if the request should have a static backoff or an 
exponential backoff when we construct each implementation instance of 
AbstractRequest.Builder.

I'll include more details in the Jira ticket and rewrite this PR. Before we 
talk more about the code details and start the new implementation, please let 
me know what you think about the AdminClient refactor and static/exponential 
retry_backoff classification rule.

 

> [KIP-580] Client Exponential Backoff Implementation
> ---------------------------------------------------
>
>                 Key: KAFKA-9800
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9800
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Cheng Tan
>            Assignee: Cheng Tan
>            Priority: Major
>              Labels: KIP-580
>
> In {{KafkaAdminClient}}, we will have to modify the way the retry backoff is 
> calculated for the calls that have failed and need to be retried. >From the 
> current static retry backoff, we have to introduce a mechanism for all calls 
> that upon failure, the next retry time is dynamically calculated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to