[ 
https://issues.apache.org/jira/browse/KAFKA-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110265#comment-16110265
 ] 

cuiyang edited comment on KAFKA-5678 at 8/2/17 4:15 AM:
--------------------------------------------------------

[~becket_qin] 
Thank you for your nice explanation. I have read through your discussion and 
understand your point. 
There are two things I would like to know:

  1. Can we reduce the risk of this issue by setting a larger value for 
"request.timeout.ms" (e.g. from 10s to 30s)? I think if we set it to a larger 
value, the broker will finish shutting down completely before the request 
times out, so the producer will receive a "socket connection reset by peer" 
error instead of a request timeout, and it can retry internally without 
throwing an exception to the upper-layer application. Is that right? (A 
config sketch of what I mean follows after these two questions.) Our 
application throws the message away once the timeout error happens, so the 
timeout error is fatal for us.

  2. Following the latest discussion above, I want to confirm whether this 
issue happens both when a broker shuts down (regardless of whether it is the 
controller or not) and when a leader switch happens purely on its own, 
without shutting down the broker?
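
For question 1, here is a minimal sketch of the producer settings I have in 
mind, assuming the plain Java producer client; the broker addresses, the 30s 
timeout, and the retry count are illustrative placeholders, not recommendations:

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerConfigSketch {
    public static KafkaProducer<String, String> buildProducer() {
        Properties props = new Properties();
        // Placeholder addresses for the 3-broker cluster (A, B, C).
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "brokerA:9092,brokerB:9092,brokerC:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=-1: wait for all in-sync replicas, as in the original report.
        props.put(ProducerConfig.ACKS_CONFIG, "-1");
        // Raised from the 10s value discussed above, so the broker can finish
        // shutting down before the request times out (assumption to verify).
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "30000");
        // Let the producer retry retriable errors (e.g. connection reset)
        // internally instead of surfacing them to the application.
        props.put(ProducerConfig.RETRIES_CONFIG, "3");
        return new KafkaProducer<>(props);
    }
}
{code}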



> When the broker graceful shutdown occurs, the producer side sends timeout.
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-5678
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5678
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0.0, 0.10.0.0, 0.11.0.0
>            Reporter: tuyang
>
> Test environment as follows.
> 1. Kafka version: 0.9.0.1
> 2. Cluster with 3 brokers, with broker ids A, B, C.
> 3. Topic with 6 partitions and 2 replicas, with 2 leader partitions on each 
> broker.
> We can reproduce the problem as follows.
> 1. We send messages as quickly as possible with acks=-1.
> 2. Suppose partition p0's leader is on broker A and we gracefully shut down 
> broker A, but we send a message to p0 before the leader is re-elected. The 
> message can be appended to the leader replica successfully, but if the 
> follower replica does not catch up quickly enough, the shutting-down broker 
> will create a DelayedProduce for this request and wait for it to complete 
> until request.timeout.ms.
> 3. Because of the controlled shutdown request from broker A, the p0 partition 
> leader is re-elected, and the replica on broker A becomes a follower before 
> the broker completes shutting down. The DelayedProduce is then never 
> triggered to complete and only finishes when it expires.
> 4. If broker A's shutdown takes too long, the producer only gets a response 
> after request.timeout.ms, which increases producer send latency when we 
> restart brokers one by one.
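
Below is a minimal sketch of a send loop matching the reproduction steps 
quoted above, assuming the Java producer client; the topic name and broker 
addresses are placeholders. Gracefully shutting down the partition leader's 
broker while this loop runs should surface the timeout described in step 4 
in the send callback:

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.TimeoutException;
import org.apache.kafka.common.serialization.StringSerializer;

public class GracefulShutdownRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "brokerA:9092,brokerB:9092,brokerC:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "-1"); // step 1: wait for all in-sync replicas

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Step 1: send as quickly as possible; gracefully shut down the
            // leader's broker while this loop is running.
            for (long i = 0; i < 1_000_000; i++) {
                producer.send(new ProducerRecord<>("test-topic", Long.toString(i)), (metadata, exception) -> {
                    if (exception instanceof TimeoutException) {
                        // The request.timeout.ms expiry described in step 4.
                        System.err.println("send timed out: " + exception.getMessage());
                    } else if (exception != null) {
                        System.err.println("send failed: " + exception.getMessage());
                    }
                });
            }
        }
    }
}
{code}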



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
