[ 
https://issues.apache.org/jira/browse/KAFKA-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107288#comment-16107288
 ] 

cuiyang edited comment on KAFKA-5678 at 7/31/17 1:14 PM:
---------------------------------------------------------

Unfortunately, this issue can still be reproduced on our Kafka cluster even after
we upgraded it to 0.10.2.1. Our producer gets a "Broker timeout" error when we
restart the brokers one by one. Because the producer is invoked by our web
server, the only thing it can do is throw the timed-out record away.

We set the producer's acks to -1, but it does not seem to help, so I think this
issue still exists in the 0.10.x versions.

I also think the broker should return the response of the pending DelayedProduce
to the producer immediately once a leader switch happens. The producer would then
learn what happened in time and could retry after the back-off time
(retry.backoff.ms) instead of waiting for the request timeout.

Timeline:  -------- Leader Reelection -------- DelayedProduce timeout -------- Broker Shutdown complete
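The fix proposed in this comment can be sketched as a toy model of a broker's delayed-produce purgatory. When the replica on the shutting-down broker becomes a follower, pending acks=-1 requests for that partition are failed immediately with a not-leader error instead of waiting to expire. This is a hypothetical Python model, not Kafka's Scala code. The class names, method names, and result strings are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical result codes; they echo Kafka error names but are plain strings here.
OK = "NONE"
NOT_LEADER = "NOT_LEADER_FOR_PARTITION"
TIMED_OUT = "REQUEST_TIMED_OUT"


@dataclass
class PendingProduce:
    """An acks=-1 produce request parked until followers replicate it (hypothetical)."""
    partition: str
    required_offset: int   # complete once the high watermark reaches this offset
    deadline_ms: int       # absolute expiry time, standing in for the request timeout
    result: Optional[str] = None


class ProducePurgatory:
    """Toy model of a broker's delayed-produce purgatory, keyed by partition."""

    def __init__(self) -> None:
        self.pending: Dict[str, List[PendingProduce]] = {}

    def add(self, op: PendingProduce) -> None:
        self.pending.setdefault(op.partition, []).append(op)

    def on_high_watermark(self, partition: str, high_watermark: int) -> None:
        # Normal path: the in-sync replicas caught up, so answer with success.
        still_waiting = []
        for op in self.pending.get(partition, []):
            if high_watermark >= op.required_offset:
                op.result = OK
            else:
                still_waiting.append(op)
        self.pending[partition] = still_waiting

    def on_become_follower(self, partition: str) -> None:
        # Proposed path: leadership moved away during controlled shutdown, so fail
        # fast and let the producer refresh metadata and retry after its back-off.
        for op in self.pending.pop(partition, []):
            op.result = NOT_LEADER

    def expire(self, now_ms: int) -> None:
        # Behaviour described in the issue: without the hook above, parked
        # requests only finish here, after the full timeout.
        for partition, ops in list(self.pending.items()):
            self.pending[partition] = [op for op in ops if op.deadline_ms > now_ms]
            for op in ops:
                if op.deadline_ms <= now_ms:
                    op.result = TIMED_OUT


purgatory = ProducePurgatory()
op = PendingProduce("p0", required_offset=43, deadline_ms=30_000)
purgatory.add(op)
purgatory.on_become_follower("p0")  # leader of p0 moved off the shutting-down broker
print(op.result)  # NOT_LEADER_FOR_PARTITION
```

The design choice is to complete the parked request at the leadership change rather than at expiry. A not-leader error is retriable, so a producer configured with retries can resend to the new leader almost immediately.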



> When the broker graceful shutdown occurs, the producer side sends timeout.
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-5678
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5678
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0.0, 0.10.0.0, 0.11.0.0
>            Reporter: tuyang
>
> Test environment:
> 1. Kafka version: 0.9.0.1
> 2. A cluster of 3 brokers with broker IDs A, B, C
> 3. A topic with 6 partitions and 2 replicas per partition, with 2 partition
> leaders on each broker
> We can reproduce the problem as follows:
> 1. We send messages as quickly as possible with acks=-1.
> 2. Partition p0's leader is on broker A, and we gracefully shut down broker A.
> If we send a message to p0 before the leader is re-elected, the message is
> appended to the leader replica successfully. However, if the follower replica
> does not fetch it quickly enough, the shutting-down broker creates a
> DelayedProduce for this request that waits for completion until
> request.timeout.ms.
> 3. Because of the ControlledShutdown request from broker A, the leader of
> partition p0 is re-elected, and the replica on broker A becomes a follower
> before the shutdown completes. The DelayedProduce is then never triggered to
> complete and stays pending until it expires.
> 4. If broker A's shutdown takes too long, the producer only gets the response
> after request.timeout.ms, which increases producer send latency whenever we
> restart the brokers one by one.
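The latency effect in steps 2 to 4 can be put into a small arithmetic sketch. It compares the current behaviour, where the answer arrives only when the DelayedProduce expires or broker A's shutdown drops the connection, whichever comes first, with the fail-fast behaviour proposed in the comment. The helper name is hypothetical, and the model is simplified: network round trips are ignored, and the connection-drop case is an assumption about how the producer notices the shutdown. The 30000 ms and 100 ms values are the Java producer defaults for request.timeout.ms and retry.backoff.ms. The 200 ms leader change and 45 s shutdown are made-up numbers.

```python
def ms_until_resend(request_timeout_ms: int, leader_change_ms: int,
                    shutdown_ms: int, retry_backoff_ms: int,
                    fail_fast: bool) -> int:
    """Milliseconds from the send (t = 0) until the producer can resend the record
    to the new leader, for a request parked in a DelayedProduce on broker A.

    Simplified, hypothetical model: round trips are ignored, and the answer
    arrives when the DelayedProduce completes or the connection drops.
    """
    if fail_fast:
        # Proposed: answer NOT_LEADER as soon as the leader moves.
        answered_at = leader_change_ms
    else:
        # Observed: the DelayedProduce only ends when it expires, unless the
        # shutdown finishes first and closes the connection.
        answered_at = min(request_timeout_ms, shutdown_ms)
    return answered_at + retry_backoff_ms


slow = ms_until_resend(30_000, 200, 45_000, 100, fail_fast=False)
fast = ms_until_resend(30_000, 200, 45_000, 100, fail_fast=True)
print(slow, fast)  # 30100 300
```

With a slow shutdown, the current behaviour costs roughly one full request timeout per affected request. The fail-fast path costs only the leader-change delay plus the back-off.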

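The test environment in the issue (3 brokers, 6 partitions, 2 replicas each, 2 leaders per broker) can be pictured with a short Python sketch. It shows which partitions lose their leader when broker A shuts down. The placement is a simplified round-robin, which is an assumption and not Kafka's exact replica assignor. The helper name is hypothetical, and the partition names p0 to p5 follow the issue's "p0" naming.

```python
from collections import Counter
from typing import Dict, List


def round_robin_assignment(num_partitions: int, brokers: List[str],
                           replication_factor: int) -> Dict[str, List[str]]:
    """Place replicas round-robin; the first replica in each list is the leader.

    Simplified assumption: Kafka's real assignor also randomizes the starting
    broker and shifts follower placement, so actual layouts may differ.
    """
    n = len(brokers)
    return {
        f"p{p}": [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


assignment = round_robin_assignment(6, ["A", "B", "C"], 2)
leaders_per_broker = Counter(replicas[0] for replicas in assignment.values())
led_by_a = sorted(p for p, replicas in assignment.items() if replicas[0] == "A")

print(dict(leaders_per_broker))  # {'A': 2, 'B': 2, 'C': 2}
print(led_by_a)                  # ['p0', 'p3']
print(assignment["p0"])          # ['A', 'B']: B takes over p0 when A shuts down
```

In this layout, shutting down broker A moves leadership of p0 and p3 to broker B. Any acks=-1 requests still parked on A for those two partitions are the ones that hit the timeout described in step 4.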


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
