Hi David,

I may have been a bit confused before, so let me clarify a few things:

1. As you mentioned, a client always tries to establish a connection with a broker node before it sends any request to it. Once the connection is established, it either sends many requests continuously (e.g. produce) or just a single request (e.g. metadata) to the broker, so these two phases are indeed different.

2. In the connected phase, connections.max.idle.ms is used to auto-disconnect the socket if no requests have been sent or received during that period of time. In the connecting phase, we always try to create the socket via "socketChannel.connect" in a non-blocking call and then check whether the connection has been established. All callers of this function (in either the producer or the consumer) pass a timeout parameter to `selector.poll()`, and that timeout is either calculated from metadata.expiration.time and the backoff for producer#sender, or passed directly from consumer#poll(timeout). So although no config controls it directly, users can still control the maximum time to wait from their own code.

I originally thought your scenario was about the connected phase, but now I think you are talking about the connecting phase. For that case, I still feel the timeout value passed to `selector.poll()`, which is controllable from user code, should be sufficient?

Guozhang

On Sun, May 14, 2017 at 2:37 AM, 东方甲乙 <254479...@qq.com> wrote:

> Hi Guozhang,
>
> Sorry for the delay, and thanks for the question. They seem like two
> different parameters to me:
>
> connect.timeout.ms: only applies to the connecting phase; once the
> connection is established, this parameter is no longer used.
>
> connections.max.idle.ms: currently does not apply to the connecting phase
> (a connection is added to the expiry manager only when select returns
> readyKeys > 0); after connecting, it checks whether the connection is
> still alive within some period of time.
>
> Even if we changed connections.max.idle.ms to also cover the connecting
> phase, we could not set it to a small value such as 5 seconds. The client
> may be busy sending messages to other nodes, so an otherwise healthy
> connection would be disconnected after 5 seconds; that is why the default
> value of connections.max.idle.ms is set to a larger time. We should have
> two parameters, one to control the connecting phase and one to control the
> connected phase. Do you agree?
>
> Thanks,
>
> David
>
> ------------------ Original Message ------------------
> From: "Guozhang Wang" <wangg...@gmail.com>
> Sent: Saturday, May 6, 2017, 7:52 AM
> To: "dev@kafka.apache.org" <dev@kafka.apache.org>
> Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client
>
> Hello David,
>
> Thanks for the KIP. For the described issue, I'm wondering if it can be
> resolved by tuning CONNECTIONS_MAX_IDLE_MS_CONFIG
> (connections.max.idle.ms) on the client side? The default is 9 minutes.
>
> Guozhang
>
> On Tue, May 2, 2017 at 8:22 AM, 东方甲乙 <254479...@qq.com> wrote:
>
> > Hi all,
> >
> > In our test environment, we found that after one of the broker nodes
> > crashes (reboot or OS crash), the client may still try to connect to
> > the crashed node to send a metadata request or other requests. It takes
> > several minutes for the client to notice that the connection has timed
> > out and to try another node for the request. As a result, the client
> > may still be unaware of the metadata change several minutes later.
> >
> > So I want to add a connect timeout on the client. Please take a look at:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-148%3A+Add+a+connect+timeout+for+client
> >
> > Regards,
> >
> > David
>
> --
> -- Guozhang

--
-- Guozhang
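
For readers following the thread, the connecting phase discussed in point 2 at the top can be illustrated with a minimal plain java.nio sketch. This is not the Kafka client's code: the class ConnectTimeoutSketch and the method connectWithTimeout are hypothetical names, and the explicit deadline stands in for the kind of bound a connect timeout would impose. It only shows that a non-blocking connect returns immediately and that the caller decides how long to keep waiting on the selector.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical illustration only; not part of the Kafka client.
final class ConnectTimeoutSketch {

    /**
     * Attempts a non-blocking connect and gives up after timeoutMs.
     * Returns true if the connection was established, false on timeout
     * or on any I/O failure such as "connection refused".
     */
    static boolean connectWithTimeout(InetSocketAddress address, long timeoutMs) {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);
            // In non-blocking mode connect() returns at once; it is true
            // only if the connection happened to complete immediately.
            if (channel.connect(address)) {
                return true;
            }
            channel.register(selector, SelectionKey.OP_CONNECT);
            long deadlineNanos = System.nanoTime() + timeoutMs * 1_000_000L;
            while (true) {
                long remainingMs = (deadlineNanos - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return false; // connecting phase took too long
                }
                // select(0) would block indefinitely, hence the guard above.
                selector.select(remainingMs);
                if (!selector.selectedKeys().isEmpty()) {
                    selector.selectedKeys().clear();
                    if (channel.finishConnect()) {
                        return true;
                    }
                }
            }
        } catch (IOException e) {
            return false; // e.g. connection refused or host unreachable
        }
    }
}
```

Without the deadline check, a peer that never answers (for example a crashed host that drops packets silently) would leave the channel pending until the operating system's own TCP connect timeout fires, which is the several-minute delay described in the original report.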