Hi David,

The only issue with yet another config is that nearly everyone will just
leave it at the default value and it is hard to configure. But that is just
my personal opinion. If you need the flexibility, then I don't have any
objection to another config option.

The specific scenario described in the KIP where a connection doesn't get
closed cleanly and the system waits for TCP timeout - in my opinion, that
is the scenario where request.timeout.ms is most useful - to detect a
broker crash where the OS didn't shutdown cleanly. We don't currently
handle this scenario during connection establishment, as you have described
in the KIP.  For this scenario, a single config makes sense to me to ensure
that crashes are detected and handled regardless of the connection state.
The limitation in the consumer with request.timeout.ms  is something that
needs to be fixed anyway since 5 minutes is too long to detect a broker
crash in any connection state.

There could be other scenarios where you want to use different configs. It
will be useful to include any scenarios you have in mind to the KIP.



On Tue, May 23, 2017 at 11:16 AM, 东方甲乙 <254479...@qq.com> wrote:

> Hi Rajini
>
> If we only use the request.timeout for the connect and process request,
> it is not flexible for some cases, for example we need the connecting phase
> to a short time(5s), and the request for a longer time(40s).
>
> Why not add a connect timeout to handler these different cases.
>
> Thanks,
> David
>
> ------------------ 原始邮件 ------------------
> *发件人:* "Rajini Sivaram" <rajinisiva...@gmail.com>;
> *发送时间:* 2017年5月23日(星期二) 19:27
> *收件人:* "dev" <dev@kafka.apache.org>;
> *主题:* Re: Re: [DISCUSS] KIP-148: Add a connect timeout for client
>
>
> Guozhang,
>
> At the moment we don't have a connect timeout. And the behaviour suggested
> in the KIP is useful to address this.
>
> We do however have a request.timeout.ms. This is the amount of time it
> would take to detect a crashed broker if the broker crashed after a
> connection was established. Unfortunately in the consumer, this was
> increased to > 5minutes since JoinRequest can take up to
> max.poll.interval.ms, which has a default of  5 minutes. Since the
> whole point of this timeout is to detect a crashed broker, 5 minutes is too
> large.
>
> My suggestion was to use request.timeout.ms to also detect connection
> timeouts to a crashed broker - implement the behavior suggested in the KIP
> without adding a new config parameter. As Ismael has said, this will need
> to fix request.timeout.ms in the consumer.
>
>
> On Mon, May 22, 2017 at 1:23 PM, Simon Souter <sim...@cakesolutions.net>
> wrote:
>
> > The following tickets are probably relevant to this KIP:
> >
> > https://issues.apache.org/jira/browse/KAFKA-3457
> > https://issues.apache.org/jira/browse/KAFKA-1894
> > https://issues.apache.org/jira/browse/KAFKA-3834
> >
> > On 22 May 2017 at 16:30, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
> >
> > > Ismael,
> > >
> > > Yes, agree. My concern was that a connection can be shutdown uncleanly
> at
> > > any time. If a client is in the middle of a request, then it times out
> > > after min(request.timeout.ms, tcp-timeout). If we add another config
> > > option
> > > connect.timeout.ms, then we will sometimes wait for min(
> > connect.timeout.ms
> > > ,
> > > tcp-timeout) and sometimes for min(request.timeout.ms, tcp-timeout),
> > > depending
> > > on connection state. One config option feels neater to me.
> > >
> > > On Mon, May 22, 2017 at 11:21 AM, Ismael Juma <ism...@juma.me.uk>
> wrote:
> > >
> > > > Rajini,
> > > >
> > > > For this to have the desired effect, we'd probably need to lower the
> > > > default request.timeout.ms for the consumer and fix the underlying
> > > reason
> > > > why it is a little over 5 minutes at the moment.
> > > >
> > > > Ismael
> > > >
> > > > On Mon, May 22, 2017 at 4:15 PM, Rajini Sivaram <
> > rajinisiva...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > Sorry, what I meant was: Can you reuse the existing configuration
> > > option
> > > > > request.timeout,ms , instead of adding a new config and add the
> > > behaviour
> > > > > that you have proposed in the KIP for the connection phase using
> this
> > > > > timeout? I think the timeout for connection is useful. I am not
> sure
> > we
> > > > > need another configuration option to implement it.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > >
> > > > > On Mon, May 22, 2017 at 11:06 AM, 东方甲乙 <254479...@qq.com> wrote:
> > > > >
> > > > > > Hi Rajini.
> > > > > >
> > > > > > When kafka node' machine is shutdown or network is closed, the
> > > > connecting
> > > > > > phase could not use the request.timeout.ms, because the client
> > > haven't
> > > > > > send a req yet.   And no response for the nio, the selector will
> > not
> > > > > close
> > > > > > the connect, so it will not choose other good node to get the
> > > metadata.
> > > > > >
> > > > > >
> > > > > > Thanks
> > > > > > David
> > > > > >
> > > > > > ------------------ 原始邮件 ------------------
> > > > > > *发件人:* "Rajini Sivaram" <rajinisiva...@gmail.com>;
> > > > > > *发送时间:* 2017年5月22日(星期一) 20:17
> > > > > > *收件人:* "dev" <dev@kafka.apache.org>;
> > > > > > *主题:* Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > >
> > > > > >
> > > > > > Hi David,
> > > > > >
> > > > > > Is there a reason why you wouldn't want to use
> request.timeout.ms
> > as
> > > > the
> > > > > > timeout parameter for connections? Then you would use the same
> > > timeout
> > > > > for
> > > > > > connected and connecting phases when shutdown is unclean. You
> could
> > > > still
> > > > > > use the timeout to ensure that next metadata request is sent to
> > > another
> > > > > > node.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > > On Sun, May 21, 2017 at 9:51 AM, 东方甲乙 <254479...@qq.com> wrote:
> > > > > >
> > > > > > > Hi Guozhang,
> > > > > > >
> > > > > > >
> > > > > > > Thanks for the clarify. For the clarify 2, I think the key
> thing
> > is
> > > > not
> > > > > > > users control how much time in maximum to wait for inside code,
> > but
> > > > is
> > > > > > the
> > > > > > > network client can be aware of the connecting can't be finished
> > and
> > > > > try a
> > > > > > > good node. In the producer.sender even the selector.poll can
> > > timeout,
> > > > > but
> > > > > > > the next time is also not close the previous connecting and try
> > > > another
> > > > > > > good node.
> > > > > > >
> > > > > > >
> > > > > > > In out test env, QA shutdown one of the leader node, the
> producer
> > > > send
> > > > > > the
> > > > > > > request will timeout and close the node's connection then
> request
> > > the
> > > > > > > metadata.  But sometimes the request node is also the shutdown
> > > node.
> > > > > > When
> > > > > > > connecting the shutting down node to get the metadata, it is in
> > the
> > > > > > > connecting phase, network client mark the connecting node's
> state
> > > to
> > > > > > > CONNECTING, but if the node is shutdown,  the socket can't be
> > aware
> > > > of
> > > > > > the
> > > > > > > connecting is broken. Though the selector.poll has timeout
> > > parameter,
> > > > > but
> > > > > > > it will not close the connection, so the next
> > > > > > > time in the "networkclient.maybeUpdate" it will check if
> > > > > > > isAnyNodeConnecting, then will not connect to any good node the
> > get
> > > > the
> > > > > > > metadata.  It need about several minutes to
> > > > > > > aware the connecting is timeout and try other node.
> > > > > > >
> > > > > > >
> > > > > > > So I want to add a connect.timeout parameter,  the selector can
> > > find
> > > > > the
> > > > > > > connecting is timeout and close the connection.  It seems the
> > > > currently
> > > > > > the
> > > > > > > timeout value passed in `selector.poll()`
> > > > > > > seems can not do this.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > David
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ------------------ 原始邮件 ------------------
> > > > > > > 发件人: "Guozhang Wang";<wangg...@gmail.com>;
> > > > > > > 发送时间: 2017年5月16日(星期二) 凌晨1:51
> > > > > > > 收件人: "dev@kafka.apache.org"<dev@kafka.apache.org>;
> > > > > > >
> > > > > > > 主题: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hi David,
> > > > > > >
> > > > > > > I may be a bit confused before, just clarifying a few things:
> > > > > > >
> > > > > > > 1. As you mentioned, a client will always try to first
> establish
> > > the
> > > > > > > connection with a broker node before it tries to send any
> request
> > > to
> > > > > it.
> > > > > > > And after connection is established, it will either
> continuously
> > > send
> > > > > > many
> > > > > > > requests (e.g. produce) for just a single request (e.g.
> metadata)
> > > to
> > > > > the
> > > > > > > broker, so these two phases are indeed different.
> > > > > > >
> > > > > > > 2. In the connected phase, connections.max.idle.ms is used to
> > > > > > > auto-disconnect the socket if no requests has been sent /
> > received
> > > > > during
> > > > > > > that period of time; in the connecting phase, we always try to
> > > create
> > > > > the
> > > > > > > socket via "socketChannel.connect" in a non-blocking call, and
> > then
> > > > > > checks
> > > > > > > if the connection has been established, but all the callers of
> > this
> > > > > > > function (in either producer or consumer) has a timeout
> parameter
> > > as
> > > > in
> > > > > > > `selector.poll()`, and the timeout parameter is set either by
> > > > > > calculations
> > > > > > > based on metadata.expiration.time and backoff for
> > producer#sender,
> > > or
> > > > > by
> > > > > > > directly passed values from consumer#poll(timeout), so although
> > > there
> > > > > is
> > > > > > no
> > > > > > > directly config controlling that, users can still control how
> > much
> > > > time
> > > > > > in
> > > > > > > maximum to wait for inside code.
> > > > > > >
> > > > > > > I originally thought your scenarios is more on the connected
> > phase,
> > > > but
> > > > > > now
> > > > > > > I feel you are talking about the connecting phase. For that
> > case, I
> > > > > still
> > > > > > > feel currently the timeout value passed in `selector.poll()`
> > which
> > > is
> > > > > > > controllable from user code should be sufficient?
> > > > > > >
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, May 14, 2017 at 2:37 AM, 东方甲乙 <254479...@qq.com>
> wrote:
> > > > > > >
> > > > > > > > Hi Guozhang,
> > > > > > > >
> > > > > > > >
> > > > > > > > Sorry for the delay, thanks for the question.  It seems two
> > > > different
> > > > > > > > parameters to me:
> > > > > > > > connect.timeout.ms: only work for the connecting phrase,
> after
> > > > > > connected
> > > > > > > > phrase this parameter is not used.
> > > > > > > > connections.max.idle.ms: currently not work in the
> connecting
> > > > phrase
> > > > > > > > (only select return readyKeys >0) will add to the expired
> > > manager,
> > > > > > after
> > > > > > > > connected will check if the connection is still alive in some
> > > time.
> > > > > > > >
> > > > > > > >
> > > > > > > > Even if we change the connections.max.idle.ms to work
> > including
> > > > the
> > > > > > > > connecting phrase, we can not set this parameter to a small
> > > value,
> > > > > such
> > > > > > > as
> > > > > > > > 5 seconds. Because the client is maybe busy sending message
> to
> > > > other
> > > > > > > node,
> > > > > > > > it will be disconnected in 5 seconds, so the default value of
> > > > > > > > connections.max.idle.ms is setting to a larger time. We
> should
> > > > have
> > > > > > two
> > > > > > > > parameters to control the connecting phrase behavior and the
> > > > > connected
> > > > > > > > phrase behavior, do you think so?
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > >
> > > > > > > > David
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > ------------------ 原始邮件 ------------------
> > > > > > > > 发件人: "Guozhang Wang";<wangg...@gmail.com>;
> > > > > > > > 发送时间: 2017年5月6日(星期六) 上午7:52
> > > > > > > > 收件人: "dev@kafka.apache.org"<dev@kafka.apache.org>;
> > > > > > > >
> > > > > > > > 主题: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Hello David,
> > > > > > > >
> > > > > > > > Thanks for the KIP. For the described issue, I'm wondering if
> > it
> > > > can
> > > > > be
> > > > > > > > resolved by tuning the CONNECTIONS_MAX_IDLE_MS_CONFIG (
> > > > > > > > connections.max.idle.ms) on the client side? Default is 9
> > > minutes.
> > > > > > > >
> > > > > > > >
> > > > > > > > Guozhang
> > > > > > > >
> > > > > > > > On Tue, May 2, 2017 at 8:22 AM, 东方甲乙 <254479...@qq.com>
> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Currently in our test environment, we found that after one
> of
> > > the
> > > > > > > broker
> > > > > > > > > node crash (reboot or os crash), the client may still be
> > > > connecting
> > > > > > to
> > > > > > > > the
> > > > > > > > > crash node to send metadata request or other request, and
> it
> > > > needs
> > > > > > > > several
> > > > > > > > > minutes to be aware that the connection is timeout then try
> > > > another
> > > > > > > node
> > > > > > > > to
> > > > > > > > > connect to send the request. Then the client may still not
> be
> > > > aware
> > > > > > of
> > > > > > > > the
> > > > > > > > > metadata change after several minutes.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > So I want to add a connect timeout on the  client,  please
> > > take a
> > > > > > look
> > > > > > > > at:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > 148%3A+Add+a+connect+timeout+for+client
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > David
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > [image: cake_logo_strap_screen 400.jpg] <http://www.cakesolutions.net>
> >
> > Simon Souter
> > (Office) 0845 617 1200
> > Houldsworth Mill, Houldsworth Street, Reddish, Stockport, SK5 6DA, UK
> > [image: twitter-circle-darkgrey.png]
> > <https://twitter.com/cakesolutions> [image:
> > facebook-circle-darkgrey.png]
> > <https://www.facebook.com/cakesolutionslimited/> [image:
> > linkedin-circle-darkgrey.png]
> > <https://www.linkedin.com/company/cake-solutions-limited>
> > [image: Reactive Applications]
> > <https://cakesolutions.sigstr.net/uc/588780e6825be936ed5682e0>
> > Company registered in the UK, No. 4184567 If you have received this
> e-mail
> > in error, please accept our apologies, destroy it immediately, and it
> would
> > be greatly appreciated if you notified the sender. It is your
> > responsibility to protect your system from viruses and any other harmful
> > code or device. We try to eliminate them from e-mails and attachments,
> but
> > we accept no liability for any which remain. We may monitor or access any
> > or all e-mails sent to us.
> > [image: Powered by Sigstr]
> > <https://cakesolutions.sigstr.net/uc/588780e6825be936ed5682e0/watermark>
> >
>
>

Reply via email to