Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Valentin Kulichenko Wed, 18 Jul 2018 13:53:09 -0700

Hi Stan,

I have one more point that I'm trying to clarify. You mentioned that
clientFailureDetectionTimeout is ignored by client nodes. If so, what
happens on the client side in case of a network failure (i.e. both the client
and server nodes are alive but can't communicate with each other)? It
sounds like the client will disconnect after failureDetectionTimeout, while
the server will remove the client after clientFailureDetectionTimeout. Is this
correct? If so, I think this can lead to very strange behavior when those
timeouts differ.
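
To make the scenario concrete, here is a toy model in plain Java. It is not Ignite code: the class and method names are made up, and it assumes each side simply treats the peer as gone once its own timeout has fully elapsed after the link breaks.

```java
import java.time.Duration;

/** Toy model of a client/server network partition; not Ignite code. */
final class PartitionTimeline {
    enum View { CONNECTED, DISCONNECTED }

    /** Assumption: a side treats the peer as gone once its own timeout has fully elapsed. */
    static View viewAfter(Duration sincePartition, Duration ownTimeout) {
        return sincePartition.compareTo(ownTimeout) >= 0 ? View.DISCONNECTED : View.CONNECTED;
    }

    /** How long the two sides disagree about whether the connection still exists. */
    static Duration disagreementWindow(Duration clientTimeout, Duration serverTimeout) {
        return clientTimeout.minus(serverTimeout).abs();
    }

    public static void main(String[] args) {
        Duration clientFdt = Duration.ofSeconds(10);       // client's failureDetectionTimeout
        Duration serverClientFdt = Duration.ofSeconds(30); // server's clientFailureDetectionTimeout
        Duration t = Duration.ofSeconds(15);               // time since the link broke

        System.out.println("client sees: " + viewAfter(t, clientFdt));
        System.out.println("server sees: " + viewAfter(t, serverClientFdt));
        System.out.println("disagreement window: " + disagreementWindow(clientFdt, serverClientFdt));
    }
}
```

With these hypothetical values the client has already given up while the server still counts it as a member, which is the kind of mismatch I am worried about.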


What do you think? Is this a valid concern?

-Val

On Mon, Jul 9, 2018 at 1:33 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> If clientFailureDetectionTimeout is not set on the server node, will it use
> failureDetectionTimeout instead?
>
> Either way, this configuration seems to be a bit confusing, but I don't
> think we can change it now. Let's just make sure it's properly documented.
>
> -Val
>
> On Mon, Jul 9, 2018 at 5:47 AM Stanislav Lukyanov <stanlukya...@gmail.com>
> wrote:
>
>> A server uses its failureDetectionTimeout when talking to other servers and
>> its clientFailureDetectionTimeout when talking to clients.
>> E.g. a communication link from server to server uses
>> failureDetectionTimeout, and one from server to client uses
>> clientFailureDetectionTimeout.
>>
>> A client uses its failureDetectionTimeout all the time, ignoring
>> clientFailureDetectionTimeout.
>>
>> Asymmetric settings are even possible.
>> Say the server and the client use the same config, with
>> failureDetectionTimeout=10 and clientFailureDetectionTimeout=20 (in seconds).
>> When these two nodes communicate, the server will use a timeout of 20 seconds
>> and the client will use a timeout of 10 seconds.
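
The selection rule described above can be sketched as a small function. This is only an illustration in plain Java: `Role`, `Timeouts` and `timeoutFor` are hypothetical names, not Ignite types, and the values are assumed to be in milliseconds.

```java
/** Sketch of which timeout a node applies to a link, as described in this thread. */
final class EffectiveTimeout {
    enum Role { SERVER, CLIENT }

    /** Hypothetical config holder; values in milliseconds. */
    static final class Timeouts {
        final long failureDetectionTimeout;
        final long clientFailureDetectionTimeout;

        Timeouts(long failureDetectionTimeout, long clientFailureDetectionTimeout) {
            this.failureDetectionTimeout = failureDetectionTimeout;
            this.clientFailureDetectionTimeout = clientFailureDetectionTimeout;
        }
    }

    /** Only a server talking to a client uses clientFailureDetectionTimeout. */
    static long timeoutFor(Role local, Role remote, Timeouts localConfig) {
        if (local == Role.SERVER && remote == Role.CLIENT)
            return localConfig.clientFailureDetectionTimeout;
        return localConfig.failureDetectionTimeout;
    }

    public static void main(String[] args) {
        Timeouts shared = new Timeouts(10_000, 20_000); // same config on both nodes
        System.out.println("server -> client: " + timeoutFor(Role.SERVER, Role.CLIENT, shared));
        System.out.println("client -> server: " + timeoutFor(Role.CLIENT, Role.SERVER, shared));
    }
}
```

The same config object yields two different values depending on which side evaluates it, which is the asymmetry in the example.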
>>
>> Stan
>>
>> From: Valentin Kulichenko
>> Sent: July 6, 2018, 23:17
>> To: dev@ignite.apache.org
>> Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi
>> timeouts
>>
>> Stan,
>>
>> Can you explain the semantics of both parameters? How do they behave when
>> set on client or on server?
>>
>> -Val
>>
>> On Fri, Jul 6, 2018 at 6:12 AM Stanislav Lukyanov <stanlukya...@gmail.com>
>> wrote:
>>
>> > We could just use failureDetectionTimeout all the time I guess.
>> > The only benefit of clientFailureDetectionTimeout is that it may allow
>> > clients to be slower/on a slower network than servers.
>> >
>> > Do you think it isn’t worth having a separate setting just for that?
>> >
>> > Thanks,
>> > Stan
>> >
>> > From: Valentin Kulichenko
>> > Sent: July 5, 2018, 18:16
>> > To: dev@ignite.apache.org
>> > Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi
>> > timeouts
>> >
>> > Stan,
>> >
>> > What is the purpose of clientFailureDetectionTimeout? Why can't we just
>> > always use failureDetectionTimeout? Is there any difference between these
>> > two timeouts?
>> >
>> > -Val
>> >
>> >
>> >
>> > On Wed, Jul 4, 2018 at 7:00 AM Stanislav Lukyanov <stanlukya...@gmail.com>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I’ve updated the proposed documentation change with a description of
>> > > metricsUpdateFrequency and a detailed description of how
>> > > failureDetectionTimeout and clientFailureDetectionTimeout relate. The
>> > > draft is attached to https://issues.apache.org/jira/browse/IGNITE-7704.
>> > >
>> > > It seems that the relation between failureDetectionTimeout and
>> > > clientFailureDetectionTimeout is currently too tricky and should also be
>> > > changed in the future.
>> > > The problem is that in a server-client connection the server will use
>> > > clientFailureDetectionTimeout but the client will use
>> > > failureDetectionTimeout.
>> > > In other words, clients ignore clientFailureDetectionTimeout and just use
>> > > failureDetectionTimeout. Because of that, one has to provide different
>> > > values of failureDetectionTimeout in server and client configs, which seems
>> > > confusing and inconvenient.
>> > > So I’d like to add one more point to my earlier proposal:
>> > >
>> > > 5. Always use clientFailureDetectionTimeout on clients instead of
>> > > failureDetectionTimeout
>> > > *What*: change code to use clientFailureDetectionTimeout on clients
>> > > *When*: update code and readme.io docs in 2.7
>> > >
>> > > Thanks,
>> > > Stan
>> > >
>> > > From: Valentin Kulichenko
>> > > Sent: May 30, 2018, 19:09
>> > > To: dev@ignite.apache.org
>> > > Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi
>> > > timeouts
>> > >
>> > > Stan,
>> > >
>> > > Looks like you are suggesting changing only the default. If so, it's OK. But
>> > > let's not change the behavior of these timeouts when they are
>> > > explicitly set in the config.
>> > >
>> > > Thanks,
>> > > Val
>> > >
>> > > On Wed, May 30, 2018 at 1:06 AM, Stanislav Lukyanov <stanlukya...@gmail.com>
>> > > wrote:
>> > >
>> > > > On networkTimeout: no, we don’t have anything like that in
>> > > > TcpCommunicationSpi.
>> > > >
>> > > > On socketWriteTimeout:
>> > > > First, its semantics are very close to those of
>> > > > TcpDiscoverySpi.socketTimeout (with the exception that communication
>> > > > uses NIO), and the latter defaults to failureDetectionTimeout,
>> > > > so I think making socketWriteTimeout behave the same way would help
>> > > > avoid confusion.
>> > > > Second, I think we can’t deprecate something without an alternative that
>> > > > would work for most users.
>> > > > On the other hand, if we do default socketWriteTimeout to
>> > > > failureDetectionTimeout, then we reach a pretty decent API state
>> > > > where one only needs two properties in IgniteConfiguration, neither of
>> > > > which we’re considering for deprecation and removal in 3.0.
>> > > >
>> > > > Stan
>> > > >
>> > > > From: Valentin Kulichenko
>> > > > Sent: May 29, 2018, 22:17
>> > > > To: dev@ignite.apache.org
>> > > > Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi
>> > > > timeouts
>> > > >
>> > > > Stan,
>> > > >
>> > > > OK, I got confused a little :)
>> > > >
>> > > > I do agree that TcpDiscoverySpi.networkTimeout should inherit from
>> > > > IgniteConfiguration.networkTimeout if not set explicitly. Do we have the
>> > > > same setting for TcpCommunicationSpi, BTW? If yes, the behavior should be
>> > > > consistent.
>> > > >
>> > > > As for TcpCommunicationSpi.socketWriteTimeout, I'm not sure why you want to
>> > > > change its behavior. Can we just deprecate it and eventually remove it, just
>> > > > as we plan to do for all timeouts from #2?
>> > > >
>> > > > -Val
>> > > >
>> > > > On Tue, May 29, 2018 at 3:50 AM, Stanislav Lukyanov <stanlukya...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Val,
>> > > > >
>> > > > > Which timeouts do you mean?
>> > > > >
>> > > > > In #2 I don’t propose to change behavior.
>> > > > >
>> > > > > I propose to change behavior for a couple of settings in #3 though.
>> > > > > I believe the correct approach here would be to target the behavior change
>> > > > > for 2.6,
>> > > > > but keep in mind that we’ll need to carefully analyze the impact before
>> > > > > actually making the changes.
>> > > > >
>> > > > > Thanks,
>> > > > > Stan
>> > > > >
>> > > > > From: Valentin Kulichenko
>> > > > > Sent: May 29, 2018, 0:57
>> > > > > To: dev@ignite.apache.org
>> > > > > Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi
>> > > > > timeouts
>> > > > >
>> > > > > Hi Stan,
>> > > > >
>> > > > > I'm 100% for this activity, however I don't think we should change the
>> > > > > behavior of the timeouts you listed in #2 - this can lead to unexpected
>> > > > > behavior for users who already use them. I would just deprecate them and
>> > > > > eventually remove them.
>> > > > >
>> > > > > -Val
>> > > > >
>> > > > > On Mon, May 28, 2018 at 1:29 PM, Stanislav Lukyanov <stanlukya...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi folks,
>> > > > > >
>> > > > > > It looks like we stopped half-way with this activity. I’d like to pick it
>> > > > > > up.
>> > > > > >
>> > > > > > All seem to agree that we should simplify the timeout settings.
>> > > > > > Here are the specific actions I’d like to propose:
>> > > > > >
>> > > > > > 1. Promote the use of global timeouts as the best practice
>> > > > > > *What*: update the docs to encourage users to rely on the following
>> > > > > > timeouts for their “network stability” settings:
>> > > > > > IgniteConfiguration.failureDetectionTimeout
>> > > > > > IgniteConfiguration.clientFailureDetectionTimeout
>> > > > > > IgniteConfiguration.networkTimeout
>> > > > > > *When*: update readme.io docs for 2.5 and Javadoc for 2.6
>> > > > > >
>> > > > > > 2. Discourage the use of finer timeouts
>> > > > > > *What*:
>> > > > > > - update the docs to discourage users from using the following timeouts
>> > > > > > and announce their upcoming deprecation and removal:
>> > > > > > TcpDiscoverySpi.socketTimeout
>> > > > > > TcpDiscoverySpi.ackTimeout
>> > > > > > TcpDiscoverySpi.maxAckTimeout
>> > > > > > TcpDiscoverySpi.reconnectCount
>> > > > > > TcpCommunicationSpi.connectTimeout
>> > > > > > TcpCommunicationSpi.maxConnectTimeout
>> > > > > > TcpCommunicationSpi.reconnectCount
>> > > > > > - deprecate the properties in code
>> > > > > > - remove the properties in code
>> > > > > > *When*:
>> > > > > > - readme.io update with deprecation announcement for 2.5
>> > > > > > - @Deprecated in code + Javadoc update + respective readme.io rewording
>> > > > > > for 2.6
>> > > > > > - properties removal in 3.0
>> > > > > >
>> > > > > > 3. Make “orphan” timeouts rely on global timeouts, then deprecate and
>> > > > > > remove
>> > > > > > *What*:
>> > > > > > Two settings currently don’t default to the global equivalents, although
>> > > > > > they should:
>> > > > > > - TcpCommunicationSpi.socketWriteTimeout should default to
>> > > > > > failureDetectionTimeout
>> > > > > > - TcpDiscoverySpi.networkTimeout should default to
>> > > > > > IgniteConfiguration.networkTimeout
>> > > > > > So the course of action would be:
>> > > > > > - update the docs to explain that these timeouts have to be used for now,
>> > > > > > but announce their upcoming deprecation and removal
>> > > > > > - change the properties to default to their global counterparts and
>> > > > > > deprecate them in code
>> > > > > > - remove the properties in code
>> > > > > > *When*:
>> > > > > > - readme.io update with deprecation announcement for 2.5
>> > > > > > - changing defaults + @Deprecated in code + Javadoc update + respective
>> > > > > > readme.io rewording for 2.6
>> > > > > > - properties removal in 3.0
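
The "default to the global counterpart unless set explicitly" idea from item 3 could look roughly like the sketch below. It is plain Java with hypothetical `GlobalConfig` and `SpiConfig` classes standing in for IgniteConfiguration and an SPI; it is not how Ignite implements it today.

```java
/** Sketch: an SPI-level timeout that inherits from a global one unless overridden. */
final class InheritedTimeout {
    /** Hypothetical stand-in for the global configuration; values in milliseconds. */
    static final class GlobalConfig {
        final long networkTimeout;

        GlobalConfig(long networkTimeout) { this.networkTimeout = networkTimeout; }
    }

    /** Hypothetical stand-in for an SPI with an optional override. */
    static final class SpiConfig {
        private Long networkTimeout; // null means "not set explicitly"

        void setNetworkTimeout(long ms) { this.networkTimeout = ms; }

        /** An explicit value wins; otherwise fall back to the global setting. */
        long effectiveNetworkTimeout(GlobalConfig global) {
            return networkTimeout != null ? networkTimeout : global.networkTimeout;
        }
    }

    public static void main(String[] args) {
        GlobalConfig global = new GlobalConfig(5_000);
        SpiConfig spi = new SpiConfig();
        System.out.println("inherited: " + spi.effectiveNetworkTimeout(global));
        spi.setNetworkTimeout(8_000);
        System.out.println("explicit:  " + spi.effectiveNetworkTimeout(global));
    }
}
```

Keeping the explicit value authoritative means existing configs that already set the SPI property would see no behavior change.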
>> > > > > >
>> > > > > > 4. Don’t touch other timeouts
>> > > > > > Other timeouts, like TcpDiscoverySpi.joinTimeout or
>> > > > > > TcpCommunicationSpi.idleConnectionTimeout, are orthogonal to the whole
>> > > > > > “network stability” theme discussed above, and don’t have to be changed.
>> > > > > >
>> > > > > > Finally, I’ve prepared a draft of the docs page that may be used as a base
>> > > > > > for the readme.io update.
>> > > > > > This email is pretty long already, so please find the draft attached to
>> > > > > > the JIRA issue
>> > > > > > https://issues.apache.org/jira/browse/IGNITE-7704.
>> > > > > >
>> > > > > > Please share your thoughts.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Stan
>> > > > > >
>> > > > > > From: Alexey Popov
>> > > > > > Sent: March 1, 2018, 17:01
>> > > > > > To: dev@ignite.apache.org
>> > > > > > Subject: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi
>> > > > > > timeouts
>> > > > > >
>> > > > > > Hi Igniters,
>> > > > > >
>> > > > > > We often see similar questions from users and customers about
>> > > > > > IgniteConfiguration, TcpDiscoverySpi and TcpCommunicationSpi timeouts and
>> > > > > > how they relate to each other. We also see several side effects of
>> > > > > > incorrect timeout configuration.
>> > > > > >
>> > > > > > I tried to briefly describe these timeout settings (please see below) and
>> > > > > > found that most of them do not make sense in terms of cluster
>> > > > > > functions/operations and cannot be explained to users.
>> > > > > >
>> > > > > > I propose to deprecate most of them and leave only the timeouts we can
>> > > > > > explain in common terms (setFailureDetectionTimeout, setNetworkTimeout,
>> > > > > > setJoinTimeout and some others).
>> > > > > >
>> > > > > > Please let me know your thoughts.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Alexey
>> > > > > >
>> > > > > > GLOBAL:
>> > > > > >
>> > > > > > IgniteConfiguration.setNetworkTimeout:
>> > > > > > It is a global timeout for high-level operations that involve the
>> > > > > > network. For instance, IgniteMessaging delivery and the DiscoverySpi
>> > > > > > handshake use this timeout.
>> > > > > >
>> > > > > > IgniteConfiguration.setFailureDetectionTimeout:
>> > > > > > It is a global timeout for detecting failures in IgniteSpi implementations
>> > > > > > (including DiscoverySpi and CommunicationSpi).
>> > > > > > The failure detection algorithm actually bounds a series of simple network
>> > > > > > operations related to a single logical operation (for instance, a reliable
>> > > > > > delivery of some DiscoverySpi message within a cluster).
>> > > > > > The failure detection timeout is a cumulative timeout for establishing a
>> > > > > > socket connection, sending and receiving data bytes, and all possible
>> > > > > > socket retries (if some failure happens).
>> > > > > > This timeout is intended to simplify the failure detection condition from a
>> > > > > > user perspective.
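
One way to picture a "cumulative" timeout is a single deadline shared by every step of a logical operation, as in the sketch below. It is plain Java with a hypothetical `FailureDetectionBudget` class and an injectable clock; it illustrates the idea only and is not taken from the Ignite code base.

```java
import java.util.function.LongSupplier;

/** Sketch: one deadline shared by connect, write, read and retries of a logical operation. */
final class FailureDetectionBudget {
    private final LongSupplier nanoClock;
    private final long deadlineNanos;

    FailureDetectionBudget(long timeoutMs, LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.deadlineNanos = nanoClock.getAsLong() + timeoutMs * 1_000_000L;
    }

    /** Time left for the next low-level step, never negative. */
    long remainingMs() {
        return Math.max(0L, (deadlineNanos - nanoClock.getAsLong()) / 1_000_000L);
    }

    /** True once the whole budget is used up, no matter which step consumed it. */
    boolean expired() {
        return deadlineNanos - nanoClock.getAsLong() <= 0L;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock in nanoseconds, so the demo is deterministic
        FailureDetectionBudget budget = new FailureDetectionBudget(10_000, () -> now[0]);
        long[] stepsMs = {3_000, 4_000, 4_000}; // hypothetical durations: connect, write, retry
        for (long step : stepsMs) {
            now[0] += step * 1_000_000L;
            System.out.println("remaining=" + budget.remainingMs() + "ms expired=" + budget.expired());
        }
    }
}
```

Each step would pass `remainingMs()` as its own socket timeout, so no combination of steps and retries can exceed the configured total.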
>> > > > > >
>> > > > > > IgniteConfiguration.setClientFailureDetectionTimeout:
>> > > > > > It is a special case of the failure detection timeout, applied to client
>> > > > > > nodes in DiscoverySpi.
>> > > > > >
>> > > > > > TCP DISCOVERY SPI:
>> > > > > >
>> > > > > > If you need more control over the failure detection algorithm for
>> > > > > > TcpDiscoverySpi, you can explicitly use the following low-level options
>> > > > > > (which will disable the failureDetectionTimeout logic):
>> > > > > >
>> > > > > > 1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout
>> > > > > > 2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts used
>> > > > > > when establishing a connection with the remote node and sending messages to
>> > > > > > it
>> > > > > > 3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. The write
>> > > > > > operation will be repeated getReconnectCount() times if it exceeds this
>> > > > > > timeout
>> > > > > > 4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout. If a
>> > > > > > message acknowledgment is not received within this timeout, sending is
>> > > > > > considered failed and the SPI will try to repeat the send operation. It is
>> > > > > > automatically doubled for successive retries up to the getMaxAckTimeout
>> > > > > > value.
>> > > > > > 5. TcpDiscoverySpi.setMaxAckTimeout - maximum acknowledgment timeout; if
>> > > > > > getAckTimeout reaches getMaxAckTimeout, the SPI gives up retrying
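
The doubling of the acknowledgment timeout in items 4 and 5 can be sketched as follows. This is plain Java with a hypothetical `AckBackoff` helper; it assumes that attempts continue while the doubled value does not exceed maxAckTimeout, which is one reading of the description above and not a statement about the exact Ignite implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: sequence of ack timeouts when each retry doubles the previous one. */
final class AckBackoff {
    /** Returns the timeouts (ms) of all attempts made before the sender gives up. */
    static List<Long> schedule(long ackTimeoutMs, long maxAckTimeoutMs) {
        if (ackTimeoutMs <= 0 || ackTimeoutMs > maxAckTimeoutMs)
            throw new IllegalArgumentException("need 0 < ackTimeout <= maxAckTimeout");

        List<Long> attempts = new ArrayList<>();
        long timeout = ackTimeoutMs;
        while (true) {
            attempts.add(timeout);
            // Stop when doubling would exceed the maximum (also avoids overflow).
            if (timeout > maxAckTimeoutMs / 2)
                break;
            timeout *= 2;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Hypothetical values, not the Ignite defaults.
        List<Long> attempts = schedule(5_000, 40_000);
        long total = attempts.stream().mapToLong(Long::longValue).sum();
        System.out.println("attempt timeouts (ms): " + attempts);
        System.out.println("worst-case wait (ms):  " + total);
    }
}
```

The sum of the schedule is the time a user would actually wait before a failure is reported, which is hard to guess from the two settings alone.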
>> > > > > >
>> > > > > > Other important TcpDiscoverySpi timeouts:
>> > > > > >
>> > > > > > TcpDiscoverySpi.setJoinTimeout - the timeout for the join process when a
>> > > > > > new/restarted node joins a cluster. The node tries to connect to all
>> > > > > > available IP addresses provided by ipFinder within this timeout.
>> > > > > > If the timeout is exceeded, the node will give up and throw an exception
>> > > > > > from Ignition.start().
>> > > > > >
>> > > > > > TcpDiscoverySpi.setNetworkTimeout - timeout for high-level operations like
>> > > > > > the handshake. It looks like it should be deprecated and
>> > > > > > IgniteConfiguration.getNetworkTimeout should be used here.
>> > > > > >
>> > > > > > TCP COMMUNICATION SPI:
>> > > > > >
>> > > > > > If you need more control over the failure detection algorithm for
>> > > > > > TcpCommunicationSpi, you can explicitly use the following low-level options
>> > > > > > (which will disable the failureDetectionTimeout logic):
>> > > > > >
>> > > > > > 1. TcpCommunicationSpi.setConnectTimeout - socket connection timeout; it
>> > > > > > will be automatically doubled for successive retries (up to
>> > > > > > getReconnectCount) related to a single logical operation
>> > > > > > 2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection timeout,
>> > > > > > the upper limit for getConnectTimeout doubled getReconnectCount times
>> > > > > > 3. TcpCommunicationSpi.setReconnectCount - number of reconnect attempts used
>> > > > > > when establishing a connection with the remote node and sending messages to
>> > > > > > it
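
How the three communication settings combine into a worst-case connect time can be estimated with a sketch like this one. It is plain Java with a hypothetical `ConnectRetryBudget` helper and assumes the connect timeout doubles on each attempt and is capped at maxConnectTimeout; the real SPI may behave differently at the boundary.

```java
/** Sketch: worst-case time spent connecting under doubling with an upper cap. */
final class ConnectRetryBudget {
    static long worstCaseTotalMs(long connectTimeoutMs, long maxConnectTimeoutMs, int reconnectCount) {
        long total = 0L;
        long timeout = Math.min(connectTimeoutMs, maxConnectTimeoutMs);
        for (int attempt = 0; attempt < reconnectCount; attempt++) {
            total += timeout;
            // Double for the next attempt, but never go above the configured maximum.
            timeout = timeout > maxConnectTimeoutMs / 2 ? maxConnectTimeoutMs : timeout * 2;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical values, not the Ignite defaults.
        System.out.println("worst case (ms): " + worstCaseTotalMs(1_000, 8_000, 5));
    }
}
```

With a single failureDetectionTimeout the user states this total directly instead of deriving it from three interacting properties.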
>> > > > > >
>> > > > > > Other important TcpCommunicationSpi timeouts:
>> > > > > >
>> > > > > > TcpCommunicationSpi.setSocketWriteTimeout - timeout to send a message.
>> > > > > > TcpCommunicationSpi.setIdleConnectionTimeout - maximum idle connection
>> > > > > > timeout upon which a connection will be closed.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Sent from:
>> http://apache-ignite-developers.2346864.n4.nabble.com/
