Hi folks,

It looks like we stopped half-way with this activity. I’d like to pick it up.

All seem to agree that we should simplify the timeout settings.
Here are the specific actions I’d like to propose:

1. Promote the use of global timeouts as the best practice
*What*: update the docs to encourage users to rely on the following timeouts 
for their “network stability” settings
IgniteConfiguration.failureDetectionTimeout
IgniteConfiguration.clientFailureDetectionTimeout 
IgniteConfiguration.networkTimeout
*When*: update readme.io docs for 2.5 and Javadoc for 2.6
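
For illustration, here is a minimal sketch of a configuration that relies only on the three global timeouts. It assumes ignite-core is on the classpath. The class name GlobalTimeoutsExample and the values are placeholders of mine, not recommendations.

```java
import org.apache.ignite.configuration.IgniteConfiguration;

/** Illustrative only: the class name and the timeout values are assumptions. */
public final class GlobalTimeoutsExample {
    private GlobalTimeoutsExample() {
    }

    /** Builds a configuration that sets only the global "network stability" timeouts. */
    public static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // All values are in milliseconds and are placeholders.
        cfg.setFailureDetectionTimeout(10_000L);       // failure detection for server nodes
        cfg.setClientFailureDetectionTimeout(30_000L); // failure detection for client nodes
        cfg.setNetworkTimeout(5_000L);                 // high-level network operations

        // No TcpDiscoverySpi or TcpCommunicationSpi timeouts are set here on purpose,
        // so the SPIs are left to derive their behavior from the global settings.
        return cfg;
    }
}
```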

2. Discourage the use of finer timeouts
*What*:
- update the docs to discourage users from using the following timeouts and 
announce their upcoming deprecation and removal
TcpDiscoverySpi.socketTimeout
TcpDiscoverySpi.ackTimeout
TcpDiscoverySpi.maxAckTimeout
TcpDiscoverySpi.reconnectCount
TcpCommunicationSpi.connectTimeout
TcpCommunicationSpi.maxConnectTimeout
TcpCommunicationSpi.reconnectCount
- deprecate the properties in code
- remove the properties in code
*When*:
- readme.io update with deprecation announcement for 2.5
- @Deprecated in code + Javadoc update + respective readme.io rewording for 2.6
- properties removal in 3.0

3. Make “orphan” timeouts rely on global timeouts, then deprecate and remove
*What*:
Two settings currently don’t default to the global equivalents, although they 
should:
- TcpCommunicationSpi.socketWriteTimeout should default to 
failureDetectionTimeout
- TcpDiscoverySpi.networkTimeout should default to 
IgniteConfiguration.networkTimeout
So the course of action would be:
- update the docs to explain that these timeouts have to be used for now, but 
announce their upcoming deprecation and removal
- change the properties to default to their global counterparts and deprecate 
them in code
- remove the properties in code
*When*:
- readme.io update with deprecation announcement for 2.5
- changing defaults + @Deprecated in code + Javadoc update + respective 
readme.io rewording for 2.6
- properties removal in 3.0
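
To make the intended 2.6 behavior concrete, here is a rough sketch of the "fall back to the global value unless explicitly set" rule. This is not Ignite code: TimeoutDefaults and effectiveTimeout are hypothetical names, and the numbers are placeholders.

```java
import java.util.OptionalLong;

/** Hypothetical helper, not part of the Ignite API. */
public final class TimeoutDefaults {
    private TimeoutDefaults() {
    }

    /**
     * Returns the SPI-level timeout if the user set one explicitly,
     * otherwise the global timeout it should default to.
     */
    public static long effectiveTimeout(OptionalLong spiTimeout, long globalTimeout) {
        if (globalTimeout <= 0)
            throw new IllegalArgumentException("globalTimeout must be positive: " + globalTimeout);

        return spiTimeout.orElse(globalTimeout);
    }

    public static void main(String[] args) {
        long failureDetectionTimeout = 10_000L; // placeholder value, ms

        // socketWriteTimeout not set: the global value is used.
        System.out.println(effectiveTimeout(OptionalLong.empty(), failureDetectionTimeout));

        // socketWriteTimeout set explicitly: the explicit value wins.
        System.out.println(effectiveTimeout(OptionalLong.of(2_000L), failureDetectionTimeout));
    }
}
```

With this rule, existing configurations that set the SPI-level property keep working until the property is removed in 3.0.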

4. Don’t touch other timeouts
Other timeouts, like TcpDiscoverySpi.joinTimeout or 
TcpCommunicationSpi.idleConnectionTimeout, are orthogonal to the whole
“network stability” theme discussed above, and don’t have to be changed.

Finally, I’ve prepared a draft of the docs page that may be used as a base for 
the readme.io update.
This email is pretty long already, so please find the draft attached to the 
JIRA issue 
https://issues.apache.org/jira/browse/IGNITE-7704.

Please share your thoughts.

Thanks,
Stan

From: Alexey Popov
Sent: March 1, 2018, 17:01
To: dev@ignite.apache.org
Subject: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Hi Igniters,

We often see similar questions from users and customers about the
IgniteConfiguration, TcpDiscoverySpi, and TcpCommunicationSpi timeouts and how
they relate to each other. We also see several side effects caused by
incorrect timeout configuration.

I tried to briefly describe these timeout settings (please see below) and
found that most of them do not make sense in terms of cluster
functions/operations and cannot be explained to users.

I propose to deprecate most of them and leave only the timeouts we can
explain in common terms (setFailureDetectionTimeout, setNetworkTimeout,
setJoinTimeout and some others).

Please let me know your thoughts.

Thanks,
Alexey

GLOBAL:

IgniteConfiguration.setNetworkTimeout:
It is a global timeout for high-level operations where a network is
involved. For instance, IgniteMessaging delivery and the DiscoverySpi
handshake use this timeout.

IgniteConfiguration.setFailureDetectionTimeout:
It is a global timeout for detecting failures in IgniteSpi implementations
(including DiscoverySpi and CommunicationSpi).
The failure detection algorithm bounds the sequence of simple network
operations that make up a single logical operation (for instance, the reliable
delivery of a DiscoverySpi message within a cluster).
The failure detection timeout is cumulative: it covers establishing the socket
connection, sending and receiving data, and all socket retries (if a
failure happens).
This timeout is intended to simplify the failure detection condition from a
user perspective.

IgniteConfiguration.setClientFailureDetectionTimeout:
It is a special case of the failure detection timeout that DiscoverySpi
applies to client nodes.

TCP DISCOVERY SPI:

If you need more control over the failure detection algorithm for
TcpDiscoverySpi, you can explicitly use the following low-level options (this
will disable the failureDetectionTimeout logic):

1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout
2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts used
when establishing connection with the remote node and sending messages to it
3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. The write
operation will be repeated getReconnectCount() times if it exceeds this
timeout
4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout. If a
message acknowledgment is not received within this timeout, sending is
considered failed and the SPI will retry the send operation. The timeout is
automatically doubled on each successive retry, up to the getMaxAckTimeout value.
5. TcpDiscoverySpi.setMaxAckTimeout - maximum acknowledgment timeout. If
getAckTimeout reaches getMaxAckTimeout, the SPI gives up retrying the send
Other important TcpDiscoverySpi timeouts:

TcpDiscoverySpi.setJoinTimeout - the timeout for the join process when a
new/restarted node joins a cluster. The node tries to connect to all
available IP addresses provided by ipFinder within this timeout.
If the timeout is exceeded, the node gives up and throws an exception
from Ignition.start().

TcpDiscoverySpi.setNetworkTimeout - timeout for high-level operations like
handshake. It looks like it should be deprecated and the
IgniteConfiguration.getNetworkTimeout should be used here.

TCP COMMUNICATION SPI:

If you need more control over the failure detection algorithm for
TcpCommunicationSpi, you can explicitly use the following low-level options
(this will disable the failureDetectionTimeout logic):

1. TcpCommunicationSpi.setConnectTimeout - socket connection timeout. It is
automatically doubled on each successive retry (up to getReconnectCount
retries) related to a single logical operation
2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection timeout,
the upper limit for getConnectTimeout as it is doubled on each retry
3. TcpCommunicationSpi.setReconnectCount - number of reconnect attempts used
when establishing connection with the remote node and sending messages to it
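
To show how these three settings interact, here is a rough estimate of the worst-case time spent connecting before giving up. It is a model, not the TcpCommunicationSpi implementation. ConnectTimeoutBudget is a hypothetical name, and I assume the doubled timeout is capped at the maximum rather than aborting the retries.

```java
/** Hypothetical model of connect timeout doubling, not Ignite code. */
public final class ConnectTimeoutBudget {
    private ConnectTimeoutBudget() {
    }

    /**
     * Sums the per-attempt connect timeouts over reconnectCount attempts.
     * Assumption: each attempt doubles the previous timeout, capped at maxConnectTimeout.
     */
    public static long worstCaseMillis(long connectTimeout, long maxConnectTimeout, int reconnectCount) {
        if (connectTimeout <= 0 || maxConnectTimeout <= 0 || reconnectCount <= 0)
            throw new IllegalArgumentException("All arguments must be positive");

        long total = 0;
        long t = Math.min(connectTimeout, maxConnectTimeout);

        for (int i = 0; i < reconnectCount; i++) {
            total = Math.addExact(total, t);

            // Double the timeout for the next attempt without overflowing past the cap.
            t = (t > maxConnectTimeout / 2) ? maxConnectTimeout : t * 2;
        }

        return total;
    }

    public static void main(String[] args) {
        // Placeholder values: 100 + 200 + 400 + 500 (capped) = 1200 ms.
        System.out.println(worstCaseMillis(100L, 500L, 4)); // prints 1200
    }
}
```

A single failureDetectionTimeout replaces this three-parameter calculation with one number the user can reason about.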

Other important TcpCommunicationSpi timeouts:

TcpCommunicationSpi.setSocketWriteTimeout - timeout to send a message.
TcpCommunicationSpi.setIdleConnectionTimeout - maximum idle connection timeout
upon which a connection will be closed.



