Guys, I think we can (1) make grid configuration significantly easier and (2) speed up failure detection.
Here are the discovery SPI configuration properties that are responsible for failure detection:
- reconnectCount
- sockTimeout
- networkTimeout
- ackTimeout
- maxAckTimeout
- heartbeatFrequency
- maxMissedHeartbeats

The same goes for the communication SPI:
- reconnectCount
- maxConnTimeout
- connTimeout

That is 10 or even more properties. We added them to address the half-open sockets problem (which is pretty common in cloud environments) and GC pauses that may happen on cluster nodes: we can increase the ack timeouts to tolerate them.

By setting values for these properties I am effectively setting the timeout for failure detection. Why do we need such a large number of parameters instead of having a single one on IgniteConfiguration, nodeResponseThreshold (or failureDetectionThreshold - can anyone propose a better name)? All other parameters would be calculated automatically (I think the user could still set some of them for full control over the situation; we need to decide whether this is needed).

Ticket filed - https://issues.apache.org/jira/browse/IGNITE-752

Thoughts?

--Yakov
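
For illustration only, here is a minimal sketch of how a few of the settings could be derived from one threshold. Everything in it is an assumption made for the example: the class DerivedTimeouts, the method fromThreshold, and the fixed counts (5 missed heartbeats, 2 reconnect attempts) are hypothetical and are not existing Ignite API or the formula proposed in the ticket. The only idea it shows is that each derived pair is chosen so its worst-case detection time stays within the single threshold.

```java
/**
 * Hypothetical holder for settings derived from one failure detection threshold.
 * Not part of Ignite; names and split ratios are assumptions for illustration.
 */
final class DerivedTimeouts {
    /** Assumed fixed counts; a real implementation might choose them differently. */
    static final int MAX_MISSED_HEARTBEATS = 5;
    static final int RECONNECT_COUNT = 2;

    final long heartbeatFrequencyMs;
    final int maxMissedHeartbeats;
    final long sockTimeoutMs;
    final int reconnectCount;

    private DerivedTimeouts(long heartbeatFrequencyMs, int maxMissedHeartbeats,
                            long sockTimeoutMs, int reconnectCount) {
        this.heartbeatFrequencyMs = heartbeatFrequencyMs;
        this.maxMissedHeartbeats = maxMissedHeartbeats;
        this.sockTimeoutMs = sockTimeoutMs;
        this.reconnectCount = reconnectCount;
    }

    /**
     * Derives the settings so that
     *   heartbeatFrequencyMs * maxMissedHeartbeats <= thresholdMs
     *   sockTimeoutMs * reconnectCount             <= thresholdMs
     */
    static DerivedTimeouts fromThreshold(long thresholdMs) {
        // Reject thresholds too small to yield a positive value after division.
        if (thresholdMs < MAX_MISSED_HEARTBEATS) {
            throw new IllegalArgumentException(
                "thresholdMs must be at least " + MAX_MISSED_HEARTBEATS + ": " + thresholdMs);
        }

        // Integer division rounds down, so the products never exceed the threshold.
        long heartbeatFrequencyMs = thresholdMs / MAX_MISSED_HEARTBEATS;
        long sockTimeoutMs = thresholdMs / RECONNECT_COUNT;

        return new DerivedTimeouts(heartbeatFrequencyMs, MAX_MISSED_HEARTBEATS,
                                   sockTimeoutMs, RECONNECT_COUNT);
    }
}
```

With this sketch, DerivedTimeouts.fromThreshold(10_000) gives a heartbeat frequency of 2000 ms with 5 allowed misses and a socket timeout of 5000 ms with 2 reconnect attempts. A user who wants full control could still override individual values afterwards, as discussed above.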
