It would be great!

- Does sockTimeout affect all the server sockets involved in an Ignite node? (E.g. there are sockets in discovery, in the Hadoop job tracker, in the IGFS interface, even in the shmem handshake.)
- To reduce GC pauses, the G1 collector could potentially be helpful. Is there any experience with it in Ignite?

--ivan

On Wed, Apr 15, 2015 at 12:25 AM, Yakov Zhdanov <[email protected]> wrote:

> Guys,
>
> I think we can (1) make grid configuration significantly easier and (2)
> speed up failure detection.
>
> Here are the discovery SPI configuration properties that are responsible
> for failure detection:
>
> - reconnectCount,
> - sockTimeout,
> - networkTimeout,
> - ackTimeout,
> - maxAckTimeout,
> - heartbeatFrequency,
> - maxMissedHeartbeats
>
> Same for the communication SPI:
>
> - reconnectCount,
> - maxConnTimeout,
> - connTimeout
>
> That is 10 or even more properties.
>
> We did it to address the half-opened sockets problem (which is pretty
> common in cloud environments) and GC pauses which may happen on cluster
> nodes - we can increase ack timeouts to prevent them.
>
> By setting values for these props I set the timeout for failure detection.
> Why do we need such a great number of parameters instead of having 1 on
> IgniteConfiguration - nodeResponseThreshold (or failureDetectionThreshold -
> can anyone propose a better name?).
>
> All other parameters will be calculated automatically (I think the user can
> still set some of them for full control over the situation - need to decide
> if this is needed).
>
> Ticket filed - https://issues.apache.org/jira/browse/IGNITE-752
>
> Thoughts?
>
> --Yakov
