[
https://issues.apache.org/jira/browse/IGNITE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yakov Zhdanov updated IGNITE-752:
---------------------------------
Priority: Blocker (was: Critical)
> Speed up failure detection
> --------------------------
>
> Key: IGNITE-752
> URL: https://issues.apache.org/jira/browse/IGNITE-752
> Project: Ignite
> Issue Type: Bug
> Reporter: Yakov Zhdanov
> Assignee: Denis Magda
> Priority: Blocker
> Fix For: sprint-5
>
>
> I think we can (1) make grid configuration significantly easier and (2) speed
> up failure detection.
> Here are the discovery SPI configuration properties that are responsible for
> failure detection:
> # reconnectCount,
> # sockTimeout,
> # networkTimeout,
> # ackTimeout,
> # maxAckTimeout,
> # heartbeatFrequency,
> # maxMissedHeartbeats
> The same goes for the communication SPI:
> # reconnectCount,
> # maxConnTimeout,
> # connTimeout
> So, we have 10 or even more properties.
> We did this to address the half-open socket problem (which is pretty common
> in cloud environments) and GC pauses that may happen on cluster nodes: we can
> increase the ack timeouts to keep paused nodes from being kicked out of the
> topology.
> By setting values for these props I am effectively setting a timeout for
> failure detection. Why do we need so many parameters instead of a single one
> on IgniteConfiguration, e.g. nodeResponseThreshold (or
> failureDetectionThreshold; can anyone propose a better name)?
> All other parameters would be calculated automatically (I think the user
> could still set some of them for full control over the situation; we need to
> decide whether this is needed).
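
To illustrate the proposal, here is a minimal sketch of how one threshold could
drive the other values. It is not Ignite code: the class
FailureDetectionSketch, the method derive() and every ratio in it are
assumptions made up for this example, and only the property names come from
the list above. The real calculation still has to be designed.

```java
/** Hypothetical sketch: derive discovery timeouts from a single threshold. */
public final class FailureDetectionSketch {
    /** Derived values; field names mirror the properties listed in the issue. */
    public static final class Derived {
        public final long sockTimeoutMs;
        public final long ackTimeoutMs;
        public final long maxAckTimeoutMs;
        public final long heartbeatFrequencyMs;
        public final int maxMissedHeartbeats;

        Derived(long sockTimeoutMs, long ackTimeoutMs, long maxAckTimeoutMs,
                long heartbeatFrequencyMs, int maxMissedHeartbeats) {
            this.sockTimeoutMs = sockTimeoutMs;
            this.ackTimeoutMs = ackTimeoutMs;
            this.maxAckTimeoutMs = maxAckTimeoutMs;
            this.heartbeatFrequencyMs = heartbeatFrequencyMs;
            this.maxMissedHeartbeats = maxMissedHeartbeats;
        }
    }

    /**
     * Splits one user-facing threshold into per-property values.
     * All ratios below are illustrative assumptions, not Ignite defaults.
     */
    public static Derived derive(long thresholdMs) {
        if (thresholdMs < 100)
            throw new IllegalArgumentException("Threshold too small: " + thresholdMs);

        int maxMissedHeartbeats = 4;                              // assumed constant
        long heartbeatFrequencyMs = thresholdMs / maxMissedHeartbeats;
        long sockTimeoutMs = thresholdMs / 2;                     // assumed: half the budget
        long ackTimeoutMs = thresholdMs / 10;                     // assumed: first ack wait
        long maxAckTimeoutMs = thresholdMs;                       // ack waits never exceed the budget

        return new Derived(sockTimeoutMs, ackTimeoutMs, maxAckTimeoutMs,
            heartbeatFrequencyMs, maxMissedHeartbeats);
    }

    public static void main(String[] args) {
        Derived d = derive(10_000L);

        System.out.println("sockTimeout=" + d.sockTimeoutMs
            + ", ackTimeout=" + d.ackTimeoutMs
            + ", maxAckTimeout=" + d.maxAckTimeoutMs
            + ", heartbeatFrequency=" + d.heartbeatFrequencyMs
            + ", maxMissedHeartbeats=" + d.maxMissedHeartbeats);
    }
}
```

The point of the sketch is only that the user reasons about one number (how
long a node may stay silent) while the derived values stay within that budget.
Whether explicit per-property overrides should remain possible is the open
question from the description.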
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)