[ https://issues.apache.org/jira/browse/IGNITE-20914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy reassigned IGNITE-20914:
------------------------------------------

    Assignee: Roman Puchkovskiy

> Make ScaleCube's metadataTimeout configurable
> ---------------------------------------------
>
>                 Key: IGNITE-20914
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20914
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>
> ScaleCube's MembershipProtocolImpl fetches node's metadata periodically
> (using GetMetaDataRequest). If it does not get a response before
> metadataTimeout expires, it seems to think that the node is not alive anymore
> and generates a REMOVED event:
>
> [2023-11-17T00:20:22,153][WARN ][sc-cluster-3345-2][MembershipProtocol]
> [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][updateMembership][MEMBERSHIP_GOSSIP]
> Skipping to add/update member: {m:
> default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344, s: ALIVE, inc: 9},
> due to failed fetchMetadata call (cause:
> java.util.concurrent.TimeoutException: Did not observe any item or terminal
> signal within 1000ms in 'source(MonoDefer)' (and no fallback has been
> configured))
>
> [2023-11-17T00:20:29,189][INFO ][sc-cluster-3345-2][MembershipProtocol]
> [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345] Member left without
> notification: default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344
>
> [2023-11-17T00:20:29,190][INFO ][sc-cluster-3345-2][MembershipProtocol]
> [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][publishEvent]
> MembershipEvent[type=REMOVED,
> member=default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344,
> oldMetadata=1e61c6c8-154, newMetadata=null,
> timestamp=2023-11-17T00:20:29.189Z]
>
> We should avoid this. It seems that 1 second might be too small for a node
> under load.
>
> We should make this configurable via Ignite configuration.
>
> Also, it probably makes sense to set a higher default (like 10 seconds). The
> reason for the latter is that, if the timeout expires, a node is removed from
> the physical topology and cannot return there without a restart (this is what
> our connection establishment protocol requires), so this timeout is critical
> for stability of Ignite (while it is probably not critical for an average
> ScaleCube-based application).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
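
The timeout behaviour described in the issue can be illustrated with a small, self-contained sketch. It uses neither ScaleCube nor Ignite: the class MetadataTimeoutSketch, the methods fetchMetadata and fetchSucceeds, and the 1.5-second response delay are hypothetical names and values chosen for illustration only. It assumes a node under load that answers a metadata request late, and compares the 1000 ms timeout seen in the log with the proposed 10-second default.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class MetadataTimeoutSketch {

    /** Hypothetical stand-in for a remote metadata fetch that answers after responseDelay. */
    static CompletableFuture<String> fetchMetadata(Duration responseDelay) {
        return CompletableFuture.supplyAsync(
                () -> "metadata",
                CompletableFuture.delayedExecutor(responseDelay.toMillis(), TimeUnit.MILLISECONDS));
    }

    /** Returns true if the simulated fetch completes before metadataTimeout expires. */
    static boolean fetchSucceeds(Duration responseDelay, Duration metadataTimeout) {
        try {
            fetchMetadata(responseDelay)
                    .orTimeout(metadataTimeout.toMillis(), TimeUnit.MILLISECONDS)
                    .get();
            return true;
        } catch (ExecutionException e) {
            if (e.getCause() instanceof TimeoutException) {
                // In the issue, this is the point where the member would be skipped
                // and later reported as REMOVED.
                return false;
            }
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed delay for a node under load; not a measured value.
        Duration loadedNodeDelay = Duration.ofMillis(1500);

        // Timeout of 1 second, as seen in the log: the fetch times out.
        System.out.println(fetchSucceeds(loadedNodeDelay, Duration.ofSeconds(1)));  // prints false

        // Proposed higher default of 10 seconds: the same slow node is still accepted.
        System.out.println(fetchSucceeds(loadedNodeDelay, Duration.ofSeconds(10))); // prints true
    }
}
```

Passing the timeout as a parameter, as fetchSucceeds does here, mirrors the request in the issue: the value would come from configuration rather than being fixed in code.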