[ https://issues.apache.org/jira/browse/IGNITE-16568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493534#comment-17493534 ]

SHIBAEV Valera commented on IGNITE-16568:
-----------------------------------------

I could not see how to determine which pod will be first. I think we need to write some kind of marker on the K8s Ignite service saying that "pod_N" is first, so that the other pods wait until _TcpDiscoveryKubernetesIpFinder#getRegisteredAddresses()_ returns a non-empty result, e.g. by adding _metadata.labels.pods: [pod_N]_.

> Kubernetes cluster might split apart on initialization
> ------------------------------------------------------
>
>                 Key: IGNITE-16568
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16568
>             Project: Ignite
>          Issue Type: Bug
>          Components: networking
>    Affects Versions: 2.11.1
>            Reporter: Alexandr Shapkin
>            Priority: Major
>              Labels: Kubernetes
>
> The issue is mostly about Kubernetes/OpenShift deployments but could also
> affect other scenarios relying on external services (AWS?).
> Consider the following case: multiple nodes (pods) are started
> simultaneously, and all of them try to find out whether other nodes are
> available using
> *_TcpDiscoveryKubernetesIpFinder_*, which just returns a set of registered
> IPs. Since there is no delay or retry attempt, every node could get back
> an empty IP list and decide to be a coordinator, i.e. start multiple
> independent grids.
>
> Proposed changes: extend TcpDiscoveryKubernetesIpFinder with either a
> configurable delay or a repetition counter to check whether there is a
> non-empty list of available IPs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
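
The "configurable delay or repetition counter" proposed in the issue description could look roughly like the sketch below. It is only an illustration, not Ignite code: the class RetryingAddressLookup, the method awaitAddresses and the parameters maxAttempts and delayMillis are hypothetical names, and the lookup is passed in as a plain Supplier so the example runs without any Ignite dependency. Note that a retry loop alone only narrows the race window. If every pod gives up at the same moment they can still split, which is why the comment above suggests an explicit "first pod" marker.

```java
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.Collections;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Ignite): polls an address lookup until it returns something. */
final class RetryingAddressLookup {
    private RetryingAddressLookup() {
        // Static helper, no instances.
    }

    /**
     * Calls lookup up to maxAttempts times, sleeping delayMillis between attempts,
     * and returns the first non-empty result, or an empty list if every attempt came back empty.
     */
    static Collection<InetSocketAddress> awaitAddresses(
            Supplier<Collection<InetSocketAddress>> lookup,
            int maxAttempts,
            long delayMillis) throws InterruptedException {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Collection<InetSocketAddress> addrs = lookup.get();

            if (addrs != null && !addrs.isEmpty())
                return addrs;

            // Do not sleep after the last attempt; the caller decides what to do next.
            if (attempt < maxAttempts)
                Thread.sleep(delayMillis);
        }

        return Collections.emptyList();
    }
}
```

Assuming an ipFinder variable that holds a configured TcpDiscoveryKubernetesIpFinder, a call might be written as RetryingAddressLookup.awaitAddresses(ipFinder::getRegisteredAddresses, 5, 1000L). The values 5 and 1000 are arbitrary placeholders, not recommended settings.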