Ivan,

It seems that if a server notices that an existing connection to a client
can no longer be used, then it can expect the client to establish a new
one. Is this simply out of scope for the current iteration, or are there
other fundamental problems?
2020-06-29 16:32 GMT+03:00, Ivan Bessonov <bessonov...@gmail.com>:
> Hi Ivan,
>
> Sure, TCP connections are lazy, so if a connection is not already open,
> the node trying to send a message will initiate opening one. It's also
> possible that an open connection is spontaneously closed for some
> reason. Otherwise you are right, everything is as you described.
>
> There's also a tie-breaker when two nodes connect to each other at the
> same time: only one of them will succeed, and which one depends on the
> internal discovery order, which you basically can't control.
>
> Mon, 29 Jun 2020 at 16:01, Ivan Pavlukhin <vololo...@gmail.com>:
>
>> Hi Ivan,
>>
>> Sorry for a possibly naive question. As I understand it, we are talking
>> about the order of establishing client-server connections, and I
>> suppose that in some environments (e.g. cloud) servers cannot directly
>> establish connections with clients. But TCP connections are
>> bidirectional, so we can still send messages in both directions. Could
>> you please provide an example of a case in which servers have to
>> initiate new connections to clients?
>>
>> 2020-06-29 13:08 GMT+03:00, Ivan Bessonov <bessonov...@gmail.com>:
>> > Hi Igniters, hi Raymond,
>> >
>> > That was a really good point. I will try to address it as much as I
>> > can.
>> >
>> > First of all, this new mode will be configurable for now. As Val
>> > suggested, "TcpCommunicationSpi#forceClientToServerConnections" will
>> > be a new setting to trigger this behavior, disabled by default.
>> >
>> > About issues with K8S deployments - I'm not an expert, but from what
>> > I've heard, server and client nodes are sometimes not in the same
>> > environment. For example, there is an Ignite cluster and a user tries
>> > to start a client node in an isolated K8S pod. In this case clients
>> > cannot properly resolve their own addresses and send them to the
>> > servers, making it impossible for servers to connect to such clients.
>> > Or, in other words, clients are used as if they were thin.
>> >
>> > In your case everything is fine: clients and servers share the same
>> > network and can resolve each other's addresses.
>> >
>> > Now, the CQ issue [1]. You can pass a custom event filter when you
>> > register a new continuous query. But, depending on the setup, the
>> > class of this filter may not be on the classpath of the server node
>> > that holds the data and invokes that filter. There are two possible
>> > outcomes:
>> > - the server fails to resolve the class name and fails to register
>> > the CQ;
>> > - or the server can have p2p deployment enabled. Let's assume that it
>> > was a client node that requested the CQ. In this case the server will
>> > try to download the "class" file directly from the node that sent the
>> > filter object in the first place. Due to a poor design decision this
>> > is done synchronously while registering the query, and query
>> > registration happens in the "discovery" thread. In normal
>> > circumstances the server will load the class and finish query
>> > registration; it's just a little bit slow.
>> >
>> > The second case is not compatible with the new
>> > "forceClientToServerConnections" setting.
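For reference, here is a minimal sketch of how the proposed mode could be
enabled. The setter name is an assumption derived from the
"TcpCommunicationSpi#forceClientToServerConnections" property quoted
above, so the final API may differ:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

    public class ForceClientToServerExample {
        public static void main(String[] args) {
            TcpCommunicationSpi commSpi = new TcpCommunicationSpi();

            // Assumed setter for the proposed property; per the thread,
            // the behavior is disabled by default.
            commSpi.setForceClientToServerConnections(true);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setCommunicationSpi(commSpi);

            try (Ignite ignite = Ignition.start(cfg)) {
                // With the flag enabled, servers do not open TCP connections
                // to clients and instead reuse client-initiated ones.
            }
        }
    }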
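For concreteness, registering a continuous query with a remote filter -
the scenario described above - looks roughly like this. The cache name
and the filter logic are purely illustrative; the relevant point is that
the filter class lives on the requesting client and reaches the servers
only via p2p deployment:

    import javax.cache.Cache;
    import javax.cache.configuration.FactoryBuilder;
    import javax.cache.event.CacheEntryEvent;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.CacheEntryEventSerializableFilter;
    import org.apache.ignite.cache.query.ContinuousQuery;
    import org.apache.ignite.cache.query.QueryCursor;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class CqRemoteFilterExample {
        /** Filter class that servers may need to fetch via p2p deployment. */
        public static class ImportantOnly
            implements CacheEntryEventSerializableFilter<Integer, String> {
            @Override public boolean evaluate(
                CacheEntryEvent<? extends Integer, ? extends String> evt) {
                return evt.getValue() != null
                    && evt.getValue().startsWith("important-");
            }
        }

        public static void main(String[] args) {
            // Client node; peer class loading must be enabled on the servers
            // too, otherwise they cannot resolve the filter class at all.
            IgniteConfiguration cfg = new IgniteConfiguration()
                .setClientMode(true)
                .setPeerClassLoadingEnabled(true);

            try (Ignite ignite = Ignition.start(cfg)) {
                IgniteCache<Integer, String> cache =
                    ignite.getOrCreateCache("exampleCache");

                ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();
                qry.setRemoteFilterFactory(
                    FactoryBuilder.factoryOf(new ImportantOnly()));
                qry.setLocalListener(evts ->
                    evts.forEach(e ->
                        System.out.println("Updated key: " + e.getKey())));

                // Registration is the point where a server may synchronously
                // download the filter class from this node in its "discovery"
                // thread, as described above.
                try (QueryCursor<Cache.Entry<Integer, String>> cur =
                         cache.query(qry)) {
                    // The query stays active while the cursor remains open.
                }
            }
        }
    }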
>> > I'm not sure that I need to go into all the technical details, but
>> > the result of such a procedure is a cluster that cannot process any
>> > discovery messages for the duration of the TCP connection timeout -
>> > we're talking about tens of seconds, or maybe even several minutes,
>> > depending on the settings and the environment. All this time the
>> > server will be in a "deadlock" state inside the "discovery" thread.
>> > This means that some cluster operations, such as a new node joining
>> > or starting a new cache, will be unavailable during this period. Node
>> > failures will not be processed properly either. It's hard for me to
>> > predict the real behavior until we reproduce the situation in a live
>> > environment; I have only seen this in tests.
>> >
>> > I hope that my message clarifies the situation, or at least doesn't
>> > cause more confusion. These changes will not affect your
>> > infrastructure or your Ignite installations; they are aimed at adding
>> > more flexibility for other ways of using Ignite.
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-13156
>> >
>> > Sat, 27 Jun 2020 at 09:54, Raymond Wilson <raymond_wil...@trimble.com>:
>> >
>> >> I have just caught up with this discussion and wanted to outline a
>> >> set of use cases we have that rely on server nodes communicating
>> >> with client nodes.
>> >>
>> >> Firstly, I'd like to confirm my mental model of server and client
>> >> nodes within a grid (ignoring thin clients for now):
>> >>
>> >> A grid contains a set of nodes somewhat arbitrarily labelled
>> >> 'server' and 'client', where the distinction of a 'server' node is
>> >> that it is responsible for containing data (in-memory only, or also
>> >> with persistence). Apart from that distinction, all nodes are
>> >> essentially peers in the grid and may use the messaging fabric, the
>> >> compute layer and other grid features on an equal footing.
>> >>
>> >> In our solution we leverage these capabilities to build and
>> >> orchestrate complex analytics queries that utilise compute functions
>> >> initiated in three distinct ways: client -> client, client -> server
>> >> and server -> client, and all three styles of initiation are used
>> >> within a single analytics request made to the grid itself. I can go
>> >> into more detail about the exact sequencing of these activities if
>> >> you like, but it may be sufficient to know they are used to reason
>> >> about the problem statement and proposals outlined here.
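A minimal sketch of the server -> client initiation style mentioned
above, using cluster groups. This is illustrative, not Raymond's actual
code, and it assumes at least one client node is in the topology,
otherwise the broadcast fails with an exception:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cluster.ClusterGroup;

    public class ServerToClientBroadcast {
        public static void main(String[] args) {
            // Started as a server node (the default mode).
            try (Ignite ignite = Ignition.start()) {
                // Cluster group that contains only client nodes.
                ClusterGroup clients = ignite.cluster().forClients();

                // Server -> client initiation: run a closure on every client.
                // Throws if the group is currently empty (compare the "empty
                // topology" warnings mentioned above).
                ignite.compute(clients).broadcast(
                    () -> System.out.println("Compute job initiated by a server"));
            }
        }
    }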
>> >> Our infrastructure is deployed to Kubernetes using EKS on AWS, and
>> >> all three relationships between client and server nodes noted above
>> >> function well (caveat: we do see odd things, though, such as long
>> >> pauses on critical worker threads and occasional empty topology
>> >> warnings when locating client nodes to send requests to). We also
>> >> use continuous queries in three contexts (all within server nodes).
>> >>
>> >> If this thread is suggesting changing the functional relationship
>> >> between server and client nodes, then this may have impacts on our
>> >> architecture and implementation that we will need to consider.
>> >>
>> >> This thread has highlighted issues with K8s deployments and also CQ
>> >> issues. The suggestion is that server-to-client just doesn't work on
>> >> K8s, which does not agree with our experience of it working. I'd
>> >> also like to understand better the bounds of the issue with CQ: when
>> >> does it not work, and what are the symptoms we would see if there
>> >> were an issue with the way we are using it, or with the K8s
>> >> infrastructure we deploy to?
>> >>
>> >> Thanks,
>> >> Raymond.
>> >>
>> >> --
>> >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>> >
>> > --
>> > Sincerely yours,
>> > Ivan Bessonov
>>
>> --
>> Best regards,
>> Ivan Pavlukhin
>
> --
> Sincerely yours,
> Ivan Bessonov

--
Best regards,
Ivan Pavlukhin