Ivan,

It seems that if a server notices that an existing connection to a client
can no longer be used, then it can expect the client to establish a new
one. Is this simply out of scope for the current iteration, or are there
other fundamental problems?
2020-06-29 16:32 GMT+03:00, Ivan Bessonov <bessonov...@gmail.com>:
> Hi Ivan,
>
> Sure, TCP connections are lazy, so if a connection is not already open,
> the node trying to send a message will initiate opening one. It's also
> possible that an open connection is spontaneously closed for some
> reason. Otherwise you are right, everything is as you described.
>
> There's also a tie-breaker when two nodes connect to each other at the
> same time: only one of them will succeed, and which one depends on the
> internal discovery order, which you basically can't control.
>
> Mon, 29 Jun 2020 at 16:01, Ivan Pavlukhin <vololo...@gmail.com>:
>
>> Hi Ivan,
>>
>> Sorry for a possibly naive question. As I understand it, we are talking
>> about the order of establishing client-server connections, and I
>> suppose that in some environments (e.g. cloud) servers cannot directly
>> establish connections with clients. But TCP connections are
>> bidirectional, so we can still send messages in both directions. Could
>> you please provide an example of a case in which servers have to
>> initiate new connections to clients?
>>
>> 2020-06-29 13:08 GMT+03:00, Ivan Bessonov <bessonov...@gmail.com>:
>> > Hi Igniters, hi Raymond,
>> >
>> > That was a really good point. I will try to address it as much as I
>> > can.
>> >
>> > First of all, this new mode will be configurable for now. As Val
>> > suggested, "TcpCommunicationSpi#forceClientToServerConnections" will
>> > be a new setting to trigger this behavior, disabled by default.
>> >
>> > About issues with K8S deployments - I'm not an expert, but from what
>> > I've heard, server and client nodes are sometimes not in the same
>> > environment. For example, there is an Ignite cluster and a user tries
>> > to start a client node in an isolated K8S pod. In this case clients
>> > cannot properly resolve their own addresses and send them to the
>> > servers, making it impossible for servers to connect to such clients.
>> > Or, in other words, clients are used as if they were thin.
>> >
>> > In your case everything is fine: clients and servers share the same
>> > network and can resolve each other's addresses.
>> >
>> > Now, the CQ issue [1]. You can pass a custom event filter when you
>> > register a new continuous query. But, depending on the setup, the
>> > class of this filter may not be on the classpath of the server node
>> > that holds the data and invokes that filter. There are two possible
>> > outcomes:
>> > - the server fails to resolve the class name and fails to register
>> > the CQ;
>> > - or the server can have p2p deployment enabled. Let's assume that it
>> > was a client node that requested the CQ. In this case the server will
>> > try to download the "class" file directly from the node that sent the
>> > filter object in the first place. Due to a poor design decision this
>> > is done synchronously while registering the query, and query
>> > registration happens in the "discovery" thread. In normal
>> > circumstances the server will load the class and finish query
>> > registration; it's just a little bit slow.
>> >
>> > The second case is not compatible with the new
>> > "forceClientToServerConnections" setting.
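For reference, here is a minimal sketch of how the proposed mode could be
enabled. The setter name is an assumption derived from the
"TcpCommunicationSpi#forceClientToServerConnections" property quoted
above, so the final API may differ:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

    public class ForceClientToServerExample {
        public static void main(String[] args) {
            TcpCommunicationSpi commSpi = new TcpCommunicationSpi();

            // Assumed setter for the proposed property; per the thread,
            // the behavior is disabled by default.
            commSpi.setForceClientToServerConnections(true);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setCommunicationSpi(commSpi);

            try (Ignite ignite = Ignition.start(cfg)) {
                // With the flag enabled, servers do not open TCP connections
                // to clients and instead reuse client-initiated ones.
            }
        }
    }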
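For concreteness, registering a continuous query with a remote filter -
the scenario described above - looks roughly like this. The cache name
and the filter logic are purely illustrative; the relevant point is that
the filter class lives on the requesting client and reaches the servers
only via p2p deployment:

    import javax.cache.Cache;
    import javax.cache.configuration.FactoryBuilder;
    import javax.cache.event.CacheEntryEvent;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.CacheEntryEventSerializableFilter;
    import org.apache.ignite.cache.query.ContinuousQuery;
    import org.apache.ignite.cache.query.QueryCursor;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class CqRemoteFilterExample {
        /** Filter class that servers may need to fetch via p2p deployment. */
        public static class ImportantOnly
            implements CacheEntryEventSerializableFilter<Integer, String> {
            @Override public boolean evaluate(
                CacheEntryEvent<? extends Integer, ? extends String> evt) {
                return evt.getValue() != null
                    && evt.getValue().startsWith("important-");
            }
        }

        public static void main(String[] args) {
            // Client node; peer class loading must be enabled on the servers
            // too, otherwise they cannot resolve the filter class at all.
            IgniteConfiguration cfg = new IgniteConfiguration()
                .setClientMode(true)
                .setPeerClassLoadingEnabled(true);

            try (Ignite ignite = Ignition.start(cfg)) {
                IgniteCache<Integer, String> cache =
                    ignite.getOrCreateCache("exampleCache");

                ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();
                qry.setRemoteFilterFactory(
                    FactoryBuilder.factoryOf(new ImportantOnly()));
                qry.setLocalListener(evts ->
                    evts.forEach(e ->
                        System.out.println("Updated key: " + e.getKey())));

                // Registration is the point where a server may synchronously
                // download the filter class from this node in its "discovery"
                // thread, as described above.
                try (QueryCursor<Cache.Entry<Integer, String>> cur =
                         cache.query(qry)) {
                    // The query stays active while the cursor remains open.
                }
            }
        }
    }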
>> > I'm not sure that I need to go into all the technical details, but
>> > the result of such a procedure is a cluster that cannot process any
>> > discovery messages for the duration of the TCP connection timeout -
>> > we're talking about tens of seconds, or maybe even several minutes,
>> > depending on the settings and the environment. All this time the
>> > server will be in a "deadlock" state inside the "discovery" thread.
>> > This means that some cluster operations, such as a new node joining
>> > or starting a new cache, will be unavailable during this period. Node
>> > failures will not be processed properly either. It's hard for me to
>> > predict the real behavior until we reproduce the situation in a live
>> > environment; I have only seen this in tests.
>> >
>> > I hope that my message clarifies the situation, or at least doesn't
>> > cause more confusion. These changes will not affect your
>> > infrastructure or your Ignite installations; they are aimed at adding
>> > more flexibility for other ways of using Ignite.
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-13156
>> >
>> > Sat, 27 Jun 2020 at 09:54, Raymond Wilson <raymond_wil...@trimble.com>:
>> >
>> >> I have just caught up with this discussion and wanted to outline a
>> >> set of use cases we have that rely on server nodes communicating
>> >> with client nodes.
>> >>
>> >> Firstly, I'd like to confirm my mental model of server and client
>> >> nodes within a grid (ignoring thin clients for now):
>> >>
>> >> A grid contains a set of nodes somewhat arbitrarily labelled
>> >> 'server' and 'client', where the distinction of a 'server' node is
>> >> that it is responsible for containing data (in-memory only, or also
>> >> with persistence). Apart from that distinction, all nodes are
>> >> essentially peers in the grid and may use the messaging fabric, the
>> >> compute layer and other grid features on an equal footing.
>> >>
>> >> In our solution we leverage these capabilities to build and
>> >> orchestrate complex analytics queries that utilise compute functions
>> >> initiated in three distinct ways: client -> client, client -> server
>> >> and server -> client, and all three styles of initiation are used
>> >> within a single analytics request made to the grid itself. I can go
>> >> into more detail about the exact sequencing of these activities if
>> >> you like, but it may be sufficient to know they are used to reason
>> >> about the problem statement and proposals outlined here.
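A minimal sketch of the server -> client initiation style mentioned
above, using cluster groups. This is illustrative, not Raymond's actual
code, and it assumes at least one client node is in the topology,
otherwise the broadcast fails with an exception:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cluster.ClusterGroup;

    public class ServerToClientBroadcast {
        public static void main(String[] args) {
            // Started as a server node (the default mode).
            try (Ignite ignite = Ignition.start()) {
                // Cluster group that contains only client nodes.
                ClusterGroup clients = ignite.cluster().forClients();

                // Server -> client initiation: run a closure on every client.
                // Throws if the group is currently empty (compare the "empty
                // topology" warnings mentioned above).
                ignite.compute(clients).broadcast(
                    () -> System.out.println("Compute job initiated by a server"));
            }
        }
    }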
>> >> Our infrastructure is deployed to Kubernetes using EKS on AWS, and
>> >> all three relationships between client and server nodes noted above
>> >> function well (caveat: we do see odd things, though, such as long
>> >> pauses on critical worker threads and occasional empty topology
>> >> warnings when locating client nodes to send requests to). We also
>> >> use continuous queries in three contexts (all within server nodes).
>> >>
>> >> If this thread is suggesting changing the functional relationship
>> >> between server and client nodes, then this may have impacts on our
>> >> architecture and implementation that we will need to consider.
>> >>
>> >> This thread has highlighted issues with K8s deployments and also CQ
>> >> issues. The suggestion is that server-to-client just doesn't work on
>> >> K8s, which does not agree with our experience of it working. I'd
>> >> also like to understand better the bounds of the issue with CQ: when
>> >> does it not work, and what are the symptoms we would see if there
>> >> were an issue with the way we are using it, or with the K8s
>> >> infrastructure we deploy to?
>> >>
>> >> Thanks,
>> >> Raymond.
>> >>
>> >> --
>> >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>> >
>> > --
>> > Sincerely yours,
>> > Ivan Bessonov
>>
>> --
>> Best regards,
>> Ivan Pavlukhin
>
> --
> Sincerely yours,
> Ivan Bessonov

--
Best regards,
Ivan Pavlukhin