I think what Dan did was pass in a socket factory that would connect to his 
gateway instead of the requested server.  Doing it like that would require a 
lot less code change than what you’re currently doing and would get past the 
unit test problem.
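A minimal sketch of that idea (the `SocketFactory` interface below is a simplified stand-in I'm defining here for illustration, not necessarily Geode's actual client API; the gateway host and port are illustrative):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class GatewaySocketFactoryDemo {

    /** Simplified stand-in for a pluggable client socket-factory hook. */
    @FunctionalInterface
    interface SocketFactory {
        Socket createSocket() throws IOException;
    }

    /**
     * Returns a factory whose sockets always connect to the gateway,
     * regardless of which server the pool actually asked for.
     */
    static SocketFactory gatewayFactory(String gatewayHost, int gatewayPort) {
        return () -> {
            Socket socket = new Socket();
            // Connecting here (instead of letting the pool resolve the
            // requested server) forces all traffic through the gateway.
            socket.connect(new InetSocketAddress(gatewayHost, gatewayPort));
            return socket;
        };
    }

    public static void main(String[] args) throws IOException {
        // In a unit test the "gateway" can simply be a local server socket,
        // which is how this approach sidesteps the unit-test problem.
        try (ServerSocket gateway = new ServerSocket(0)) {
            SocketFactory factory =
                gatewayFactory("localhost", gateway.getLocalPort());
            try (Socket s = factory.createSocket()) {
                System.out.println("connected=" + s.isConnected());
            }
        }
    }
}
```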

 

I can point you to where you’d need to make changes for the Ping operation.  
PingOpImpl would need to send the ServerLocation it’s trying to reach.  
PingOp.execute() gets that as a parameter and PingOpImpl.sendMessage() writes 
it to the server.  The Ping command class’s cmdExecute would need to read that 
data if serverConnection.getClientVersion() is Version.GEODE_1_13_0 or later.  
Then it would have to compare the server location it read to that server’s 
coordinates and, if not equal, find the server with those coordinates and send 
a new DistributionMessage to it with the client’s identity.  There are plenty 
of DistributionMessage classes around to look at as precedents.  You send the 
message with 
serverConnection.getCache().getDistributionManager().putOutgoing(message).
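To illustrate the comparison step, here is a self-contained sketch (`ServerLocation` below is a minimal stand-in for Geode's class, and the forwarding itself is only indicated in a comment, since it would go through a DistributionMessage and putOutgoing as described above):

```java
import java.util.Objects;

public class PingTargetCheckDemo {

    /** Minimal stand-in for Geode's ServerLocation (host + port identity). */
    record ServerLocation(String host, int port) {}

    /**
     * Models the decision the Ping command's cmdExecute would make: if the
     * location the client wrote does not match this server's own coordinates,
     * the ping must be forwarded to the member that owns them (in Geode, by
     * sending a DistributionMessage via
     * serverConnection.getCache().getDistributionManager().putOutgoing(msg)).
     */
    static boolean mustForward(ServerLocation requested, ServerLocation me) {
        return !Objects.equals(requested, me);
    }

    public static void main(String[] args) {
        ServerLocation me = new ServerLocation("receiver-1", 40404);
        // Same coordinates: handle the ping locally.
        System.out.println(mustForward(new ServerLocation("receiver-1", 40404), me));
        // Different coordinates: forward to the intended member.
        System.out.println(mustForward(new ServerLocation("receiver-2", 40404), me));
    }
}
```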

 

You can PM me any time.  Dan could answer questions about his gateway work.


 

From: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
Date: Monday, March 23, 2020 at 2:18 PM
To: Bruce Schuchardt <bschucha...@pivotal.io>, Dan Smith <dsm...@pivotal.io>, 
"dev@geode.apache.org" <dev@geode.apache.org>
Cc: Jacob Barrett <jbarr...@pivotal.io>, Anilkumar Gingade 
<aging...@pivotal.io>, Charlie Black <cbl...@pivotal.io>
Subject: RE: WAN replication issue in cloud native environments

 

Thanks for your answer and your comment in the wiki Bruce. I will take a closer 
look at what you mentioned; it is not yet clear to me how to implement it.

 

BTW, I forgot to set a deadline for the wiki review; I hope that Thursday, 
March 26th leaves enough time to receive comments.

From: Bruce Schuchardt <bschucha...@pivotal.io>
Sent: Thursday, March 19, 2020, 16:30
To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>; Dan Smith 
<dsm...@pivotal.io>; dev@geode.apache.org <dev@geode.apache.org>
Cc: Jacob Barrett <jbarr...@pivotal.io>; Anilkumar Gingade 
<aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
Subject: Re: WAN replication issue in cloud native environments

 

I wonder if an approach similar to the SNI hostname PoolFactory changes would 
work for this non-TLS gateway.  The client needs to differentiate between the 
different servers so that it doesn’t declare all of them dead should one of 
them fail.  If the pool knew about the gateway it could direct all traffic 
there and the servers wouldn’t need to set a hostname-for-clients.

 

It’s not an ideal solution since the gateway wouldn’t know which server the 
client wanted to contact and there are sure to be other problems like creating 
a backup queue for subscriptions.  But that’s the case with the 
hostname-for-clients approach, too.

 

 

From: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
Date: Wednesday, March 18, 2020 at 8:35 AM
To: Dan Smith <dsm...@pivotal.io>, "dev@geode.apache.org" <dev@geode.apache.org>
Cc: Bruce Schuchardt <bschucha...@pivotal.io>, Jacob Barrett 
<jbarr...@pivotal.io>, Anilkumar Gingade <aging...@pivotal.io>, Charlie Black 
<cbl...@pivotal.io>
Subject: RE: WAN replication issue in cloud native environments

 

Hi all,

 

As Bruce suggested me, I have created a wiki page describing the problem we are 
trying to solve: 
https://cwiki.apache.org/confluence/display/GEODE/Allow+same+host+and+port+for+all+gateway+receivers

 

Please let me know if further clarifications are needed.

 

Also, I have closed the PR I have been using until now, and created a new one 
with the current status of the solution, with one commit per issue described in 
the wiki: https://github.com/apache/geode/pull/4824

 

Thanks in advance!

From: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
Sent: Monday, March 9, 2020, 11:24
To: Dan Smith <dsm...@pivotal.io>
Cc: dev@geode.apache.org <dev@geode.apache.org>; Bruce Schuchardt 
<bschucha...@pivotal.io>; Jacob Barrett <jbarr...@pivotal.io>; Anilkumar 
Gingade <aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
Subject: RE: WAN replication issue in cloud native environments

 

Thanks for pointing that out, Dan. Sorry for the misunderstanding; since the 
only "affinity" I found in the client code was the setServerAffinityLocation 
method, I thought you were talking about that.
Anyway, I did some more tests and it does not solve our problem...

I tried configuring the session affinity on k8s, but it breaks the first part 
of the solution (the changes implemented on LocatorLoadSnapshot that solve the 
replication problem), and senders do not connect to other receivers when the 
one they were connected to is down.

The only alternative we have in mind for the ping problem is to keep 
investigating whether changing the ping task creation could be a solution 
(the changes implemented are clearly breaking something, so the solution is 
not complete yet).






________________________________
From: Dan Smith <dsm...@pivotal.io>
Sent: Thursday, March 5, 2020, 21:03
To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
Cc: dev@geode.apache.org <dev@geode.apache.org>; Bruce Schuchardt 
<bschucha...@pivotal.io>; Jacob Barrett <jbarr...@pivotal.io>; Anilkumar 
Gingade <aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
Subject: Re: WAN replication issue in cloud native environments

I think there is some confusion here.

The client side class ExecutablePool has a method called 
setServerAffinityLocation. It looks like that is used for some internal 
transaction code to make sure transactions go to the same server. I don't think 
it makes any sense for the gateway to be messing with this setting.

What I was talking about was session affinity in your proxy server. For 
example, if you are using k8s, session affinity as defined in this page - 
https://kubernetes.io/docs/concepts/services-networking/service/

"If you want to make sure that connections from a particular client are passed 
to the same Pod each time, you can select the session affinity based on the 
client’s IP addresses by setting service.spec.sessionAffinity to “ClientIP” 
(the default is “None”)"
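For reference, that setting goes in the Service manifest. A sketch, with illustrative names and ports (10800 seconds is the documented default affinity timeout):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: gw-receiver            # illustrative name
spec:
  selector:
    app: geode-receiver        # illustrative label
  ports:
    - port: 40404
      targetPort: 40404
  sessionAffinity: ClientIP    # default is None
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
```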

I think setting session affinity might help your use case, because it sounds 
like you are having issues with the proxy directing pings to a different server 
than the data.

-Dan

On Thu, Mar 5, 2020 at 4:20 AM Alberto Bustamante Reyes 
<alberto.bustamante.re...@est.tech> wrote:
I think that was what I did when I tried, but I realized I had a failure in the 
code. Now that I have tried again, reverting the change of executing ping by 
endpoint, and applying the server affinity, the connections are much more 
stable! Looks promising 🙂

I suppose that if I want to introduce this change, setting the server affinity 
in the gateway sender should be introduced as a new option in the sender 
configuration, right?
________________________________
From: Dan Smith <dsm...@pivotal.io>
Sent: Thursday, March 5, 2020, 4:41
To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
Cc: dev@geode.apache.org <dev@geode.apache.org>; Bruce Schuchardt 
<bschucha...@pivotal.io>; Jacob Barrett <jbarr...@pivotal.io>; Anilkumar 
Gingade <aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
Subject: Re: WAN replication issue in cloud native environments

Oh, sorry, I meant server affinity with the proxy itself. So that it will 
always route traffic from the same gateway sender to the same gateway receiver. 
Hopefully that would ensure that pings go to the same receiver data is sent to.

-Dan

On Wed, Mar 4, 2020, 1:31 AM Alberto Bustamante Reyes 
<alberto.bustamante.re...@est.tech> wrote:
I have tried setting the server affinity on the gateway sender's pool in 
AbstractGatewaySender, when the server location is set, but I don't see any 
difference in the behavior of the connections.

I did not mention that the connections are reset every 5 seconds due to 
"java.io.EOFException: The connection has been reset while reading the header", 
but I don't know yet what is causing it.

________________________________
From: Dan Smith <dsm...@pivotal.io>
Sent: Tuesday, March 3, 2020, 18:07
To: dev@geode.apache.org <dev@geode.apache.org>
Cc: Bruce Schuchardt <bschucha...@pivotal.io>; Jacob Barrett 
<jbarr...@pivotal.io>; Anilkumar Gingade <aging...@pivotal.io>; Charlie Black 
<cbl...@pivotal.io>
Subject: Re: WAN replication issue in cloud native environments

> We are currently working on other issue related to this change: gw
senders pings are not reaching the gw receivers, so ClientHealthMonitor
closes the connections. I saw that the ping tasks are created by
ServerLocation, so I have tried to solve the issue by changing it to be
done by Endpoint. This change is not finished yet, as in its current status
it causes the closing of connections from gw servers to gw receivers every
5 seconds.

Are you using session affinity? I think you probably will need to since
pings can go over different connections than the data connection.

-Dan

On Tue, Mar 3, 2020 at 3:44 AM Alberto Bustamante Reyes
<alberto.bustamante.re...@est.tech> wrote:

> Hi Bruce,
>
> Thanks for your comments, but we are not planning to use TLS, so I'm afraid
> the PR you are working on will not solve this problem.
>
> The origin of this issue is that we would like to be able to configure all
> gw receivers with the same "hostname-for-senders" value. The reason is that
> we will run a multisite Geode cluster, with each site on a different cloud
> environment, so using just one hostname makes configuration much easier.
>
> When we tried to configure the cluster in this way, we experienced an
> issue with the replication. Using the same hostname-for-senders parameter
> causes different servers to have equal ServerLocation objects, so if one
> receiver is down, the others are considered down too. With the change
> suggested by Jacob this problem is solved, and replication works fine.
>
> We are currently working on other issue related to this change: gw senders
> pings are not reaching the gw receivers, so ClientHealthMonitor closes the
> connections. I saw that the ping tasks are created by ServerLocation, so I
> have tried to solve the issue by changing it to be done by Endpoint. This
> change is not finished yet, as in its current status it causes the closing
> of connections from gw servers to gw receivers every 5 seconds.
>
> Why don't you like the idea of using the InternalDistributedMember to
> distinguish server locations? Are you thinking about another alternative? In
> this use case, two different gw receivers will have the same
> ServerLocation, so we need to distinguish them.
>
> BR/
>
> Alberto B.
>
> ________________________________
> From: Bruce Schuchardt <bschucha...@pivotal.io>
> Sent: Monday, March 2, 2020, 20:20
> To: dev@geode.apache.org <dev@geode.apache.org>; Jacob Barrett 
> <jbarr...@pivotal.io>
> Cc: Anilkumar Gingade <aging...@pivotal.io>; Charlie Black 
> <cbl...@pivotal.io>
> Subject: Re: WAN replication issue in cloud native environments
>
> I'm coming to this conversation late and probably am missing a lot of
> context.  Is the point of this to direct senders to some common
> gateway that all of the gateway receivers are configured to advertise?
> I've been working on a PR to support redirection of connections for
> client/server and gateway communications to a common address and put the
> destination host name in the SNIHostName TLS parameter.  Then you won't
> have to tell servers about the common host name - just tell clients what
> the gateway is and they'll connect to it & tell it what the target host
> name is via the SNIHostName.  However, that only works if SSL is enabled.
>
> PR 4743 is a step toward this approach and changes TcpClient and
> SocketCreator to take an unresolved host address.  After this is merged
> another change will allow folks to set a gateway host/port that will be
> used to form connections and insert the destination hostname into the
> SNIHostName SSLParameter.
>
> I would really like us to avoid including InternalDistributedMembers in
> equality checks for server-locations.  To-date we've only held these
> identifiers in Endpoints and other places for debugging purposes and have
> used ServerLocation to identify servers.
>
> On 1/27/20, 8:56 AM, "Alberto Bustamante Reyes"
> <alberto.bustamante.re...@est.tech> wrote:
>
>     Hi again,
>
>     Status update: the simplification of the maps suggested by Jacob made
> the proposed new class containing the ServerLocation and the member id
> unnecessary. With this refactoring, replication is working in the scenario we
> have been discussing in this conversation. That's great, and I think the code
> can be merged into develop if there are no extra comments in the PR.
>
>     But this does not mean we can say that Geode is able to work properly
> when using gw receivers with the same IP + port. We have seen that when
> working with this configuration, there is a problem with the pings sent
> from gw senders (that acts as clients) to the gw receivers (servers). The
> pings are reaching just one of the receivers, so the sender-receiver
> connection is finally closed by the ClientHealthMonitor.
>
>     Do you have any suggestion about how to handle this issue? My first
> idea was to identify where the connection is created, to check whether the
> sender could somehow be aware that there is more than one server to which
> the ping should be sent, but I'm not sure whether that is possible. The
> alternative could be to change the ClientHealthMonitor to be "clever"
> enough not to close connections in this case. Any comment is welcome 🙂
>
>     Thanks,
>
>     Alberto B.
>
>     ________________________________
>     From: Jacob Barrett <jbarr...@pivotal.io>
>     Sent: Wednesday, January 22, 2020, 19:01
>     To: Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech>
>     Cc: dev@geode.apache.org <dev@geode.apache.org>; Anilkumar Gingade 
> <aging...@pivotal.io>; Charlie Black <cbl...@pivotal.io>
>     Subject: Re: WAN replication issue in cloud native environments
>
>
>
>     On Jan 22, 2020, at 9:51 AM, Alberto Bustamante Reyes
> <alberto.bustamante.re...@est.tech> wrote:
>
>     Thanks Naba & Jacob for your comments!
>
>
>
>     @Naba: I have been implementing a solution as you suggested, and I
> think it would be convenient if the client knew the memberId of the server
> it is connected to.
>
>     (current code is here: https://github.com/apache/geode/pull/4616 )
>
>     For example, in:
>
>     LocatorLoadSnapshot::getReplacementServerForConnection(ServerLocation
> currentServer, String group, Set<ServerLocation> excludedServers)
>
>     In this method, the client has sent the ServerLocation, but if that
> object does not contain the memberId, I don't see how to guarantee that the
> replacement that will be returned is not the same server the client is
> currently connected to.
>     Inside that method, this other method is called:
>
>
>     Given that your setup is masquerading multiple members behind the same
> host and port (ServerLocation) it doesn’t matter. When the pool opens a new
> socket to the replacement server it will be to the shared hostname and port
> and the Kubernetes service at that host and port will just pick a backend
> host. In the solution we suggested we preserved that behavior since the k8s
> service can’t determine which backend member to route the connection to
> based on the member id.
>
>
>     LocatorLoadSnapshot::isCurrentServerMostLoaded(currentServer,
> groupServers)
>
>     where groupServers is a "Map<ServerLocationAndMemberId, LoadHolder>"
> object. If the keys of that map have the same host and port, they are only
> different on the memberId. But as you don't know it (you just have
> currentServer which contains host and port), you cannot get the correct
> LoadHolder value, so you cannot know if your server is the most loaded.
>
>     Again, given your use case the behavior of this method is lost when a
> new connection is established by the pool through the shared hostname anyway.
>
>     @Jacob: I think the solution ultimately implies that the client has to
> know the memberId; I think we could simplify the maps.
>
>     The client isn’t keeping these load maps, the locator is, and the
> locator knows all the member ids. The client end only needs to know the
> host/port combination. In your example, the WAN replication (a client
> of the remote cluster) connects to the shared host/port service and gets
> randomly routed to one of the backend servers in that service.
>
>     All of this locator balancing code is unnecessary in this model
> where something else is choosing the final destination. The goal of our
> proposed changes was to recognize that all we need is to make sure the
> locator keeps the shared ServerLocation alive in its responses to clients
> by tracking the members associated and reducing that set to the set of unit
> ServerLocations. In your case that will always reduce to 1 ServerLocation
> for N number of members, as long as 1 member is still up.
>
>     -Jake
