> if one gw receiver stops, the locator will publish to any remote locator that there are no receivers up.
I am not sure whether locators proactively update remote locators about changes in the receivers list; rather, I think the senders figure this out when they run into connection issues. But I see the problem: local-site locators have only one member in the list of receivers they maintain, because all receivers register with a single <hostname:port> address.

One idea I had earlier is to statically configure the receivers list on the locators (just like the remote-locators property), so that those addresses are what gets exchanged with the gw-senders. This way we can introduce a boolean flag to turn off WAN discovery and use the statically configured addresses. This could also be useful for remote-locators if they are behind a service.

Sai

On Thu, Dec 5, 2019 at 2:33 AM Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech> wrote:

> Thanks Charlie, but the issue is not about connectivity. To summarize: if you have two or more gw receivers started with the same values for the "hostname-for-senders", "start-port" and "end-port" parameters (with "start-port" equal to "end-port"), and one gw receiver stops, the locator will publish to any remote locator that there are no receivers up.
>
> And this use case is likely to happen in cloud-native environments, as described.
>
> BR/
>
> Alberto B.
> ________________________________
> From: Charlie Black <cbl...@pivotal.io>
> Sent: Wednesday, December 4, 2019 18:11
> To: dev@geode.apache.org <dev@geode.apache.org>
> Subject: Re: WAN replication issue in cloud native environments
>
> Alberto,
>
> Something else to think about: SNI-based routing. I believe Mario might be working on adding SNI to Geode; at least, he had a proposal that he e-mailed out.
>
> The basic idea is that the destination host is in the SNI field, so the proxy can inspect it and route the request to the right service instance. Plus, we have the option to not terminate SSL at the proxy.
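Charlie's SNI idea can be sketched with the standard JDK TLS API: the client (a GW sender, in this scenario) connects to the shared VIP but names the receiver it actually wants in the SNI extension, and a proxy routes on that name. This is a minimal sketch under stated assumptions, not Geode code or Mario's proposal: the host names, the backend map and `routeBySni` are hypothetical, and the proxy side (which would read the name from the TLS ClientHello without terminating TLS) is reduced to a map lookup.

```java
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class SniRoutingSketch {

  // Client side: connect to the shared VIP, but put the intended receiver's name in the SNI extension.
  static SSLEngine clientEngineFor(String vipHost, int vipPort, String targetHost) {
    try {
      SSLEngine engine = SSLContext.getDefault().createSSLEngine(vipHost, vipPort);
      engine.setUseClientMode(true);
      SSLParameters params = engine.getSSLParameters();
      List<SNIServerName> serverNames = List.of(new SNIHostName(targetHost));
      params.setServerNames(serverNames);
      engine.setSSLParameters(params);
      return engine;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("No default TLS context available", e);
    }
  }

  // Proxy side (hypothetical): choose a backend from the SNI host name.
  // SNI host names compare case-insensitively, so normalize before the lookup.
  static String routeBySni(String sniHost, Map<String, String> backendsByHost) {
    String backend = backendsByHost.get(sniHost.toLowerCase(Locale.ROOT));
    if (backend == null) {
      throw new IllegalArgumentException("No backend registered for SNI host " + sniHost);
    }
    return backend;
  }

  public static void main(String[] args) {
    SSLEngine engine = clientEngineFor("wan.example.com", 5000, "receiver-1.site-b.example.com");
    SNIServerName name = engine.getSSLParameters().getServerNames().get(0);
    String sniHost = ((SNIHostName) name).getAsciiName();

    Map<String, String> backends = Map.of(
        "receiver-0.site-b.example.com", "10.0.0.10:5000",
        "receiver-1.site-b.example.com", "10.0.0.11:5000");
    System.out.println(sniHost + " -> " + routeBySni(sniHost, backends));
  }
}
```

Running `main` prints `receiver-1.site-b.example.com -> 10.0.0.11:5000`. Whether a given proxy or ingress controller supports SNI passthrough (routing without terminating TLS) depends on the deployment.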
>
> Full disclosure: I haven't tried out SNI-based routing myself; it is something I thought could work as I was reading about it. From the whiteboarding I have done, I think this will handle ingress and egress just fine. Potentially easier than port mapping and playing around with `hostname for clients`.
>
> Just something to think about.
>
> Charlie
>
> On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech> wrote:
>
> > Hi Jacob,
> >
> > Yes, we are using the LoadBalancer service type. But note that the problem is not in the transport layer but in Geode, as GW senders complain “sender-2-parallel : Could not connect due to: There are no active servers.” when one of the servers in the receiving cluster is killed.
> >
> > So, there is still one server alive in the receiving cluster, but the GW sender does not know it and the locator is not able to inform it of that server's existence. Looking at the code, it seems the internal data structures (maps) holding the profiles use objects whose equality check relies only on hostname and port. This makes it impossible to differentiate servers when the same “hostname-for-senders” and port are used. When the killed server comes back up, the locator profiles are updated (the internal map is back to size()=1 although 2+ servers are there) and GW senders happily reconnect.
> >
> > The solution with Geode as-is would be to expose each GW receiver on a different port outside of the k8s cluster. This includes creating N Kubernetes services for N GW receivers, in addition to updating the service mesh configuration (if one is used), firewalls, etc. The declarative nature of Kubernetes means we must know the ports in advance, hence start-port and end-port must be equal when creating each GW receiver, and we need some well-known algorithm for assigning ports to GW receivers across servers.
> > For example: server-0 port 5000, server-1 port 5001, server-2 port 5002, etc. So all GW receivers must be wired individually, and we must turn off Geode’s random port allocation.
> >
> > But we are exploring the possibility of Geode handling this cloud-native configuration a bit better. Locators should be capable of holding GW receiver information even though the receivers are hidden behind the same hostname and port. This is a code change in Geode, and we would like to have the community's opinion on it.
> >
> > Some obvious impacts on the legacy behavior: when the locator picks a server on behalf of the client (a GW sender in this case), it does so based on server load. When the sender connects, and considering that all servers use the same VIP:PORT, it is the load balancer that decides where the connection ends up, likely not on the server selected by the locator. So here we ignore the locator's instructions. Since GW senders normally do not create a huge number of connections, this probably should not unbalance the cluster too much, but it is an impact worth considering. Custom load metrics would also be ignored by GW senders. Opinions?
> >
> > An additional impact that comes to mind is the GW sender load-balance command and how its execution would be affected.
> >
> > Thanks!
> >
> > Alberto B.
> >
> > ________________________________
> > From: Jacob Barrett <jbarr...@pivotal.io>
> > Sent: Friday, November 29, 2019 13:06
> > To: dev@geode.apache.org <dev@geode.apache.org>
> > Subject: Re: WAN replication issue in cloud native environments
> >
> > > On Nov 29, 2019, at 3:14 AM, Alberto Bustamante Reyes <alberto.bustamante.re...@est.tech> wrote:
> > >
> > > The reason for such a setup is deploying a Geode cluster on a Kubernetes cluster where all GW receivers are reachable from the outside world on the same VIP and port.
> >
> > Are you using the LoadBalancer Service type?
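The as-is workaround Alberto describes gives each GW receiver a well-known port (server-0 port 5000, server-1 port 5001, ...) with start-port equal to end-port, so Geode never picks a random port and each receiver has a distinct host:port behind the VIP. A minimal sketch of that mapping follows; it assumes StatefulSet-style pod names ending in `-<ordinal>` and that the Geode member name equals the pod name, and the helper names are hypothetical. Only the gfsh `create gateway-receiver` options it emits (`--member`, `--start-port`, `--end-port`, `--hostname-for-senders`) are real.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReceiverPortScheme {

  // Assumption: StatefulSet-style pod names that end in "-<ordinal>", e.g. "server-2".
  private static final Pattern ORDINAL_SUFFIX = Pattern.compile("-(\\d+)$");

  static int ordinalOf(String podName) {
    Matcher m = ORDINAL_SUFFIX.matcher(podName);
    if (!m.find()) {
      throw new IllegalArgumentException("No ordinal suffix in pod name: " + podName);
    }
    return Integer.parseInt(m.group(1));
  }

  // Well-known port: base port plus ordinal (server-0 -> 5000, server-1 -> 5001, ...).
  static int receiverPort(String podName, int basePort) {
    return basePort + ordinalOf(podName);
  }

  // Builds a gfsh command that pins start-port == end-port so no random port is chosen.
  // Assumption: the Geode member name equals the pod name.
  static String createReceiverCommand(String podName, int basePort, String hostnameForSenders) {
    int port = receiverPort(podName, basePort);
    return String.format(
        "create gateway-receiver --member=%s --start-port=%d --end-port=%d --hostname-for-senders=%s",
        podName, port, port, hostnameForSenders);
  }

  public static void main(String[] args) {
    for (String pod : new String[] {"server-0", "server-1", "server-2"}) {
      System.out.println(createReceiverCommand(pod, 5000, "wan.example.com"));
    }
  }
}
```

The first line printed is `create gateway-receiver --member=server-0 --start-port=5000 --end-port=5000 --hostname-for-senders=wan.example.com`. The cost the thread points out remains: each of these ports still needs its own Kubernetes service and firewall/mesh entry.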
> >
> > > Other kinds of configuration (a different hostname and/or different port for each GW receiver) are not cheap from an OAM and resources perspective in cloud-native environments, and they also limit some important use cases (like scaling).
> >
> > If you could somehow configure the host and port for the sender (code modification required), would exposing each port through the LoadBalancer be too expensive too?
> >
> > > The problem we experience is that shutting down one server stops replication to this cluster until the server is up again. We suspect this is because Geode incorrectly assumes there are no more alive servers when just one of them is down (since they share hostname-for-senders and port).
> >
> > Seems like, in the worst case, when it tries to reconnect the LB should give it a live server and it thinks the single server is back up.
> >
> > -Jake
>
> --
> Charlie Black | cbl...@pivotal.io
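The failure mode traced in this thread (profiles whose equality uses only host and port, so every receiver behind one VIP:port collapses into a single entry) can be reproduced with a toy registry. This is not Geode's locator code: `HostPortKey`, `MemberKey` and the `HashSet` registry are hypothetical stand-ins, and `MemberKey` only illustrates one possible direction for letting locators hold GW receiver information behind the same hostname and port.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class ReceiverRegistrySketch {

  // Hypothetical key whose equality uses only host and port, as described in the thread.
  static final class HostPortKey {
    final String host;
    final int port;

    HostPortKey(String host, int port) {
      this.host = host;
      this.port = port;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof HostPortKey)) {
        return false;
      }
      HostPortKey other = (HostPortKey) o;
      return port == other.port && host.equals(other.host);
    }

    @Override
    public int hashCode() {
      return Objects.hash(host, port);
    }
  }

  // Hypothetical alternative key that also includes a unique member id.
  static final class MemberKey {
    final String memberId;
    final HostPortKey address;

    MemberKey(String memberId, String host, int port) {
      this.memberId = memberId;
      this.address = new HostPortKey(host, port);
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof MemberKey)) {
        return false;
      }
      MemberKey other = (MemberKey) o;
      return memberId.equals(other.memberId) && address.equals(other.address);
    }

    @Override
    public int hashCode() {
      return Objects.hash(memberId, address);
    }
  }

  // Registers every receiver behind one VIP:port, then unregisters one; returns what is left.
  static int remainingWithHostPortKeys(String[] members, String host, int port) {
    Set<HostPortKey> registry = new HashSet<>();
    for (String ignored : members) {
      registry.add(new HostPortKey(host, port)); // every receiver collapses into one entry
    }
    registry.remove(new HostPortKey(host, port)); // one receiver stops
    return registry.size();
  }

  static int remainingWithMemberKeys(String[] members, String host, int port) {
    Set<MemberKey> registry = new HashSet<>();
    for (String member : members) {
      registry.add(new MemberKey(member, host, port));
    }
    registry.remove(new MemberKey(members[0], host, port)); // one receiver stops
    return registry.size();
  }

  public static void main(String[] args) {
    String[] members = {"server-0", "server-1", "server-2"};
    System.out.println("host:port keys:    " + remainingWithHostPortKeys(members, "wan.example.com", 5000));
    System.out.println("member-aware keys: " + remainingWithMemberKeys(members, "wan.example.com", 5000));
  }
}
```

Running `main` prints 0 for the host:port registry (the locator would report no receivers even though two are alive) and 2 for the member-aware one. A member-aware registry alone does not settle the load-balancing concern raised earlier in the thread: the load balancer, not the locator, still decides which receiver a sender actually reaches.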