Re: WAN replication issue in cloud native environments

Sai Boorlagadda Fri, 06 Dec 2019 08:34:12 -0800

> if one gw receiver stops, the locator will publish to any remote locator
> that there are no receivers up.


I am not sure that locators proactively update remote locators about changes
in the receivers list; rather, I think the senders figure this out when they
hit connection issues.
But I see the problem that local-site locators have only one member in the
list of receivers that they maintain as all receivers register with a
single <hostname:port> address.
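
A minimal sketch of that collapse, assuming the locator keeps receivers in a
map keyed by an object whose equality is only host and port. HostPortKey,
MemberKey and gw.example.com are hypothetical names for illustration, not
Geode classes, and the second key type is just one possible way to keep the
entries apart:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ReceiverRegistryDemo {

    // Hypothetical key whose identity is only host and port (not a Geode class).
    static final class HostPortKey {
        final String host;
        final int port;

        HostPortKey(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof HostPortKey)) return false;
            HostPortKey other = (HostPortKey) o;
            return port == other.port && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    // Hypothetical alternative key that also carries a per-member id.
    static final class MemberKey {
        final String memberId;
        final HostPortKey address;

        MemberKey(String memberId, HostPortKey address) {
            this.memberId = memberId;
            this.address = address;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof MemberKey)) return false;
            MemberKey other = (MemberKey) o;
            return memberId.equals(other.memberId) && address.equals(other.address);
        }

        @Override
        public int hashCode() {
            return Objects.hash(memberId, address);
        }
    }

    // Two receivers register behind one address, then one of them stops.
    public static int remainingWithHostPortKey() {
        Map<HostPortKey, String> receivers = new HashMap<>();
        receivers.put(new HostPortKey("gw.example.com", 5000), "server-0");
        // Same key as above, so this overwrites the first entry.
        receivers.put(new HostPortKey("gw.example.com", 5000), "server-1");
        // server-0 stops; removing by address drops the only entry.
        receivers.remove(new HostPortKey("gw.example.com", 5000));
        return receivers.size();
    }

    // Same scenario, but the member id is part of the key.
    public static int remainingWithMemberKey() {
        HostPortKey shared = new HostPortKey("gw.example.com", 5000);
        Map<MemberKey, String> receivers = new HashMap<>();
        receivers.put(new MemberKey("server-0", shared), "server-0");
        receivers.put(new MemberKey("server-1", shared), "server-1");
        receivers.remove(new MemberKey("server-0", shared));
        return receivers.size();
    }

    public static void main(String[] args) {
        System.out.println("host:port key: " + remainingWithHostPortKey() + " receiver(s) left");
        System.out.println("member key:    " + remainingWithMemberKey() + " receiver(s) left");
    }
}
```

With the host:port key the map is empty after one receiver stops, even though
the other one is still running; with the member id in the key one entry
survives.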

One idea I had earlier is to statically set the receivers list on the
locators (just like the remote-locators property), which is then exchanged
with gw-senders. This way we can introduce a boolean flag to turn off WAN
discovery and use the statically configured addresses. This could also be
useful for remote-locators if they are behind a service.
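
Roughly what I have in mind, as a sketch only. Nothing below exists in Geode
today: the flag, the static list and the resolveReceivers method are all
hypothetical names, and the addresses are placeholders.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StaticReceiverConfigDemo {

    // Hypothetical selection logic: when WAN discovery is turned off, hand
    // the gw-sender the statically configured addresses instead of whatever
    // the locator discovered.
    public static List<String> resolveReceivers(boolean wanDiscoveryEnabled,
                                                List<String> discoveredReceivers,
                                                List<String> staticReceivers) {
        return wanDiscoveryEnabled ? discoveredReceivers : staticReceivers;
    }

    public static void main(String[] args) {
        // Placeholder address of a service fronting all receivers.
        List<String> configured = Arrays.asList("gw.example.com:5000");
        // Discovery currently reports no receivers, e.g. after one stopped.
        List<String> discovered = Collections.emptyList();

        System.out.println("discovery on:  " + resolveReceivers(true, discovered, configured));
        System.out.println("discovery off: " + resolveReceivers(false, discovered, configured));
    }
}
```

With discovery off the sender always gets the configured address, so a
receiver stopping behind the service does not empty the list.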

Sai

On Thu, Dec 5, 2019 at 2:33 AM Alberto Bustamante Reyes
<alberto.bustamante.re...@est.tech> wrote:

> Thanks Charlie, but the issue is not about connectivity. To summarize, the
> problem is that if you have two or more gw receivers started with the same
> values for the "hostname-for-senders", "start-port" and "end-port"
> parameters (with "start-port" and "end-port" being equal), and one gw
> receiver stops, the locator will publish to any remote locator that there
> are no receivers up.
>
> And this use case is likely to happen in cloud-native environments, as
> described.
>
> BR/
>
> Alberto B.
> ________________________________
> From: Charlie Black <cbl...@pivotal.io>
> Sent: Wednesday, December 4, 2019 18:11
> To: dev@geode.apache.org <dev@geode.apache.org>
> Subject: Re: WAN replication issue in cloud native environments
>
> Alberto,
>
> Something else to think about: SNI-based routing. I believe Mario might be
> working on adding SNI to Geode - he at least had a proposal that he
> e-mailed out.
>
> The basics are that the destination host is in the SNI field and the proxy
> can inspect it and route the request to the right service instance. Plus we
> have the option to not terminate the SSL at the proxy.
>
> Full disclosure - I haven't tried out SNI-based routing myself and it is
> something that I thought could work as I was reading about it. From the
> whiteboarding I have done I think this will do ingress and egress just
> fine. Potentially easier than playing around with port mapping and
> `hostname for clients`.
>
> Just something to think about.
>
> Charlie
>
>
> On Wed, Dec 4, 2019 at 3:19 AM Alberto Bustamante Reyes
> <alberto.bustamante.re...@est.tech> wrote:
>
> > Hi Jacob,
> >
> > Yes, we are using the LoadBalancer service type. But note the problem is
> > not in the transport layer but in Geode, as GW senders are complaining
> > “sender-2-parallel : Could not connect due to: There are no active
> > servers.” when one of the servers in the receiving cluster is killed.
> >
> > So, there is still one server alive in the receiving cluster, but the GW
> > sender does not know it and the locator is not able to inform it of that
> > server's existence. Looking at the code, it seems the internal data
> > structures (maps) holding the profiles use objects whose equality check
> > relies only on hostname and port. This makes it impossible to
> > differentiate servers when the same “hostname-for-senders” and port are
> > used. When the killed server comes back up, the locator profiles are
> > updated (internal map back to size()=1 although 2+ servers are there) and
> > GW senders happily reconnect.
> >
> > The solution with Geode as-is would be to expose each GW receiver on a
> > different port outside of the k8s cluster. This includes creating N
> > Kubernetes services for N GW receivers, in addition to updating the
> > service mesh configuration (if it is used, firewalls etc.). The
> > declarative nature of Kubernetes means we must know the ports in advance,
> > hence start-port and end-port must be equal when creating each GW
> > receiver, and we should have some well-known algorithm for assigning
> > ports when creating GW receivers across servers. For example: server-0
> > port 5000, server-1 port 5001, server-2 port 5002, etc. So all GW
> > receivers must be wired individually and we must turn off Geode’s random
> > port allocation.
> >
> > But we are exploring the possibility for Geode to handle this
> > cloud-native configuration a bit better. Locators should be capable of
> > holding GW receiver information even though the receivers are hidden
> > behind the same hostname and port. This is a code change in Geode and we
> > would like to have the community's opinion on it.
> >
> > One obvious impact on the legacy behavior: when the locator picks a
> > server on behalf of the client (the GW sender in this case), it does so
> > based on the server load. When the sender connects, and considering all
> > servers are using the same VIP:PORT, it is the load balancer that will
> > decide where the connection ends up, and likely not on the server
> > selected by the locator. So here we ignore the locator's instructions.
> > Since GW senders normally do not create a huge number of connections,
> > this probably will not unbalance the cluster too much. But this is an
> > impact worth considering. Custom load metrics would also be ignored by GW
> > senders. Opinions?
> >
> > An additional impact that comes to mind is the GW sender load-balance
> > command and how its execution would be affected.
> >
> > Thanks!
> >
> > Alberto B.
> >
> > ________________________________
> > From: Jacob Barrett <jbarr...@pivotal.io>
> > Sent: Friday, November 29, 2019 13:06
> > To: dev@geode.apache.org <dev@geode.apache.org>
> > Subject: Re: WAN replication issue in cloud native environments
> >
> >
> >
> > > On Nov 29, 2019, at 3:14 AM, Alberto Bustamante Reyes
> > > <alberto.bustamante.re...@est.tech> wrote:
> > >
> > > The reason for such a setup is deploying a Geode cluster on a
> > > Kubernetes cluster where all GW receivers are reachable from the
> > > outside world on the same VIP and port.
> >
> > Are you using LoadBalancer Service type?
> >
> > > Other kinds of configuration (different hostname and/or different port
> > > for each GW receiver) are not cheap from an OAM and resources
> > > perspective in cloud native environments and also limit some important
> > > use-cases (like scaling).
> >
> > If you could somehow configure host and port for the sender (code
> > modification required), would exposing each port through the LoadBalancer
> > also be too expensive?
> >
> > > The problem experienced is that shutting down one server stops
> > > replication to this cluster until the server is up again. We suspect
> > > this is because Geode incorrectly assumes there are no more alive
> > > servers when just one of them is down (since they share
> > > hostname-for-senders and port).
> >
> > Seems like in the worst case, when it tries to reconnect, the LB should
> > give it a live server and it would think the single server is back up.
> >
> > -Jake
> >
> >
>
> --
> Charlie Black | cbl...@pivotal.io
>
