Hello Dev,
Hope you are doing well!
As Airavata's popularity as a middleware provider for HPC clusters
grows, it is time to upgrade our architecture to meet the demand.
Described below is one area that needs attention, followed by some
plausible solutions.
The API gateway, which provides abstraction and security for several
underlying micro-services, is a single point of failure for accessing
the middleware functionality. This needs to be addressed by introducing
a load balancer and a fault-tolerant Software Defined Environment (SDE).
We are implementing some solutions and trying out the popular stacks;
a brief description follows:
*Environment (SDE)*: An AWS SDE with one AutoScaling group containing
two spot instances on which the api-gateway is deployed. Fault tolerance
is handled inherently by the AutoScaling feature, i.e., in the event of
a failure a new instance is spawned automatically with all the data
needed to start the server upon startup. For more information, see the
detailed wiki:
https://github.com/airavata-courses/spring17-API-Server/wiki/Environment-(SDE)
Please note that this environment is meant for development and is not
production ready; more features will be added later.
*Load-balancing*: With the stage set for deploying a load balancer,
below are some plausible combinations that we think suit our scenario:
1. *Consul* (https://www.consul.io/) for service discovery and
*Consul Template + HAProxy* for load balancing.
2. *Consul + Fabio:* Fabio is an open source software router/load
balancer that interacts directly with Consul to load balance
services. It updates its routes dynamically as services change and
does not require a restart for configuration changes, so it can be
reconfigured without downtime.
3. *Serf + HAProxy:* Serf is the gossip-based cluster membership tool
that Consul builds on. All servers that have Serf installed form a
mesh network in which each member is aware of every other member.
This is a highly available network with no masters or slaves, only
peers, so the membership layer itself has no single point of failure.
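To make option 1 a little more concrete, here is a minimal sketch of
the idea behind it: ask Consul which api-gateway instances are passing
their health checks, then turn that list into an HAProxy backend
section. Consul Template does this with its own template language; the
Python below is only a stand-in to illustrate the flow, not how we
would actually deploy it. The service name "api-gateway", the
addresses, the port 8930 and all helper names are placeholders I made
up for illustration. The only real interface assumed is Consul's
/v1/health/service/<name>?passing HTTP endpoint.

```python
import itertools
import json
import urllib.request


def fetch_healthy_entries(consul_url, service):
    """Ask Consul for the instances of `service` whose health checks pass.

    Not called in the demo below because it needs a running Consul agent.
    """
    url = f"{consul_url}/v1/health/service/{service}?passing"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def extract_backends(entries):
    """Turn Consul health entries into a sorted list of (address, port)."""
    backends = []
    for entry in entries:
        svc = entry["Service"]
        # Service.Address may be empty; fall back to the node's address.
        address = svc.get("Address") or entry["Node"]["Address"]
        backends.append((address, svc["Port"]))
    return sorted(backends)


def render_haproxy_backend(name, backends):
    """Render an HAProxy backend section with one server line per instance."""
    lines = [f"backend {name}", "    balance roundrobin"]
    for i, (address, port) in enumerate(backends):
        lines.append(f"    server {name}-{i} {address}:{port} check")
    return "\n".join(lines) + "\n"


def round_robin(backends):
    """Cycle through the backends forever, one per request."""
    return itertools.cycle(backends)


# Hypothetical payload, trimmed to the fields used above.
SAMPLE_ENTRIES = [
    {"Node": {"Address": "10.0.0.11"},
     "Service": {"Address": "", "Port": 8930}},
    {"Node": {"Address": "10.0.0.12"},
     "Service": {"Address": "10.0.0.22", "Port": 8930}},
]

if __name__ == "__main__":
    backends = extract_backends(SAMPLE_ENTRIES)
    print(render_haproxy_backend("api-gateway", backends))
```

With a real agent, fetch_healthy_entries("http://localhost:8500",
"api-gateway") would replace the sample payload, assuming the agent
listens on Consul's default HTTP port. In option 2, Fabio would take
over both the lookup and the routing, so no config file would need to
be regenerated.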
Your feedback on the stacks mentioned above would be valuable. We are
setting up the first two options to compare the results and will keep
you all updated on the progress.
Thanks and best regards,
*Anuj Bhandar*
MS Computer Science
Indiana University Bloomington