Hello Dev,

Hope you are doing well!

As Airavata's popularity as a middleware provider for HPC clusters grows, it is time to upgrade our architecture to meet the demand. Described below is one area that needs attention, followed by some plausible solutions.

The API gateway, which provides abstraction and security for several underlying micro-services, is a single point of failure for accessing the middleware functionality. This needs to be addressed by introducing a load balancer and a fault-tolerant Software Defined Environment (SDE). We are implementing some candidate solutions and trying out the popular stacks; a brief description follows:

*Environment (SDE)*: An AWS SDE with one AutoScaling group containing two spot instances on which the api-gateway is deployed. Fault tolerance is handled inherently by the AutoScaling feature, i.e., in the event of a failure a new instance is spawned automatically with all the data needed to start the server on startup. For more information, see the detailed wiki: https://github.com/airavata-courses/spring17-API-Server/wiki/Environment-(SDE). Please note that this environment is meant for development and is not production ready; more features will be added later.
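To make the self-healing behaviour concrete, here is a minimal sketch of the idea in plain Python. It is a toy simulation, not the AWS API: the class `AutoScalingGroupSim`, its methods, and the instance ids are all hypothetical names chosen for illustration. It only shows the principle that the group keeps replacing failed instances until the desired capacity (two, as in our setup) is restored.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class AutoScalingGroupSim:
    """Toy model of an auto-scaling group (hypothetical, not the AWS API)."""

    desired_capacity: int = 2
    # Maps instance id -> True if healthy, False if failed.
    instances: dict = field(default_factory=dict)
    # Counter used to generate unique, made-up instance ids.
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def launch(self) -> str:
        """Start a new (simulated) instance and return its id."""
        instance_id = f"i-{next(self._ids):04d}"
        self.instances[instance_id] = True
        return instance_id

    def mark_failed(self, instance_id: str) -> None:
        """Record that an instance failed its health check."""
        self.instances[instance_id] = False

    def reconcile(self) -> list:
        """Drop failed instances, then launch replacements up to capacity."""
        failed = [iid for iid, healthy in self.instances.items() if not healthy]
        for iid in failed:
            del self.instances[iid]
        launched = []
        while len(self.instances) < self.desired_capacity:
            launched.append(self.launch())
        return launched


if __name__ == "__main__":
    group = AutoScalingGroupSim(desired_capacity=2)
    print("initial launch:", group.reconcile())
    group.mark_failed("i-0001")
    print("replacements:", group.reconcile())
    print("running:", sorted(group.instances))
```

In the real environment this reconciliation is performed by AWS itself; the startup data mentioned above is what lets a replacement instance bring the server up without manual steps.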

*Load-balancing*: Now that the stage is set for deploying a load balancer, below are some plausible combinations that we think suit our scenario:

1. *Consul* (https://www.consul.io/) for service discovery and *Consul
   Template + HAProxy* for load balancing.
2. *Consul + Fabio:* Fabio is an open source software router/load
   balancer that interacts directly with Consul to load balance
   services. It updates its routes dynamically as services change and
   doesn't require a restart for configuration changes. In that sense
   it allows reconfiguration with zero downtime.
3. *Serf + HAProxy:* Serf is the gossip-based membership tool that
   Consul builds on. All servers that have Serf installed form a mesh
   network in which each member is aware of every other member. This is
   a highly available network with no masters or slaves, only peers, so
   there is no single point of failure.
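What all three stacks have in common is that the load balancer picks a backend from a list of healthy instances that is kept up to date by a discovery layer. The sketch below illustrates that pattern in plain Python. It does not use the Consul, Fabio, Serf, or HAProxy APIs: `ServiceRegistry` and `RoundRobinBalancer` are hypothetical in-memory stand-ins, and the service name and addresses are made up.

```python
class ServiceRegistry:
    """In-memory stand-in for a service catalog (hypothetical, not Consul)."""

    def __init__(self):
        # Maps service name -> {address: healthy flag}.
        self._services = {}

    def register(self, name, address):
        """Add an instance of a service and mark it healthy."""
        self._services.setdefault(name, {})[address] = True

    def set_health(self, name, address, healthy):
        """Update the health status reported for one instance."""
        self._services[name][address] = healthy

    def healthy_addresses(self, name):
        """Return the healthy instances of a service in a stable order."""
        instances = self._services.get(name, {})
        return sorted(addr for addr, ok in instances.items() if ok)


class RoundRobinBalancer:
    """Round-robin selection over whatever the registry reports as healthy."""

    def __init__(self, registry, service_name):
        self._registry = registry
        self._service_name = service_name
        self._next = 0

    def pick(self):
        # Re-read the registry on every request, so membership changes
        # take effect without restarting the balancer.
        backends = self._registry.healthy_addresses(self._service_name)
        if not backends:
            raise RuntimeError(f"no healthy backend for {self._service_name}")
        address = backends[self._next % len(backends)]
        self._next += 1
        return address


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("api-gateway", "10.0.0.1:8080")  # made-up addresses
    registry.register("api-gateway", "10.0.0.2:8080")
    balancer = RoundRobinBalancer(registry, "api-gateway")
    print([balancer.pick() for _ in range(3)])
    registry.set_health("api-gateway", "10.0.0.1:8080", False)
    print(balancer.pick())
```

The options above differ mainly in how the list of healthy backends reaches the balancer: rendered into a configuration file by Consul Template, read directly from Consul by Fabio, or derived from Serf membership events.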

Your valuable feedback on the above stacks is welcome. We are trying to set up the first two options to compare the results and will keep you all updated on the progress.

Thanks and best regards,

*Anuj Bhandar*
MS Computer Science
Indiana University Bloomington
