[ https://issues.apache.org/jira/browse/ATLAS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated ATLAS-511:
-----------------------------------
    Attachment: ATLAS-511-1.patch

Attaching a patch for review. The changes over the first patch are fixes to POST/DELETE redirects, a fix in {{DefaultMetadataService}} for the ACTIVE-PASSIVE-ACTIVE transition, and javadocs.

Review board link: https://reviews.apache.org/r/44890/

> Ability to run multiple instances of Atlas Server with automatic failover to one active server
> ----------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-511
>                 URL: https://issues.apache.org/jira/browse/ATLAS-511
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>         Attachments: ATLAS-511-1.patch, ATLAS-511.patch, HADesign.pdf
>
>
> One of the most important components that only supports active-standby mode
> currently is the Atlas server which hosts the API / UI for Atlas. As
> described in the [HA
> Documentation|http://atlas.incubator.apache.org/0.6.0-incubating/HighAvailability.html],
> we currently are limited to running only one instance of the Atlas server
> behind a proxy service. If the running instance goes down, a manual process
> is required to bring up another instance.
> In this JIRA, we propose to have an ability to run multiple Atlas server
> instances. However, as a first step, only one of them will be actively
> processing requests. To have a consistent terminology, let us call that
> server the *master*. Any requests sent to the other servers will be
> redirected to the master.
> When the master suffers a partition, one of the other servers must
> automatically become the master and start processing requests. What this mode
> brings us over the current system is the ability to automatically fail over
> the Atlas server instance without any manual intervention.
> Note that this can arguably be called an [active/passive
> setup|https://en.wikipedia.org/wiki/High-availability_cluster].
> ATLAS-488 was raised to support multiple active Atlas server instances. While
> that would be ideal, we have to learn more about the underlying system
> behavior before we can get there, and hopefully we can take smaller steps to
> improve the system systematically. The method proposed here is similar to
> what is adopted in many other Hadoop components including HDFS NameNode,
> HBase HMaster etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)