[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006882#comment-17006882 ]
Bikas Saha commented on LIVY-718:
---------------------------------

{quote}JDBC part maintains lots of results/metadata/information on LivyServer
{quote}
It might be OK to do this in phases, where JDBC support is not present initially (as the document also mentions).

{quote}there're M server and N session. In designate solution, there're N connections. In stateless solution, there're M x N connections
{quote}
Yes, that would be the worst case, where for some reason every server is connected to every Spark driver. But with clients and load balancers maintaining sticky sessions, one would typically not expect that to happen. In the end, the number of connections needed (i.e., those not dropped because of inactivity) would be proportional to the number of clients.

Also, this does not preclude pseudo-designated servers, where clients could be redirected to preferred servers (using similar consistent hashing). Or, if a server is already connected to a driver, other servers could redirect clients to that server. Or, if a server is overloaded, it could ask the client to choose a different server (dynamic load balancing). These variations can be added over time, while retaining the flexibility not to need them as servers scale up and down with actual client load. Requiring a single designated server increases complexity without providing the same flexibility.

> Support multi-active high availability in Livy
> ----------------------------------------------
>
>                 Key: LIVY-718
>                 URL: https://issues.apache.org/jira/browse/LIVY-718
>             Project: Livy
>          Issue Type: Epic
>          Components: RSC, Server
>            Reporter: Yiheng Wang
>            Priority: Major
>
> In this JIRA we want to discuss how to implement multi-active high
> availability in Livy.
> Currently, Livy only supports single node recovery. This is not sufficient in
> some production environments.
> In our scenario, the Livy server serves many
> notebook and JDBC services. We want to make Livy service more fault-tolerant
> and scalable.
> There're already some proposals in the community for high availability. But
> they're not so complete or just for active-standby high availability. So we
> propose a multi-active high availability design to achieve the following
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active
> services.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
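The comment above suggests that, even without a single designated server, clients could be steered to "pseudo-designated" servers via consistent hashing, redirected to a server that already holds a connection to the session's driver, or moved away from an overloaded server. The following is a minimal Python sketch of such a routing policy, assuming session IDs are hashed onto a ring of Livy server names. The class `ConsistentHashRing`, the function `route`, the server names `livy-1` through `livy-3`, and the `connected`/`overloaded` inputs are all hypothetical illustrations; none of them are part of Livy's codebase or API.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash from MD5 (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring mapping session IDs to preferred servers (not a Livy API)."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes          # virtual nodes per server, to spread load evenly
        self._ring = []               # sorted list of (hash, server)
        self._keys = []               # hashes only, parallel to _ring, for bisect
        for server in servers:
            self.add(server)

    def _rebuild(self):
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    def add(self, server):
        self._ring.extend((_hash(f"{server}#{i}"), server) for i in range(self.vnodes))
        self._rebuild()

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]
        self._rebuild()

    def candidates(self, session_id):
        """Yield distinct servers clockwise from the session's position on the ring."""
        if not self._ring:
            return
        start = bisect.bisect_right(self._keys, _hash(str(session_id)))
        seen = set()
        for i in range(len(self._ring)):
            server = self._ring[(start + i) % len(self._ring)][1]
            if server not in seen:
                seen.add(server)
                yield server

    def preferred_server(self, session_id):
        for server in self.candidates(session_id):
            return server
        raise LookupError("no servers available")


def route(ring, session_id, connected, overloaded=frozenset()):
    """Hypothetical routing policy combining the variations from the comment."""
    # 1. Reuse a server that already holds a connection to this session's driver.
    current = connected.get(session_id)
    if current is not None and current not in overloaded:
        return current
    # 2. Otherwise walk the ring, skipping overloaded servers (dynamic load balancing).
    for server in ring.candidates(session_id):
        if server not in overloaded:
            return server
    # 3. Every server is overloaded: fall back to the hash-preferred server.
    return ring.preferred_server(session_id)
```

Because each server owns many virtual nodes, removing a server reassigns only the sessions that hashed to it; sessions owned by the remaining servers keep the same preferred server. That property is what keeps redirection cheap as servers scale up and down:

```python
ring = ConsistentHashRing(["livy-1", "livy-2", "livy-3"])
before = {sid: ring.preferred_server(sid) for sid in range(1000)}
ring.remove("livy-2")
after = {sid: ring.preferred_server(sid) for sid in range(1000)}
moved = {sid for sid in before if before[sid] != after[sid]}
# moved contains exactly the sessions that previously preferred livy-2
```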