[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013965#comment-17013965 ]

Saisai Shao commented on LIVY-718:
----------------------------------

[~bikassaha] The two merged sub-tasks are actually required by both 
solutions (the one you proposed and the one [~yihengw] proposed). That's why I 
merged them beforehand; they are not the key difference between the two solutions.

Actually, the proposal [~yihengw] made is a mid-term solution compared to a 
stateless Livy Server. The key differences are:

1. Changing when the RSCDriver and the Livy Server reconnect.
2. Refactoring most of the current code to make the Livy Server stateless.

I'm more concerned about the 2nd point, because it involves a lot of work and 
could easily introduce regressions. So IMHO, we could move on with the 
current mid-term proposal. If someone else wants to pursue a stateless solution, 
they could simply build on our current solution, which would take less 
effort than starting from scratch.

Just my two cents.


> Support multi-active high availability in Livy
> ----------------------------------------------
>
>                 Key: LIVY-718
>                 URL: https://issues.apache.org/jira/browse/LIVY-718
>             Project: Livy
>          Issue Type: Epic
>          Components: RSC, Server
>            Reporter: Yiheng Wang
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In this JIRA we want to discuss how to implement multi-active high 
> availability in Livy.
> Currently, Livy only supports single node recovery. This is not sufficient in 
> some production environments. In our scenario, the Livy server serves many 
> notebook and JDBC services. We want to make the Livy service more fault-tolerant 
> and scalable.
> There are already some proposals in the community for high availability, but 
> they are either incomplete or only cover active-standby high availability. So we 
> propose a multi-active high availability design to achieve the following 
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active 
> servers.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
