Thanks all for coming! We went over the SCM HA design today and reached 
agreement on most of the ideas.

There are also some points we want to dig deeper into afterwards.


  1.  Since there are multiple standby SCM followers, how do we select the best 
candidate for the next leader?

Li: We could track the versions of DataNode reports to select the SCM with the 
most up-to-date view of the DataNodes.

Xiaoyu: This could be very complex in a large cluster, since DataNodes could 
report different versions to the different SCMs.

Nanda: We will look deeper into the approach. HDFS may not have this issue 
since it has only one standby for HA.
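
To make Li's suggestion concrete, here is a rough sketch of one possible 
heuristic: prefer the SCM that holds the newest known report version for the 
largest number of DataNodes. This is only an illustration and not the agreed 
design. The class name, the method name, the map layout and the tie-break rule 
are all assumptions, and it does not address the complexity Xiaoyu raised.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in the Ozone code base.
public class LeaderCandidateSelector {

    /**
     * reportsByScm maps an SCM id to (DataNode id -> last report version seen).
     * Returns the SCM id that holds the newest known version for the most
     * DataNodes, or null if the input is empty.
     */
    public static String pickCandidate(Map<String, Map<String, Long>> reportsByScm) {
        // Newest version of each DataNode report seen by any SCM.
        Map<String, Long> newest = new HashMap<>();
        for (Map<String, Long> reports : reportsByScm.values()) {
            for (Map.Entry<String, Long> e : reports.entrySet()) {
                newest.merge(e.getKey(), e.getValue(), Long::max);
            }
        }

        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, Map<String, Long>> scm : reportsByScm.entrySet()) {
            // Score = number of DataNodes for which this SCM is fully up to date.
            int score = 0;
            for (Map.Entry<String, Long> e : scm.getValue().entrySet()) {
                if (e.getValue().equals(newest.get(e.getKey()))) {
                    score++;
                }
            }
            // Assumed tie-break: lowest SCM id wins, to keep the result deterministic.
            boolean better = score > bestScore
                || (score == bestScore && scm.getKey().compareTo(best) < 0);
            if (better) {
                best = scm.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

Note that the scoring loop touches every (SCM, DataNode) pair, which hints at 
why this gets expensive in a large cluster.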



  1.  Mukul: What happens if there is one stale SCM leader and one new SCM 
leader? How do we make sure they won’t cause duplicate state replication?

Nanda: We could use a term to flag the latest SCM leader, and we need to make 
all SCM operations idempotent for safety.
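
As a minimal sketch of the two ideas in Nanda's answer, the snippet below 
rejects operations that carry an older term than the newest one seen (fencing 
a stale leader) and ignores an operation id that was already applied 
(idempotency). The class, the method names and the use of a string operation 
id are assumptions for illustration only, not actual SCM or Ratis APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not the SCM state machine.
public class TermFencedStateMachine {
    private long currentTerm = 0;
    private final Set<String> appliedOps = new HashSet<>();
    private final Set<String> containers = new HashSet<>();

    /**
     * Applies "create container" once. Returns true if state changed, false if
     * the request came from a stale term or the operation was already applied.
     */
    public synchronized boolean apply(long leaderTerm, String opId, String containerId) {
        if (leaderTerm < currentTerm) {
            return false; // request from a stale leader: reject
        }
        currentTerm = leaderTerm; // remember the newest term seen
        if (!appliedOps.add(opId)) {
            return false; // duplicate delivery: safe to ignore
        }
        containers.add(containerId);
        return true;
    }

    public synchronized int containerCount() {
        return containers.size();
    }

    public synchronized long currentTerm() {
        return currentTerm;
    }
}
```

With this shape, a retry of the same operation from the new leader and a late 
write from the old leader both leave the state unchanged.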



  1.  Xiaoyu: Token support: SCM token operations. Tokens issued by one SCM can 
be honored by the others. This should follow the same approach as other 
operations in SCM.



  1.  Bharat: More scenarios should be discussed on how to use snapshots for 
failovers.



  1.  Bharat: Does the SCM Ratis server start while SCM is still in safemode?
Li: Yes, so that follower SCMs can use the time to replay the state in the 
RaftLog and catch up with the leader's state.
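
The catch-up Li describes can be pictured with the small sketch below, where a 
follower applies every log entry after its last applied index. The log is 
modelled as a plain list of strings; the class and method names are 
hypothetical, and this is not how Ratis exposes its RaftLog.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a follower replaying a replicated log.
public class FollowerReplay {
    private final List<String> appliedState = new ArrayList<>();
    private int lastAppliedIndex = -1;

    /** Applies every entry after lastAppliedIndex and returns how many were applied. */
    public int catchUp(List<String> raftLog) {
        int applied = 0;
        for (int i = lastAppliedIndex + 1; i < raftLog.size(); i++) {
            appliedState.add(raftLog.get(i)); // stand-in for applying a state change
            lastAppliedIndex = i;
            applied++;
        }
        return applied;
    }

    public int lastAppliedIndex() {
        return lastAppliedIndex;
    }
}
```

Calling catchUp again with a longer log only applies the new tail, so the 
follower can replay repeatedly while safemode is still on.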

Now that we have agreed on the fundamental idea and some principles of SCM HA, 
we will start the implementation and dig deeper into some of the issues along 
the way. We will also write separate docs to discuss the more complex issues, 
such as how to select the best candidate for the next SCM leader.

-Li

From: [email protected]
When: 11:00 AM - 12:00 PM May 14, 2020
Subject: Design review for SCM HA
Location: https://cloudera.zoom.us/j/98354045335


SCM HA design review by Li Cheng and Nanda

SCM HA JIRA: https://issues.apache.org/jira/browse/HDDS-2823

SCM HA Design doc: 
https://docs.google.com/document/d/1vr_z6mQgtS1dtI0nANoJlzvF1oLV-AtnNJnxAgg69rM/edit?usp=sharing

Meeting link: https://cloudera.zoom.us/j/98354045335

Welcome to comment on JIRA and doc prior to the review.

Thanks,
Li
