Thanks all for coming! We were able to go over the SCM design today and reach agreement on most of the ideas for the SCM HA design.
There are also some points we want to dig deeper into afterwards:

1. Since there are multiple standby SCM followers, how do we select the best candidate for the next leader?
   Li: We could track the versions of DataNode reports to select the SCM with the most up-to-date view of the DataNodes.
   Xiaoyu: This could be very complex in a large cluster, since DataNodes could report different versions to all SCMs.
   Nanda: We will look deeper into the approach. HDFS may not have this issue since it only has one standby for HA.

2. Mukul: What happens if there is one stale SCM leader and one new SCM leader? How do we make sure they won’t cause duplication in state replication?
   Nanda: We could use a term to flag the latest SCM leader, and we need to make sure all SCM operations are idempotent for safety.

3. Xiaoyu: Token support (SCM token operations). Tokens issued by one SCM can be honored by the others. This shall follow the same approach as other operations in SCM.

4. Bharat: More scenarios should be discussed on how to use snapshots for failovers.

5. Bharat: Does the SCM Ratis server start while SCM has not yet exited safemode?
   Li: Yes, so that follower SCMs can use the time to replay the states in the RaftLog and catch up with the leader's state.

Now that we have reached the fundamental idea and some principles of SCM HA, we will start the implementation and dig deeper into some of the issues along the way. We will also have separate docs to discuss some of the more complex issues, such as how to select the best candidate for the next SCM leader.

-Li

From: [email protected]
When: 11:00 AM - 12:00 PM May 14, 2020
Subject: Design review for SCM HA
Location: https://cloudera.zoom.us/j/98354045335

SCM HA design review by Li Cheng and Nanda

SCM HA JIRA: https://issues.apache.org/jira/browse/HDDS-2823
SCM HA Design doc: https://docs.google.com/document/d/1vr_z6mQgtS1dtI0nANoJlzvF1oLV-AtnNJnxAgg69rM/edit?usp=sharing
Meeting link: https://cloudera.zoom.us/j/98354045335

Welcome to comment on JIRA and doc prior to the review.

Thanks,
Li
