Todd Lipcon has posted comments on this change. Change subject: design-docs: multi-master for 1.0 release ......................................................................
Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/2527/4/docs/design-docs/multi-master-1.0.md File docs/design-docs/multi-master-1.0.md: Line 74: 4. Replica instance. Includes a link to the tserver, the state of the replica it's true there is a TabletReplica map, but this is a struct that is never pointed-to (it's by-value in the map). I would probably just describe this as part of the information of the Tablet instance. eg ("general metadata about the tablet, along with its current set of replicas") Line 215: It would be nice to implement this for Kudu 1.0, but it's not a strict > Yeah I don't think this feature would be driven by a pressing need. agreed, we haven't seen scalability/perf issues on the current design, so should be out of scope Line 246: #### Follower masters must treat heartbeats as no-ops seems contradictory - you aren't treating it quite as a no-op -- you're treating it only as a liveness indicator, right? Line 275: master replicates via Raft before triggering a state change. It doesn't I think there's another alternative: on startup, the tserver sends a heartbeat to all masters (or the "register" RPC or whatever). It waits until it has received a response from a majority of them. The masters respond including the term of the last committed operation (or maybe just the current term). The tservers can initialize their current term based on the majority response here. In other words, we already know that by the guarantees of raft, if there is a new leader elected, and we talk to a majority, at least one of those majority would know about the newest leader. This avoids having to do any persistent state on the TS side. Line 278: allow for replicating no-ops (i.e. log entries that needn't be persisted). > Actually, we do have this in Raft; heartbeats serve this function. I think leasing on its own would be enough to be safe. In the example in the doc: - Master 1 (M1) is leader. - M1 loses leadership to M2, but thinks it is still leader - M2 sends a 'create tablet' command to TS -- before doing so, M2 must be fully a leader (meaning that it has waited at least a lease period) - TS restarts - TS contacts M1 -- M1 checks its lease _before_ sending a 'Delete' call. Since we already assured that M2 waited a lease period, then we know that M1 has also lost its lease, and won't be able to send Delete. Lease algorithms like this are tolerant of clocks being unsynchronized and message delays, just so long as the clock _rates_ are within some percentage tolerance. Line 327: To fix KUDU-1353, the per-tablet replica locations should be rebuilt on Maybe we should do away with this information and just use the RaftConfigPB for the tablet, and assume that the tservers have what we think they have? I'm not sure the TabletReplica map buys us much, and it gets confusing about when to use that vs just trusting the config Line 417: 1. Adding a new global in-memory unordered_set to "lock" a table's name > what does this "lock" actually do? Prevent visibility? We want visibility o yea, I tend to agree that for many operations, an in-progress state is useful to expose Line 420: LockManager for this, as it has no real tablet dependencies per some conversation we had last week, I would rather avoid adding more locks unless it really buys us a lot. (where "a lot" != avoiding some super rarely-experienced semantic which doesnt crash but is just strange) Line 423: #### Background task scanning should we evaluate getting rid of the "backgrund tasks" thread and instead treat it more like the way we do alter/delete, with a "retrying task" that just runs and retries until successful? and "kick off" these tasks with a single scan when we become active? -- To view, visit http://gerrit.cloudera.org:8080/2527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iad76012977a45370b72a04d608371cecf90442ef Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
