Uma Maheswara Rao G created HDDS-12864:
------------------------------------------
             Summary: All commit semantics in replication writes
                 Key: HDDS-12864
                 URL: https://issues.apache.org/jira/browse/HDDS-12864
             Project: Apache Ozone
          Issue Type: New Feature
          Components: Ozone Client, Ozone Datanode
            Reporter: Uma Maheswara Rao G
            Assignee: Swaminathan Balachandran


Ozone replication (Raft based) has majority-commit semantics. While this has the advantage of not suffering from a slow node, it allows the system to move forward with majority replication for a short duration of time without all 3 replicas being consistently committed. Catching up a slow replica depends on Raft.

Since Ozone has variable-length blocks, it is just fine to close the container/block when some of the replicas do not acknowledge in time. So it does not need to recopy the content to new nodes; instead, it can just move forward with a new pipeline.

With this advantage, we should provide all-commit semantics to make sure all replicas are consistently committed up to the length that the client got acknowledgments for.

There are three areas where we do majority commits today:
1. The client falls back to majority commit in watchForCommit if the all-commit watch fails.
2. The leader DN always waits only for the majority quorum for transactions.
3. The leader only waits for its own applyTransaction completion.

Streamlining the above scenarios to achieve all commit can bring all replicas into a consistent state at any point in time, on write acks.

As a side note: today, in EC, we already use an ALL COMMIT-like protocol in the write path. This avoids the QUASI_CLOSED state altogether, because replicas always have at least the length of data that was acknowledged to the clients.
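
To make the difference between the two commit levels concrete, here is a minimal sketch of how a majority-committed index and an all-committed index could be derived from per-replica commit indexes, together with a client-side check that mirrors the fallback described in item 1. This is an illustration only, not Ozone or Ratis code: the function names (majority_committed, all_committed, watch_for_commit) and the list-of-indexes model are hypothetical, and Python is used purely for brevity.

```python
from typing import Sequence


def majority_committed(commit_indexes: Sequence[int]) -> int:
    """Highest log index that a majority of replicas have committed."""
    ordered = sorted(commit_indexes, reverse=True)
    quorum = len(ordered) // 2 + 1
    # The quorum-th highest index is reached by at least `quorum` replicas.
    return ordered[quorum - 1]


def all_committed(commit_indexes: Sequence[int]) -> int:
    """Highest log index that every replica has committed."""
    return min(commit_indexes)


def watch_for_commit(index: int, commit_indexes: Sequence[int]) -> str:
    """Hypothetical client-side check: try all commit, then fall back.

    Returns "ALL" if every replica has committed `index`, "MAJORITY" if
    only a quorum has, and raises TimeoutError if not even a quorum has.
    """
    if all_committed(commit_indexes) >= index:
        return "ALL"
    if majority_committed(commit_indexes) >= index:
        return "MAJORITY"
    raise TimeoutError(f"index {index} not committed by a quorum")


if __name__ == "__main__":
    replicas = [10, 10, 7]  # one slow replica lagging at index 7
    print(majority_committed(replicas))
    print(all_committed(replicas))
    print(watch_for_commit(10, replicas))
```

With replica commit indexes [10, 10, 7], a write at index 10 is acknowledged under majority commit while one replica still lags at 7. Under all-commit semantics the client would only be acknowledged up to index 7, which is the length every replica is guaranteed to hold.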