[ https://issues.apache.org/jira/browse/HDDS-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G reassigned HDDS-12864:
------------------------------------------

    Assignee: Sergey Soldatov  (was: Swaminathan Balachandran)

> All commit semantics in replication writes 
> -------------------------------------------
>
>                 Key: HDDS-12864
>                 URL: https://issues.apache.org/jira/browse/HDDS-12864
>             Project: Apache Ozone
>          Issue Type: New Feature
>          Components: Ozone Client, Ozone Datanode
>            Reporter: Uma Maheswara Rao G
>            Assignee: Sergey Soldatov
>            Priority: Major
>
> In Ozone, Raft-based replication has majority-commit semantics. While this
> has the advantage of not being held back by a slow node, it allows the
> system to move forward with only a majority of replicas for a short period,
> without all 3 replicas being consistently committed. Catching up a slow
> replica is left to Raft.
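To make the difference between the two semantics concrete, here is a minimal sketch. It assumes each replica reports the highest log index it has acknowledged; the class `CommitIndexSketch` and its methods are hypothetical illustrations, not Ozone or Ratis APIs. Majority commit advances to the highest index held by a quorum, while all commit is bounded by the slowest replica.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not an Ozone or Ratis class): given the highest index
 * each replica has acknowledged, compute what is committed under MAJORITY
 * semantics versus ALL semantics.
 */
final class CommitIndexSketch {

  private CommitIndexSketch() {
  }

  /** Highest index acknowledged by at least a majority (n/2 + 1) of replicas. */
  static long majorityCommitted(long[] ackedIndexPerReplica) {
    long[] sorted = ackedIndexPerReplica.clone();
    Arrays.sort(sorted); // ascending order
    int n = sorted.length;
    // The (n/2 + 1)-th largest value is held by at least n/2 + 1 replicas.
    return sorted[n - (n / 2 + 1)];
  }

  /** Highest index acknowledged by every replica: bounded by the slowest one. */
  static long allCommitted(long[] ackedIndexPerReplica) {
    return Arrays.stream(ackedIndexPerReplica).min().orElseThrow();
  }

  public static void main(String[] args) {
    // Three replicas, one slow follower lagging at index 80.
    long[] acked = {120, 120, 80};
    System.out.println("majority=" + majorityCommitted(acked)); // majority=120
    System.out.println("all=" + allCommitted(acked)); // all=80
  }
}
```

With one lagging replica, majority commit lets the pipeline report index 120 as committed even though one replica holds only up to 80; all commit only reports 80.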
> Since Ozone blocks are variable-length, it is fine to close the
> container/block when some of the replicas do not acknowledge in time. So
> there is no need to recopy the content to new nodes; instead, the client can
> simply move forward with a new pipeline.
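The close-and-move-on idea can be sketched as a small decision function. This assumes the client knows each replica's acknowledged length and still buffers the unacknowledged tail; `BlockCloseSketch`, `decide`, and `Decision` are hypothetical names for illustration, not Ozone client code.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Ozone client code: decide how to finish a block
 * when the all-commit wait for targetLength does not complete in time.
 */
final class BlockCloseSketch {

  static final class Decision {
    final long closeAtLength;    // length the current block is closed at
    final long bytesToRewrite;   // buffered tail to write through a new pipeline
    final boolean switchPipeline;

    Decision(long closeAtLength, long bytesToRewrite, boolean switchPipeline) {
      this.closeAtLength = closeAtLength;
      this.bytesToRewrite = bytesToRewrite;
      this.switchPipeline = switchPipeline;
    }

    @Override
    public String toString() {
      return "closeAt=" + closeAtLength + " rewrite=" + bytesToRewrite
          + " switchPipeline=" + switchPipeline;
    }
  }

  static Decision decide(long targetLength, long[] ackedLengthPerReplica) {
    // Bytes that every replica has acknowledged.
    long allAcked = Arrays.stream(ackedLengthPerReplica).min().orElse(0L);
    if (allAcked >= targetLength) {
      // All replicas caught up: keep writing on the current pipeline.
      return new Decision(targetLength, 0L, false);
    }
    // Blocks are variable-length, so close this block at the length every
    // replica holds and write the still-buffered tail through a new pipeline,
    // instead of recopying data to bring the lagging replica up to date.
    return new Decision(allAcked, targetLength - allAcked, true);
  }

  public static void main(String[] args) {
    long mib = 1024L * 1024L;
    // Target 4 MiB; one replica has acknowledged only 3 MiB.
    Decision d = decide(4 * mib, new long[] {4 * mib, 4 * mib, 3 * mib});
    System.out.println(d); // closeAt=3145728 rewrite=1048576 switchPipeline=true
  }
}
```

Every replica of the closed block then holds exactly the length that was acknowledged, so no replica needs to be repaired to reach it.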
> Given this advantage, we should provide all-commit semantics to make sure
> all replicas are consistently committed up to the length the client received
> acknowledgments for.
> There are two or three areas where we rely on majority commit today:
> 1. The client falls back to majority commit in watchForCommit if the
> all-commit watch fails.
> 2. The leader DN always waits only for a majority quorum for transactions.
> 3. The leader only waits for its own applyTransaction to complete.
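Area 1 can be pictured as a client-side policy choice. The sketch below assumes a hypothetical `Watcher` callback that reports whether an index reached ALL or MAJORITY commit within a timeout; `Watcher`, `Level`, `Outcome`, and both policy methods are illustrative names that do not model any real Ozone or Ratis API, and the fallback method is a simplification of the behavior item 1 describes.

```java
import java.time.Duration;

/**
 * Hypothetical sketch, not Ozone client code: contrast a watchForCommit-style
 * fallback to majority (area 1) with an all-commit-only policy.
 */
final class WatchPolicySketch {

  enum Level { ALL, MAJORITY }

  enum Outcome { ALL_COMMITTED, MAJORITY_ONLY, CLOSE_AND_SWITCH_PIPELINE }

  /** Stand-in for a watch call: true if the index reached the level in time. */
  interface Watcher {
    boolean watch(long index, Level level, Duration timeout);
  }

  /** Fallback policy: if the ALL watch fails, settle for MAJORITY. */
  static Outcome withMajorityFallback(Watcher watcher, long index, Duration timeout) {
    if (watcher.watch(index, Level.ALL, timeout)) {
      return Outcome.ALL_COMMITTED;
    }
    if (watcher.watch(index, Level.MAJORITY, timeout)) {
      return Outcome.MAJORITY_ONLY; // acknowledged while a replica may still lag
    }
    throw new IllegalStateException("index " + index + " not committed by a majority");
  }

  /** All-commit policy: never settle for majority; close the block instead. */
  static Outcome allCommitOnly(Watcher watcher, long index, Duration timeout) {
    return watcher.watch(index, Level.ALL, timeout)
        ? Outcome.ALL_COMMITTED
        : Outcome.CLOSE_AND_SWITCH_PIPELINE;
  }

  public static void main(String[] args) {
    // Simulated pipeline with one slow follower: MAJORITY succeeds, ALL never does.
    Watcher slowFollower = (i, lvl, t) -> lvl == Level.MAJORITY;
    Duration timeout = Duration.ofSeconds(3);
    System.out.println(withMajorityFallback(slowFollower, 42, timeout)); // MAJORITY_ONLY
    System.out.println(allCommitOnly(slowFollower, 42, timeout)); // CLOSE_AND_SWITCH_PIPELINE
  }
}
```

Under the all-commit policy a slow replica never causes a write to be acknowledged at majority; the cost is closing the block early and continuing on a new pipeline.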
> Streamlining the above scenarios to achieve all-commit can keep all
> replicas in a consistent state at every point at which a write is acknowledged.
> As a side note: today, EC already uses an all-commit-like protocol in the
> write path. This avoids the QUASI_CLOSED state altogether, because replicas
> always have at least the length of data acknowledged to the clients.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
