[ 
https://issues.apache.org/jira/browse/HDDS-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831257#comment-16831257
 ] 

Anu Engineer commented on HDDS-1175:
------------------------------------

[~hanishakoneru] Sorry for commenting so late. I have not been looking at HA 
patches. I have a concern here.

bq. On OM leader, we run a periodic role check to verify its leader status.
This means that, at the end of the day, we cannot "know" for sure that we are 
the leader. This is the time-of-check vs. time-of-access problem: the role 
check can pass, and leadership can be lost before the read is served. One OM 
might think that it is the leader when it really is not.

Many other systems use the notion of a "Leader Lease" to avoid this problem. 
Another way to solve it, which I have been considering, is to read from any 2 
nodes and, if their values for the key do not agree, use the later version of 
the key.
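
To make the two options concrete, here is a minimal Java sketch. It is not 
Ozone or Ratis code: LeaderLease, VersionedValue, TwoNodeRead and the injected 
clock are hypothetical names made up for illustration. The lease part assumes 
bounded clock drift and that followers will not elect a new leader before the 
lease granted to the old one has expired. The two-node part shows only the 
comparison step and assumes every value carries a monotonically increasing 
version (for example, the log index of the last write).

```java
import java.util.function.LongSupplier;

/** Hypothetical leader lease: local reads are allowed only while the lease is unexpired. */
final class LeaderLease {
    private final LongSupplier clockMillis;        // injected clock (assumption, keeps the sketch testable)
    private final long leaseMillis;                // lease length; assumed shorter than the election timeout
    private long expiresAtMillis = Long.MIN_VALUE; // no lease is held initially

    LeaderLease(LongSupplier clockMillis, long leaseMillis) {
        this.clockMillis = clockMillis;
        this.leaseMillis = leaseMillis;
    }

    /** Call when a majority has acknowledged a heartbeat that was sent at sentAtMillis. */
    synchronized void renew(long sentAtMillis) {
        // Measure from the send time, not the ack time, so the lease is conservative.
        expiresAtMillis = Math.max(expiresAtMillis, sentAtMillis + leaseMillis);
    }

    /** Checked at read time, instead of relying on a periodic background role check. */
    synchronized boolean canServeLocalRead() {
        return clockMillis.getAsLong() < expiresAtMillis;
    }
}

/** A value tagged with a version; the version field is an assumption of this sketch. */
final class VersionedValue {
    final String value;
    final long version;

    VersionedValue(String value, long version) {
        this.value = value;
        this.version = version;
    }
}

/** Comparison step of the "read from any 2 nodes" idea. */
final class TwoNodeRead {
    /** Returns the reply with the later version; on a tie the first reply is returned. */
    static VersionedValue pickLatest(VersionedValue a, VersionedValue b) {
        return a.version >= b.version ? a : b;
    }
}

final class ReadSafetySketch {
    public static void main(String[] args) {
        long[] now = {0L};                          // fake clock for the demo
        LeaderLease lease = new LeaderLease(() -> now[0], 1000L);
        lease.renew(0L);
        System.out.println(lease.canServeLocalRead()); // prints true
        now[0] = 1500L;                             // lease has expired without a renewal
        System.out.println(lease.canServeLocalRead()); // prints false

        VersionedValue fromNodeA = new VersionedValue("old", 3L);
        VersionedValue fromNodeB = new VersionedValue("new", 7L);
        System.out.println(TwoNodeRead.pickLatest(fromNodeA, fromNodeB).value); // prints new
    }
}
```

With a lease, an OM that has lost leadership stops serving local reads once 
the lease runs out, rather than continuing until its next periodic role check.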

Without one of these approaches, OM HA will weaken the strict serializability 
guarantees that OM currently provides (that is, OM without HA). I thought I 
would flag this here for your consideration.


> Serve read requests directly from RocksDB
> -----------------------------------------
>
>                 Key: HDDS-1175
>                 URL: https://issues.apache.org/jira/browse/HDDS-1175
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Manager
>            Reporter: Hanisha Koneru
>            Assignee: Hanisha Koneru
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDDS-1175.001.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We can serve read requests directly from the OM's RocksDB instead of going 
> through the Ratis server. The OM should first check its role, and only if it 
> is the leader can it serve read requests. 
> There can be a scenario where an OM loses its Leader status but does not know 
> about the new election in the ring. This OM could serve stale reads for the 
> duration of the heartbeat timeout, but this should be acceptable (similar to 
> how a Standby Namenode could possibly serve stale reads until it figures out 
> the new status).



