Supratim Deka created HDDS-1595:
-----------------------------------

             Summary: Handling IO Failures on the Datanode
                 Key: HDDS-1595
                 URL: https://issues.apache.org/jira/browse/HDDS-1595
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
          Components: Ozone Datanode
            Reporter: Supratim Deka
         Attachments: Raft IO v2.svg

This Jira covers all the changes required to handle IO failures on the 
Datanode. Handling an IO failure involves detecting the failure as it happens 
and propagating it to the appropriate component in the system: the Client, 
the SCM, or both, depending on the type of failure.

At a high level, IO failure handling has the following goals:
1. Prevent inconsistencies and corruption caused by unhandled or mishandled 
failures.
2. Prevent data loss: detect failures promptly and propagate the correct 
error back to the initiator, instead of silently dropping the data while the 
client assumes the operation is committed.
3. Contain the disruption in the system: if a disk volume fails on a DN, 
operations on other nodes and volumes should not be affected.
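
As a rough illustration of goals 2 and 3, here is a minimal Java sketch. It is 
not taken from the Ozone code base: VolumeIoSketch, VolumeSet and WriteStatus 
are hypothetical names invented for this example, and marking a volume failed 
on the first IOException is a simplifying assumption (a real Datanode would 
likely verify the volume before declaring it failed). The point is only that 
a write error is returned to the initiator rather than swallowed, and that 
health is tracked per volume so other volumes keep serving requests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names exist in Ozone.
final class VolumeIoSketch {

    /** Status reported back to the initiator of a write. */
    enum WriteStatus { COMMITTED, IO_FAILURE, VOLUME_UNAVAILABLE }

    /** Tracks health per volume so a failure on one volume stays contained. */
    static final class VolumeSet {
        private final Map<Path, Boolean> healthy = new ConcurrentHashMap<>();

        void addVolume(Path root) {
            healthy.put(root, Boolean.TRUE);
        }

        boolean isHealthy(Path root) {
            return healthy.getOrDefault(root, Boolean.FALSE);
        }

        /**
         * Writes data under the given volume root. An IOException is never
         * swallowed: it is turned into IO_FAILURE for the caller, and the
         * volume is marked failed (assumption: first error fails the volume).
         */
        WriteStatus write(Path root, String fileName, byte[] data) {
            if (!isHealthy(root)) {
                // Fail fast; other volumes are unaffected.
                return WriteStatus.VOLUME_UNAVAILABLE;
            }
            try {
                Files.write(root.resolve(fileName), data);
                return WriteStatus.COMMITTED;
            } catch (IOException e) {
                healthy.put(root, Boolean.FALSE);
                return WriteStatus.IO_FAILURE;
            }
        }
    }
}
```

With one healthy volume and one whose root directory is missing, the first 
write to the bad volume returns IO_FAILURE, later writes to it return 
VOLUME_UNAVAILABLE, and writes to the healthy volume still return COMMITTED. 
Reporting the failed volume to the SCM is left out of the sketch.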

The design and the required changes are detailed in the attached PDF 
document.
A sequence diagram used to analyse the Datanode IO path is also attached, in 
SVG format.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
