[ 
https://issues.apache.org/jira/browse/HDFS-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029786#comment-13029786
 ] 

Bharath Mundlapudi commented on HDFS-664:
-----------------------------------------

Is this JIRA similar to HDFS-1362?
https://issues.apache.org/jira/browse/HDFS-1362



> Add a way to efficiently replace a disk in a live datanode
> ----------------------------------------------------------
>
>                 Key: HDFS-664
>                 URL: https://issues.apache.org/jira/browse/HDFS-664
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>    Affects Versions: 0.22.0
>            Reporter: Steve Loughran
>         Attachments: HDFS-664.0-20-3-rc2.patch.1, HDFS-664.patch
>
>
> In clusters where the datanode disks are hot-swappable, you need to be able 
> to swap out a disk on a live datanode without taking down the datanode. You 
> don't want to decommission the whole node, as that is overkill. On a system 
> with four 1 TB HDDs, giving 3 TB of datanode storage, a decommissioning and 
> restart will consume up to 6 TB of bandwidth. If a single disk were swapped 
> in, there would be only 1 TB of data to recover over the network. More 
> importantly, if that data could be moved to free space on the same machine, 
> the recommissioning could take place at disk rates, not network speeds. 
> # Maybe have a way of decommissioning a single disk on the DN; the files 
> could be moved to space on the other disks or the other machines in the rack.
> # There may not be time to use that option, in which case the disk would be 
> pulled out with no warning and a new disk inserted.
> # The DN needs to see that a disk has been replaced (or react to some ops 
> request telling it this), and start using the new disk again, pushing data 
> back and restoring the balance. 
> To complicate the process, assume there is a live TaskTracker (TT) on the 
> system, running jobs against the data. The TT would probably need to be paused 
> while the work takes place, with any ongoing work handled somehow. Halting the 
> TT and then restarting it after the replacement disk goes in is probably 
> simplest. The more disks you add to a node, the more pressing this need becomes.
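
The bandwidth figures in the description can be reproduced with a rough estimate. This is a sketch under an assumption that is not stated in the issue: decommissioning is taken to copy all of the node's stored data (3 TB) to other datanodes, and rebalancing after the node rejoins is taken to move roughly the same amount back. The function names are hypothetical and exist only for illustration.

```python
# Rough network-traffic estimate for the disk-swap scenario in HDFS-664.
# Assumption (not from the issue): decommissioning copies every block on the
# node to other datanodes, and rebalancing after the node rejoins moves about
# the same amount back, so traffic is roughly twice the node's stored data.

def decommission_traffic_tb(stored_tb: float) -> float:
    """Hypothetical helper: TB moved over the network to decommission a node and refill it."""
    return 2 * stored_tb  # data out during decommission, back in during rebalancing


def single_disk_traffic_tb(disk_tb: float) -> float:
    """Hypothetical helper: TB recovered over the network when one disk is replaced."""
    return disk_tb  # only the swapped disk's data needs to be recovered


print(decommission_traffic_tb(3.0))  # 6.0
print(single_disk_traffic_tb(1.0))   # 1.0
```

Under these assumptions, swapping one disk costs about a sixth of the network traffic of a full decommission and restart, and if the data could be restored from other disks on the same machine, even that 1 TB would stay off the network.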

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
