[ 
https://issues.apache.org/jira/browse/HDFS-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931209#action_12931209
 ] 

Todd Lipcon commented on HDFS-1362:
-----------------------------------

We had a brief meeting this morning to discuss this JIRA. To summarize for the 
community:

- Having the ability to add/remove volumes via RPC has the issue that the 
changes are not reflected in the config file, so we risk that an admin may add 
a volume but forget to modify the config. The next time the cluster is 
restarted, the volume will be missing and cause problems.
- We discussed that the primary use case for this feature is restoring a volume 
after it has failed. The other use case (adding a new volume to a DN that has 
not suffered any issues) is rather rare.
- So, rather than providing add/list/remove APIs, we decided to simply add a 
"refresh" API. There were two options suggested here:
1. Make use of the new HADOOP-7001 interface for reconfiguring daemons. In this 
case an admin could modify the config file to add new volumes, and then refresh 
the config to have the DN pick up new volumes, or re-add failed volumes. The 
potential issue here is that, even if the configuration has not changed, we 
still want the "refresh" to do something, so maybe this is not the right place.
2. Add a new RPC and command line tool, something like "dfsadmin 
-restoreDNStorage <datanode IP:port>". This would not re-read the conf file, 
but rather just re-check any failed volumes to see if they are newly available. 
This could alternatively be triggered by a new DN servlet or something if it's 
simpler.

- We also discussed pluggability (HDFS-1405). Tom and I were of the opinion 
that this feature is generally useful, and we don't see any compelling reason 
to make it a plugin. We should just improve FSDataset directly instead of 
extending it into a new Java class.
- Regarding the new feature of copying blocks from volume to volume in the case 
that one volume has gone read-only, we decided that we should defer this to a 
separate JIRA to be implemented after this is complete. That will make this one 
smaller and easier to review.
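
As a rough illustration of option 2 above (re-checking failed volumes without 
re-reading the conf file), the core of such a "restore" operation might look 
like the sketch below. Everything in it is hypothetical: the class 
VolumeRestoreSketch, its method names, and the simple directory check are 
assumptions made for illustration and are not the actual FSDataset or DataNode 
API. A real implementation would also need to rescan the blocks on a restored 
volume.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch only: tracks active and failed volume directories and
 * re-checks the failed ones on request. Not the real FSDataset API.
 */
final class VolumeRestoreSketch {
    private final List<File> activeVolumes = new ArrayList<>();
    private final List<File> failedVolumes = new ArrayList<>();

    /** Moves a volume to the failed list (e.g. after a disk error). */
    synchronized void markFailed(File dir) {
        activeVolumes.remove(dir);
        if (!failedVolumes.contains(dir)) {
            failedVolumes.add(dir);
        }
    }

    /** Simplified health check (an assumption): an accessible directory. */
    static boolean isUsable(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    /**
     * Re-checks every failed volume and re-activates those that are usable
     * again. The configuration is not consulted; only volumes already known
     * to this instance are considered. Returns the restored volumes.
     */
    synchronized List<File> restoreFailedVolumes() {
        List<File> restored = new ArrayList<>();
        Iterator<File> it = failedVolumes.iterator();
        while (it.hasNext()) {
            File dir = it.next();
            if (isUsable(dir)) {
                it.remove();
                activeVolumes.add(dir);
                restored.add(dir);
            }
        }
        return restored;
    }

    /** Snapshot of the currently active volumes. */
    synchronized List<File> activeVolumes() {
        return new ArrayList<>(activeVolumes);
    }

    /** Snapshot of the volumes still considered failed. */
    synchronized List<File> failedVolumes() {
        return new ArrayList<>(failedVolumes);
    }
}
```

Because the method only walks the list of volumes that previously failed, it 
still does useful work when the configuration is unchanged, which is the 
concern raised about option 1.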

> Provide volume management functionality for DataNode
> ----------------------------------------------------
>
>                 Key: HDFS-1362
>                 URL: https://issues.apache.org/jira/browse/HDFS-1362
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>            Reporter: Wang Xu
>            Assignee: Wang Xu
>         Attachments: HDFS-1362.txt, Provide_volume_management_for_DN_v1.pdf
>
>
> The current management unit in Hadoop is a node, i.e. if a node fails, it 
> will be kicked out and all the data on the node will be replicated.
> As almost all SATA controllers support hotplug, we add a new command line 
> interface to the datanode so that it can list, add or remove a volume online, 
> which means we can change a disk without decommissioning the node. Moreover, 
> if the failed disk is still readable and the node has enough space, it can 
> migrate data on the disk to other disks in the same node.
> A more detailed design document will be attached.
> The original version in our lab is implemented against the 0.20 datanode 
> directly. Would it be better to implement it in contrib? Or are there any 
> other suggestions?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
