[jira] Commented: (HADOOP-681) Adminstrative hook to pull live nodes out of a HDFS cluster

dhruba borthakur (JIRA) Fri, 01 Dec 2006 06:35:47 -0800

    [ http://issues.apache.org/jira/browse/HADOOP-681?page=comments#action_12454915 ]
            
dhruba borthakur commented on HADOOP-681:
-----------------------------------------


6. I would prefer to keep the name "adminstate" unless you strongly insist 
otherwise. There isn't any impact on the system when no nodes are being 
decommissioned.

7. The DatanodeDescriptor is persisted in the fsimage whereas DatanodeReport is 
an object used by the ClientProtocol. In theory, I would like to keep them 
separate. The first one is an on-disk structure whereas the second one is an 
over-the-wire protocol structure. In fact, at a later date, I was planning on 
changing the webUI to use DatanodeReport. 

  My theory is to keep two different objects for two different purposes, 
especially because their serialization requirements are different. The first 
object, DatanodeDescriptor, gets written to disk while the second one, 
DatanodeReport, gets passed back to the client. In the DatanodeReport, I can 
visualize that other parameters will be added in the future, e.g. we could be 
adding computed statistics into the Report. These computed statistics might 
have to be computed using a variety of namenode data structures (instead of 
just using DatanodeDescriptor).
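
A minimal Java sketch of this separation might look like the following. The 
class names DescriptorSketch and ReportSketch, their fields, and the 
percentUsed statistic are all hypothetical; they are not taken from the patch 
and only illustrate why the two objects can have different serialization needs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical on-disk record: holds only what must survive in the image.
final class DescriptorSketch {
    final String name;
    final long capacity;
    final long remaining;

    DescriptorSketch(String name, long capacity, long remaining) {
        this.name = name;
        this.capacity = capacity;
        this.remaining = remaining;
    }

    // Serialize the persistent fields in a fixed order.
    byte[] toImageBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(name);
            out.writeLong(capacity);
            out.writeLong(remaining);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Read the fields back in the same order they were written.
    static DescriptorSketch fromImageBytes(byte[] bytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new DescriptorSketch(in.readUTF(), in.readLong(), in.readLong());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical wire record: built per request and free to carry computed
// values that are never written to disk.
final class ReportSketch {
    final String name;
    final long capacity;
    final long remaining;
    final int percentUsed; // computed statistic (an assumption for illustration)

    private ReportSketch(String name, long capacity, long remaining, int percentUsed) {
        this.name = name;
        this.capacity = capacity;
        this.remaining = remaining;
        this.percentUsed = percentUsed;
    }

    static ReportSketch from(DescriptorSketch d) {
        int pct = d.capacity == 0
            ? 0
            : (int) (100 * (d.capacity - d.remaining) / d.capacity);
        return new ReportSketch(d.name, d.capacity, d.remaining, pct);
    }
}
```

With this split, adding a new computed field to the report would not change 
the bytes written to disk, which is the property argued for above.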

> Adminstrative hook to pull live nodes out of a HDFS cluster
> -----------------------------------------------------------
>
>                 Key: HADOOP-681
>                 URL: http://issues.apache.org/jira/browse/HADOOP-681
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.8.0
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: nodedecommission2.patch
>
>
> Introduction
> ------------
> An administrator sometimes needs to bring down a datanode for scheduled 
> maintenance. It would be nice if HDFS can be informed about this event. On 
> receipt of this event, HDFS can take steps so that HDFS data is not lost when 
> the node goes down at a later time.
> Architecture
> -----------
> In the existing architecture, a datanode can be in one of two states: dead or 
> alive. A datanode is alive if its heartbeats are being processed by the 
> namenode. Otherwise that datanode is in dead state. We extend the 
> architecture to introduce the concept of a tranquil state for a datanode.
> A datanode is in tranquil state if:
>     - it cannot be a target for replicating any blocks
>     - any block replica that it currently contains does not count towards the 
> target-replication-factor of that block
> Thus, a node that is in tranquil state can be brought down without impacting 
> the guarantees provided by HDFS.
> The tranquil state is not persisted across namenode restarts. If the namenode 
> restarts then that datanode will go back to being in the dead or alive state.
> The datanode is completely unaware of the fact that it has been labeled 
> as being in tranquil state. It can continue to heartbeat and serve read 
> requests for datablocks.
> DFSShell Design
> -----------------------
> We extend the DFS Shell utility to specify a list of nodes to the namenode.
>     hadoop dfs -tranquil {set|clear|get} datanodename1 [,datanodename2]
> The DFSShell utility sends this list to the namenode. This DFSShell command 
> invoked with the "set" option completes when the list is transferred to the 
> namenode. This command is non-blocking; it returns before the datanode is 
> actually in the tranquil state. The client can then query the state by 
> re-issuing the command with the "get" option. This option will indicate 
> whether the datanode is in tranquil state or is "being tranquiled". The 
> "clear" option is used to transition a tranquil datanode to the alive state. 
> The "clear" option is a no-op if the datanode is not in the "tranquil" state.
> ClientProtocol Design
> --------------------
> The ClientProtocol is the protocol exported by the namenode for its client.
> This protocol is extended to incorporate three new methods:
>    ClientProtocol.setTranquil(String[] datanodes)
>    ClientProtocol.getTranquil(String datanode)
>    ClientProtocol.clearTranquil(String[] datanodes)
> The ProtocolVersion is incremented to prevent conversations between 
> incompatible clients and servers. An old DFSShell cannot talk to the new 
> NameNode and vice-versa.
> NameNode Design
> -------------------------
> The namenode does the bulk of the work for supporting this new feature.
> The DatanodeInfo object has a new private member named "state". It also has 
> three new member functions:
>     datanodeInfo.tranquilStarted(): start the process of tranquilization
>     datanodeInfo.tranquilCompleted(): node is now in tranquil state
>     datanodeInfo.clearTranquil() : remove tranquilization from node
> The namenode exposes a new API to set and clear tranquil states for a 
> datanode. On receipt of a "set tranquil" command, it invokes 
> datanodeInfo.tranquilStarted().
> The FSNamesystem.chooseTarget() method skips over datanodes that are marked 
> as being in the "tranquil" state. This ensures that tranquil-datanodes are 
> never chosen as targets of replication. The namenode does *not* record
> this operation in either the FsImage or the EditLogs.
> The namenode puts all the blocks from a being-tranquiled node into the 
> neededReplication data structure. Necessary code changes are made to ensure 
> that these blocks get replicated by the regular replication method. As of 
> now, the regular replication code does not distinguish between these blocks 
> and the blocks that are replication candidates because some other datanode 
> might have died. It might be prudent to give a different (lower?) weight to 
> this type of replication request, but that exercise is deferred to a later 
> date. In this design, replication requests generated because of a node going 
> to a tranquil state are not distinguished from replication requests generated 
> by a datanode going to the dead state.
> The DatanodeInfo object has another new private member named 
> "pendingTranquilCount". This field stores the number of blocks that still 
> remain to be replicated. This field is valid only if the node is in the 
> being-tranquiled state. On receipt of every 'n' heartbeats from the 
> being-tranquiled datanode, the namenode calculates the amount of data that is 
> still remaining to be replicated and updates the "pendingTranquilCount" in 
> the DatanodeInfo. When all the replications complete, the datanode is marked 
> as tranquiled. The number 'n' is selected in such a way that the average 
> heartbeat processing time does not increase appreciably.
> It is possible that the namenode might stop receiving heartbeats from a 
> datanode that is being-tranquiled. In this case, the tranquil flag of the 
> datanode gets cleared. It transitions to the dead state and the normal 
> processing for the alive-to-dead transition occurs.
> Web Interface
> -------------------
> The dfshealth.jsp displays the live nodes, dead nodes, being-tranquiled and 
> tranquil nodes. For nodes in the being-tranquiled state, it displays the 
> percentage of tranquilization completed so far.
> Issues
> --------
> 1. If a request for tranquilization starts getting processed and there isn't 
> enough space available in DFS to complete the necessary replication, then 
> that node might remain in the being-tranquiled state for a very long time. 
> This is not necessarily a bad thing but is there a better option?
> 2. We have opted for not storing cluster configuration information in the 
> persistent image of the file system. (The tranquil state of a datanode may be 
> lost if the namenode restarts).
>  
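
The state transitions in the quoted design can be sketched in Java roughly as 
follows. This is only an illustration under stated assumptions, not the code 
in the attached patch: the class TranquilNodeSketch, its State enum, and the 
heartbeatsLost() and isReplicationTarget() methods are hypothetical names, and 
treating a being-tranquiled node as ineligible for new replicas is an 
assumption. The method names tranquilStarted() and clearTranquil() and the 
field pendingTranquilCount follow the description.

```java
import java.util.function.IntSupplier;

// Hypothetical model of the per-datanode state kept by the namenode.
final class TranquilNodeSketch {
    enum State { ALIVE, BEING_TRANQUILED, TRANQUIL, DEAD }

    private final int recomputeEvery;     // the 'n' from the description
    private State state = State.ALIVE;
    private int pendingTranquilCount;     // meaningful only while BEING_TRANQUILED
    private int heartbeatsSinceRecompute;

    TranquilNodeSketch(int recomputeEvery) {
        if (recomputeEvery < 1) {
            throw new IllegalArgumentException("recomputeEvery must be >= 1");
        }
        this.recomputeEvery = recomputeEvery;
    }

    State state() { return state; }

    int pendingTranquilCount() { return pendingTranquilCount; }

    // "set": begin tranquilization of a live node.
    void tranquilStarted(int blocksToReplicate) {
        if (state != State.ALIVE) {
            return;
        }
        pendingTranquilCount = blocksToReplicate;
        heartbeatsSinceRecompute = 0;
        state = State.BEING_TRANQUILED;
    }

    // Recompute the pending count only on every n-th heartbeat, so that the
    // average heartbeat processing cost stays low.
    void onHeartbeat(IntSupplier remainingBlocks) {
        if (state != State.BEING_TRANQUILED) {
            return;
        }
        heartbeatsSinceRecompute++;
        if (heartbeatsSinceRecompute < recomputeEvery) {
            return;
        }
        heartbeatsSinceRecompute = 0;
        pendingTranquilCount = remainingBlocks.getAsInt();
        if (pendingTranquilCount == 0) {
            state = State.TRANQUIL;   // all replications completed
        }
    }

    // "clear": a no-op unless the node is in the tranquil state.
    void clearTranquil() {
        if (state == State.TRANQUIL) {
            state = State.ALIVE;
        }
    }

    // Heartbeats stopped: the tranquil flag is dropped and the node is dead.
    void heartbeatsLost() {
        pendingTranquilCount = 0;
        state = State.DEAD;
    }

    // Assumption: only plain live nodes may receive new replicas.
    boolean isReplicationTarget() {
        return state == State.ALIVE;
    }
}
```

A caller would pass onHeartbeat() a supplier that counts the node's blocks 
still awaiting replication; how that count is derived from the 
neededReplication structure is left open here.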

