[jira] Commented: (HADOOP-681) Adminstrative hook to pull live nodes out of a HDFS cluster

Konstantin Shvachko (JIRA) Mon, 06 Nov 2006 13:42:01 -0800

    [ http://issues.apache.org/jira/browse/HADOOP-681?page=comments#action_12447543 ]

Konstantin Shvachko commented on HADOOP-681:
--------------------------------------------


1. I'd replace the 3 new methods in ClientProtocol with one method that takes a 3-valued parameter.
2. You do not need a new member for counting already re-replicated blocks.
You can count the blocks in the DatanodeDescriptor.
When the set of blocks is empty, replication is finished.
3. The name-node can then send the shutdown command to the data-node
(in reply to a heartbeat). The command is implemented but has not been used 
yet.
4. Inside the name-node, DatanodeDescriptor should be used instead of DatanodeInfo.
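
A rough sketch of what points 1-3 could look like together. This is illustration only, not the actual Hadoop code: TranquilSketch, TranquilAction, NodeState, Command, DescriptorSketch and tranquil() are all made-up names, and the block set here is assumed to hold only the blocks that still need a replica elsewhere.

```java
import java.util.HashSet;
import java.util.Set;

public class TranquilSketch {
    /** Hypothetical 3-valued parameter for a single ClientProtocol method. */
    enum TranquilAction { SET, GET, CLEAR }

    /** States from the issue description, modeled as an enum for illustration. */
    enum NodeState { ALIVE, BEING_TRANQUILED, TRANQUIL }

    /** Hypothetical heartbeat reply; the real command constant is not shown here. */
    enum Command { NONE, SHUTDOWN }

    /** Stand-in for the name-node's DatanodeDescriptor, not the real class. */
    static class DescriptorSketch {
        NodeState state = NodeState.ALIVE;
        // Assumption: blocks on this node that still lack a replica elsewhere.
        final Set<Long> blocks = new HashSet<Long>();

        /** Called when a block has been re-replicated to another node. */
        void blockReplicated(long blockId) {
            blocks.remove(Long.valueOf(blockId));
            if (state == NodeState.BEING_TRANQUILED && blocks.isEmpty()) {
                state = NodeState.TRANQUIL; // empty set means replication is finished
            }
        }

        /** Reply to a heartbeat: ask the node to shut down once it is tranquil. */
        Command heartbeat() {
            return state == NodeState.TRANQUIL ? Command.SHUTDOWN : Command.NONE;
        }
    }

    /** One entry point instead of setTranquil/getTranquil/clearTranquil. */
    static NodeState tranquil(TranquilAction action, DescriptorSketch node) {
        switch (action) {
            case SET:
                if (node.state == NodeState.ALIVE) {
                    node.state = node.blocks.isEmpty()
                            ? NodeState.TRANQUIL : NodeState.BEING_TRANQUILED;
                }
                break;
            case CLEAR:
                // Per the description, "clear" is a no-op unless the node is tranquil.
                if (node.state == NodeState.TRANQUIL) {
                    node.state = NodeState.ALIVE;
                }
                break;
            case GET:
            default:
                break; // GET only reports the state
        }
        return node.state;
    }
}
```

With this shape no separate counter is needed: the size of the block set is the remaining work, and the transition to the tranquil state happens when the set becomes empty.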

> Adminstrative hook to pull live nodes out of a HDFS cluster
> -----------------------------------------------------------
>
>                 Key: HADOOP-681
>                 URL: http://issues.apache.org/jira/browse/HADOOP-681
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.8.0
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>
> Introduction
> ------------
> An administrator sometimes needs to bring down a datanode for scheduled 
> maintenance. It would be nice if HDFS could be informed about this event. On 
> receipt of this event, HDFS can take steps so that HDFS data is not lost when 
> the node goes down at a later time.
> Architecture
> -----------
> In the existing architecture, a datanode can be in one of two states: dead or 
> alive. A datanode is alive if its heartbeats are being processed by the 
> namenode. Otherwise that datanode is in dead state. We extend the 
> architecture to introduce the concept of a tranquil state for a datanode.
> A datanode is in tranquil state if:
>     - it cannot be a target for replicating any blocks
>     - any block replica that it currently contains does not count towards the 
> target-replication-factor of that block
> Thus, a node that is in tranquil state can be brought down without impacting 
> the guarantees provided by HDFS.
> The tranquil state is not persisted across namenode restarts. If the namenode 
> restarts then that datanode will go back to being in the dead or alive state.
> The datanode is completely unaware of the fact that it has been labeled 
> as being in tranquil state. It can continue to heartbeat and serve read 
> requests for datablocks.
> DFSShell Design
> -----------------------
> We extend the DFS Shell utility to specify a list of nodes to the namenode.
>     hadoop dfs -tranquil {set|clear|get} datanodename1 [,datanodename2]
> The DFSShell utility sends this list to the namenode. This DFSShell command 
> invoked with the "set" option completes when the list is transferred to the 
> namenode. This command is non-blocking; it returns before the datanode is 
> actually in the tranquil state. The client can then query the state by 
> re-issuing the command with the "get" option. This option will indicate 
> whether the datanode is in tranquil state or is "being tranquiled". The 
> "clear" option is used to transition a tranquil datanode to the alive state. 
> The "clear" option is a no-op if the datanode is not in the "tranquil" state.
> ClientProtocol Design
> --------------------
> The ClientProtocol is the protocol exported by the namenode for its client.
> This protocol is extended to incorporate three new methods:
>    ClientProtocol.setTranquil(String[] datanodes)
>    ClientProtocol.getTranquil(String datanode)
>    ClientProtocol.clearTranquil(String[] datanodes)
> The ProtocolVersion is incremented to prevent conversations between 
> incompatible clients and servers. An old DFSShell cannot talk to the new 
> NameNode and vice versa.
> NameNode Design
> -------------------------
> The namenode does the bulk of the work for supporting this new feature.
> The DatanodeInfo object has a new private member named "state". It also has 
> three new member functions:
>     datanodeInfo.tranquilStarted(): start the process of tranquilization
>     datanodeInfo.tranquilCompleted(): node is now in tranquil state
>     datanodeInfo.clearTranquil() : remove tranquilization from node
> The namenode exposes a new API to set and clear tranquil states for a 
> datanode. On receipt of a "set tranquil" command, it invokes 
> datanodeInfo.tranquilStarted().
> The FSNamesystem.chooseTarget() method skips over datanodes that are marked 
> as being in the "tranquil" state. This ensures that tranquil-datanodes are 
> never chosen as targets of replication. The namenode does *not* record
> this operation in either the FsImage or the EditLogs.
> The namenode puts all the blocks from a being-tranquiled node into the 
> neededReplication data structure. Necessary code changes are made to ensure 
> that these blocks get replicated by the regular replication method. As of 
> now, the regular replication code does not distinguish between these blocks 
> and the blocks that are replication candidates because some other datanode 
> might have died. It might be prudent to give a different (lower?) priority to 
> this type of replication request, but that exercise is deferred to a later 
> date. In this design, replication requests generated because of a node going 
> to a tranquil state are not distinguished from replication requests generated 
> by a datanode going to the dead state.
> The DatanodeInfo object has another new private member named 
> "pendingTranquilCount". This field stores the number of blocks that 
> still remain to be replicated. This field is valid only if the node is in the 
> being-tranquiled state. On receipt of every 'n' heartbeats from the 
> being-tranquiled datanode, the namenode calculates the amount of data that is 
> still remaining to be replicated and updates the "pendingTranquilCount" in 
> the DatanodeInfo. When all the replications complete, the datanode is marked 
> as tranquiled. The number 'n' is selected in such a way that the average 
> heartbeat processing time does not increase appreciably.
> It is possible that the namenode might stop receiving heartbeats from a 
> datanode that is being-tranquiled. In this case, the tranquil flag of the 
> datanode gets cleared. It transitions to the dead state and the normal 
> processing for an alive-to-dead transition occurs.
> Web Interface
> -------------------
> The dfshealth.jsp displays the live nodes, dead nodes, being-tranquiled and 
> tranquil nodes. For nodes in the being-tranquiled state, it displays the 
> percentage of tranquilization completed till now.
> Issues
> --------
> 1. If a request for tranquilization starts getting processed and there isn't 
> enough space available in DFS to complete the necessary replication, then 
> that node might remain in the being-tranquiled state for a very long time. 
> This is not necessarily a bad thing but is there a better option?
> 2. We have opted for not storing cluster configuration information in the 
> persistent image of the file system. (The tranquil state of a datanode may be 
> lost if the namenode restarts).
>  

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
