    [ http://issues.apache.org/jira/browse/HADOOP-681?page=comments#action_12454736 ]

dhruba borthakur commented on HADOOP-681:
-----------------------------------------
Hi Konstantin, thanks for your comments. My responses are marked with <****>.

1. Index: src/java/org/apache/hadoop/dfs/ClientProtocol.java
   public static final long versionID = 4L; you should write a comment.
   <****> Done.

2. Index: src/java/org/apache/hadoop/dfs/DFSClient.java
   Maybe it is better to use DatanodeID instead of a mere String.
   <****> The user can specify either a name or a name:port. The code matches
   both formats, which is why the user input is accepted as a String rather
   than a DatanodeID (see the sketch below).

3. Index: src/java/org/apache/hadoop/dfs/DFSAdmin.java
   decommission() does not document the return value; boolean mode is never
   used; final String safeModeUsage is never used; the decommission usage
   string does not specify the data-node parameters.
   <****> Done.

4. Index: src/java/org/apache/hadoop/dfs/FSNamesystem.java
   decommissionInProgress() and replicationInProgress() should start with
   is***(); in startDecommission() and stopDecommission() it is better to call
   the public method getName(); Block decommissionblocks[] should be
   decommissionBlocks.
   <****> Done.

5. Index: src/java/org/apache/hadoop/dfs/DatanodeDescriptor.java
   decommissioned() should start with is***(). I don't think constants such as
   public static final int NORMAL = 0; are used anywhere in the code, and they
   could be confused with the enum values having the same names.
   <****> Done.

6. I propose to rename AdminStates to DecommissionState and eliminate the
   NORMAL state, replacing it by null where applicable, with a clear (imo)
   semantics: no decommission, no state.
   <****> I have, on purpose, kept the API a little more generic. In the
   future there could be more administrative states for datanodes, e.g.
   read-only datanodes (?); a rough sketch of that shape is included below.

7. Index: src/java/org/apache/hadoop/dfs/DatanodeReport.java
   I don't think this class should be introduced at all. DatanodeReport
   effectively returns an entire DatanodeDescriptor.
   <****> There is one subtle difference between DatanodeReport and
   DatanodeInfo: their serialization methods. The Namenode serializes
   DatanodeInfo while writing it to the FsImage, and I do not want any FsImage
   format change at present; persistence of adminState will be bundled in with
   other FsImage changes at a later date. The DatanodeReport class has to
   report the adminState to the UI, so it has to serialize the adminState too.

Please let me know if my counter-proposals sound ok.
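For illustration, the matching in item 2 has roughly the following shape. This
is a simplified sketch, not the actual patch code; the class and method names
are made up for the example.

  // The user may pass either "host" or "host:port"; both forms are matched
  // against the registered datanodes.
  import java.util.List;

  class DatanodeNameMatcher {

    /** True if the user-supplied name identifies a node registered as host:port. */
    static boolean matches(String userSupplied, String registeredHost, int registeredPort) {
      if (userSupplied.indexOf(':') >= 0) {
        // "host:port" form: require an exact match on both parts
        return userSupplied.equals(registeredHost + ":" + registeredPort);
      }
      // bare "host" form: match on the host name alone
      return userSupplied.equals(registeredHost);
    }

    /** Returns the first registered "host:port" entry matching the name, or null. */
    static String findMatch(String userSupplied, List<String> registeredNodes) {
      for (String node : registeredNodes) {
        int idx = node.lastIndexOf(':');
        String host = node.substring(0, idx);
        int port = Integer.parseInt(node.substring(idx + 1));
        if (matches(userSupplied, host, port)) {
          return node;
        }
      }
      return null;
    }
  }

With something like this, both "host1" and "host1:1234" select the same
registered datanode.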
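Similarly, for item 6, the more generic shape I have in mind looks roughly
like the following. Again a simplified sketch: only the decommission-related
values are implied by the patch, and anything beyond them (such as a read-only
state) is hypothetical.

  // Keeping the administrative state as a small enum leaves room for future
  // states without changing the surrounding API.
  class DatanodeAdminStateSketch {

    enum AdminStates {
      NORMAL,                   // ordinary, fully participating datanode
      DECOMMISSION_INPROGRESS,  // blocks are being re-replicated elsewhere
      DECOMMISSIONED            // all blocks re-replicated; safe to take down
      // a future READ_ONLY state could be added here
    }

    private AdminStates adminState = AdminStates.NORMAL;

    boolean isDecommissionInProgress() {
      return adminState == AdminStates.DECOMMISSION_INPROGRESS;
    }

    boolean isDecommissioned() {
      return adminState == AdminStates.DECOMMISSIONED;
    }

    void startDecommission() { adminState = AdminStates.DECOMMISSION_INPROGRESS; }

    void stopDecommission()  { adminState = AdminStates.NORMAL; }
  }

Renaming the enum to DecommissionState would tie it to decommissioning alone,
which is what I would like to avoid.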
> Administrative hook to pull live nodes out of a HDFS cluster
> ------------------------------------------------------------
>
>                 Key: HADOOP-681
>                 URL: http://issues.apache.org/jira/browse/HADOOP-681
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.8.0
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: nodedecommission2.patch
>
>
> Introduction
> ------------
> An administrator sometimes needs to bring down a datanode for scheduled
> maintenance. It would be nice if HDFS could be informed about this event so
> that, on receipt of it, HDFS can take steps to ensure that no data is lost
> when the node goes down at a later time.
>
> Architecture
> ------------
> In the existing architecture, a datanode can be in one of two states: dead
> or alive. A datanode is alive if its heartbeats are being processed by the
> namenode; otherwise it is in the dead state. We extend the architecture to
> introduce the concept of a tranquil state for a datanode.
> A datanode is in the tranquil state if:
> - it cannot be a target for replicating any blocks, and
> - any block replica that it currently contains does not count towards the
>   target replication factor of that block.
> Thus, a node that is in the tranquil state can be brought down without
> impacting the guarantees provided by HDFS.
> The tranquil state is not persisted across namenode restarts. If the
> namenode restarts, the datanode goes back to being in the dead or alive
> state. The datanode itself is completely oblivious to the fact that it has
> been labeled as tranquil: it can continue to heartbeat and serve read
> requests for data blocks.
>
> DFSShell Design
> ---------------
> We extend the DFSShell utility to pass a list of nodes to the namenode:
>
>   hadoop dfs -tranquil {set|clear|get} datanodename1 [,datanodename2]
>
> The DFSShell utility sends this list to the namenode. Invoked with the "set"
> option, the command completes when the list has been transferred to the
> namenode. The command is non-blocking: it returns before the datanode is
> actually in the tranquil state. The client can then query the state by
> re-issuing the command with the "get" option, which indicates whether the
> datanode is in the tranquil state or is "being tranquiled". The "clear"
> option is used to transition a tranquil datanode back to the alive state; it
> is a no-op if the datanode is not in the tranquil state.
>
> ClientProtocol Design
> ---------------------
> The ClientProtocol is the protocol exported by the namenode to its clients.
> This protocol is extended with three new methods:
>
>   ClientProtocol.setTranquil(String[] datanodes)
>   ClientProtocol.getTranquil(String datanode)
>   ClientProtocol.clearTranquil(String[] datanodes)
>
> The protocol version is incremented to prevent conversations between
> incompatible clients and servers: an old DFSShell cannot talk to the new
> NameNode and vice versa.
>
> NameNode Design
> ---------------
> The namenode does the bulk of the work for supporting this new feature.
> The DatanodeInfo object has a new private member named "state". It also has
> three new member functions:
>
>   datanodeInfo.tranquilStarted()   : start the process of tranquilization
>   datanodeInfo.tranquilCompleted() : the node is now in the tranquil state
>   datanodeInfo.clearTranquil()     : remove tranquilization from the node
>
> The namenode exposes a new API to set and clear tranquil states for a
> datanode. On receipt of a "set tranquil" command, it invokes
> datanodeInfo.tranquilStarted().
> The FSNamesystem.chooseTarget() method skips over datanodes that are marked
> as being in the tranquil state. This ensures that tranquil datanodes are
> never chosen as targets of replication. The namenode does *not* record this
> operation in either the FsImage or the EditLogs.
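> For concreteness, a rough sketch of the skipping behaviour just described.
> The interface and method names below are simplified assumptions for the
> illustration, not the actual FSNamesystem code.
>
>   // Illustrative only: tranquil (decommissioning) nodes are never chosen
>   // as replication targets; full nodes are skipped as well.
>   import java.util.ArrayList;
>   import java.util.List;
>
>   class TargetChooserSketch {
>
>     interface NodeView {
>       boolean isTranquil();       // tranquil or being-tranquiled
>       long getRemainingBytes();   // free space on the node
>     }
>
>     /** Picks up to 'needed' targets, skipping tranquil or full nodes. */
>     static List<NodeView> chooseTargets(List<NodeView> candidates,
>                                         int needed, long blockSize) {
>       List<NodeView> chosen = new ArrayList<NodeView>();
>       for (NodeView node : candidates) {
>         if (chosen.size() >= needed) {
>           break;                                // enough targets found
>         }
>         if (node.isTranquil()) {
>           continue;                             // never target a tranquil node
>         }
>         if (node.getRemainingBytes() < blockSize) {
>           continue;                             // not enough room for the block
>         }
>         chosen.add(node);
>       }
>       return chosen;
>     }
>   }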
> The namenode puts all the blocks from a being-tranquiled node into the
> neededReplication data structure, and the necessary code changes are made to
> ensure that these blocks get replicated by the regular replication method.
> As of now, the regular replication code does not distinguish between these
> blocks and blocks that are replication candidates because some other
> datanode has died. It might be prudent to give a different (lower?)
> weightage to this type of replication request, but that exercise is deferred
> to a later date. In this design, replication requests generated because a
> node is going to the tranquil state are not distinguished from replication
> requests generated by a datanode going to the dead state.
> The DatanodeInfo object has another new private member named
> "pendingTranquilCount". This field stores the number of blocks that still
> remain to be replicated, and it is valid only while the node is in the
> being-tranquiled state. On receipt of every 'n' heartbeats from the
> being-tranquiled datanode, the namenode recalculates the amount of data that
> still remains to be replicated and updates "pendingTranquilCount" in the
> DatanodeInfo. When all the replications complete, the datanode is marked as
> tranquiled. The number 'n' is selected so that the average heartbeat
> processing time does not increase appreciably. (A rough sketch of this check
> appears at the end of this description.)
> It is possible that the namenode stops receiving heartbeats from a datanode
> that is being-tranquiled. In this case, the tranquil flag of the datanode is
> cleared; the node transitions to the dead state and the normal processing
> for the alive-to-dead transition takes place.
>
> Web Interface
> -------------
> The dfshealth.jsp page displays the live nodes, dead nodes, being-tranquiled
> nodes and tranquil nodes. For nodes in the being-tranquiled state, it
> displays the percentage of tranquilization completed so far.
>
> Issues
> ------
> 1. If a request for tranquilization starts getting processed and there is
>    not enough space available in DFS to complete the necessary replication,
>    that node might remain in the being-tranquiled state for a very long
>    time. This is not necessarily a bad thing, but is there a better option?
> 2. We have opted not to store cluster configuration information in the
>    persistent image of the file system. (The tranquil state of a datanode
>    may be lost if the namenode restarts.)
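> For concreteness, a rough sketch of the heartbeat-driven progress check
> described in the NameNode Design section above. The names and the recheck
> interval are assumptions for the illustration, not the actual patch code.
>
>   // Every n-th heartbeat from a being-tranquiled node, recount the blocks
>   // that still need replication; the node is marked tranquil once the
>   // count reaches zero.
>   class TranquilProgressSketch {
>
>     private static final int RECHECK_INTERVAL = 20;  // 'n' (assumed value)
>
>     private int heartbeatsSinceCheck = 0;
>     private int pendingTranquilCount = Integer.MAX_VALUE;  // unknown until first recheck
>
>     interface BlockCounter {
>       /** Counts blocks on this node not yet sufficiently replicated elsewhere. */
>       int countPendingBlocks();
>     }
>
>     /** Called for each heartbeat from a being-tranquiled node; true once tranquil. */
>     boolean onHeartbeat(BlockCounter counter) {
>       heartbeatsSinceCheck++;
>       if (heartbeatsSinceCheck < RECHECK_INTERVAL) {
>         return false;                           // skip the expensive recount this time
>       }
>       heartbeatsSinceCheck = 0;
>       pendingTranquilCount = counter.countPendingBlocks();
>       return pendingTranquilCount == 0;         // node can now be marked tranquil
>     }
>   }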