[ https://issues.apache.org/jira/browse/HADOOP-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536388 ]

Raghu Angadi commented on HADOOP-1912:
--------------------------------------

Pretty much looks fine.

# I could not find a throttling test.
# Regarding the throttler: each connection is throttled individually. Ideally 
we should use one throttler shared by all connections, which makes sure we use 
up the allowed bandwidth whenever possible. In the current scheme, the transfer 
between A and B cannot use extra bandwidth even when another connection, 
between B and C, cannot use its quota (because C has many connections). Also, 
when the throttler is shared, small blocks cannot slip under the radar. 
## Please make ThrottlerBase package-private so that HADOOP-2012 can use it.
# Minor: in FSNamesystem.java, {code}
        // The two branches have the same body:
        if (priSet.contains(delNodeHint)) {
          cur = delNodeHint;
        } else if (addedNode != null && !priSet.contains(addedNode)) {
          cur = delNodeHint;
        }
        // so they can be merged into one condition:
        if (priSet.contains(delNodeHint)
            || (addedNode != null && !priSet.contains(addedNode))) {
          cur = delNodeHint;
        }
{code}
# Minor: the patch increases allocation in addBlock() in FSNamesystem.java. Is 
the new implementation more correct?
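
A minimal sketch of the single node-wide throttler proposed above. It assumes every rebalancing connection on a datanode reports its bytes to the same object. The class name `SharedThrottler`, the fixed accounting period and the constructor arguments are illustrative assumptions. This is not ThrottlerBase or anything else in the patch. All connections draw on one byte budget per period. A connection held back by its peer therefore leaves bandwidth for the others, and many small transfers are charged against the same budget as large ones.

```java
/**
 * Hypothetical sketch: one throttler shared by all rebalancing
 * connections on a datanode (not the patch's ThrottlerBase).
 */
class SharedThrottler {
  private final long periodMillis;   // length of one accounting period
  private final long bytesPerPeriod; // budget for ALL connections per period
  private long periodStart;          // when the current period began (ms)
  private long bytesLeft;            // budget left; negative means over budget

  SharedThrottler(long bytesPerSec, long periodMillis) {
    this.periodMillis = periodMillis;
    this.bytesPerPeriod = Math.max(1, bytesPerSec * periodMillis / 1000);
    this.periodStart = System.currentTimeMillis();
    this.bytesLeft = bytesPerPeriod;
  }

  /** Any connection calls this after sending or receiving numBytes. */
  synchronized void throttle(long numBytes) throws InterruptedException {
    bytesLeft -= numBytes;
    while (bytesLeft < 0) {
      long now = System.currentTimeMillis();
      long periodEnd = periodStart + periodMillis;
      if (now < periodEnd) {
        // wait() releases the lock, so other connections can charge their bytes too
        wait(periodEnd - now);
      } else {
        // start a new period and top up the shared budget once
        periodStart = now;
        bytesLeft += bytesPerPeriod;
      }
    }
  }
}
```

Each sender or receiver would call `throttle(n)` after moving n bytes. The budget is only topped up while it is negative, so it never holds more than one period's allowance. As a result, a node does not accumulate a large burst allowance while idle.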
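
The merge suggested in the FSNamesystem.java item can be double-checked by comparing the two-branch original against candidate single conditions over every input combination. This is a standalone sketch, and the class and method names are made up for illustration. The three booleans stand in for `priSet.contains(delNodeHint)`, `addedNode != null` and `priSet.contains(addedNode)`.

```java
/**
 * Hypothetical exhaustive check of the FSNamesystem branch merge.
 *   hint  models priSet.contains(delNodeHint)
 *   added models addedNode != null
 *   inPri models priSet.contains(addedNode)
 */
class BranchMergeCheck {
  interface Cond {
    boolean test(boolean hint, boolean added, boolean inPri);
  }

  // Two-branch form: both branches set cur = delNodeHint.
  static boolean original(boolean hint, boolean added, boolean inPri) {
    if (hint) {
      return true;
    } else if (added && !inPri) {
      return true;
    }
    return false;
  }

  // Single condition that keeps every check from the original.
  static boolean merged(boolean hint, boolean added, boolean inPri) {
    return hint || (added && !inPri);
  }

  // Shorter candidate that drops the !priSet.contains(addedNode) check.
  static boolean shorter(boolean hint, boolean added, boolean inPri) {
    return added || hint;
  }

  /** Counts inputs on which the candidate disagrees with original(). */
  static int mismatches(Cond candidate) {
    int count = 0;
    for (int bits = 0; bits < 8; bits++) {
      boolean hint = (bits & 1) != 0;
      boolean added = (bits & 2) != 0;
      boolean inPri = (bits & 4) != 0;
      if (candidate.test(hint, added, inPri) != original(hint, added, inPri)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println("merged: " + mismatches(BranchMergeCheck::merged));
    System.out.println("shorter: " + mismatches(BranchMergeCheck::shorter));
  }
}
```

Running it prints `merged: 0` and `shorter: 1`. The shorter candidate `addedNode != null || priSet.contains(delNodeHint)` disagrees with the original in one case: when `addedNode` is already in `priSet` and `delNodeHint` is not.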

> Datanode should support block replacement
> -----------------------------------------
>
>                 Key: HADOOP-1912
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1912
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>         Attachments: replace.patch, replace1.patch, replace2.patch, 
> replace3.patch
>
>
> This jira covers the datanode's support for rebalancing (HADOOP-1652). When a 
> balancer decides to move a block B from source S to destination D, it also 
> chooses a proxy source PS, which contains a replica of B, to speed up the 
> block copy. The block replacement is carried out in the following steps:
> 1. A block copy command is sent to datanode PS in the format 
> "OP_BLOCK_COPY <block_id_of_B> <source S> <destination D>", asking PS to 
> copy B to datanode D.
> 2. PS then transfers block B to datanode D with a block replacement command 
> in the format "OP_BLOCK_REPLACEMENT <block_id_of_B> <source S> 
> <data_of_B>". 
> 3. Datanode D writes block B to its disk and then sends the namenode a 
> blockReceived RPC, informing it that block B has been received and asking it 
> to delete the replica of B on source S if B has an excess replica.
> 4. The namenode then adds datanode D to block B's map and removes an excess 
> replica of B, preferring the one on datanode S.
> In addition, each datanode has a limited bandwidth for rebalancing; the 
> default is 5MB/s. Throttling is done on both the source and destination 
> sides. Each datanode limits the maximum number of concurrent data transfers 
> (both sending and receiving) for rebalancing to 5, so in the worst case each 
> data transfer is limited to 1MB/s (5MB/s shared by 5 transfers). Each sender 
> and receiver has a Throttler. The primary method of the class is 
> "throttle(int numOfBytes)". The parameter numOfBytes is the total number of 
> bytes that the caller has sent or received since the last call to throttle. 
> The method calculates the caller's I/O rate. If the rate exceeds the 
> bandwidth limit, it sleeps to slow down the data transfer. After it wakes up, 
> it adjusts its bandwidth limit if the number of concurrent data transfers 
> has changed. 
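
Step 1's request can be illustrated as a small framed message. The issue only fixes the field order "OP_BLOCK_COPY <block_id_of_B> <source S> <destination D>". Everything else here is an assumption for this sketch: the opcode value, the use of `DataOutputStream.writeUTF` for the datanode names, and the `BlockCopyRequest` class itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical wire form of "OP_BLOCK_COPY <block_id_of_B> <source S> <destination D>". */
class BlockCopyRequest {
  static final byte OP_BLOCK_COPY = 100; // hypothetical opcode value, not taken from the patch
  final long blockId;
  final String source;      // datanode S, e.g. "host:port" (assumed representation)
  final String destination; // datanode D

  BlockCopyRequest(long blockId, String source, String destination) {
    this.blockId = blockId;
    this.source = source;
    this.destination = destination;
  }

  /** Written by the balancer to proxy source PS. */
  void write(DataOutputStream out) throws IOException {
    out.writeByte(OP_BLOCK_COPY);
    out.writeLong(blockId);
    out.writeUTF(source);
    out.writeUTF(destination);
  }

  /** Read by PS; fields come back in the order they were written. */
  static BlockCopyRequest read(DataInputStream in) throws IOException {
    byte op = in.readByte();
    if (op != OP_BLOCK_COPY) {
      throw new IOException("unexpected opcode " + op);
    }
    long blockId = in.readLong();
    String source = in.readUTF();
    String destination = in.readUTF();
    return new BlockCopyRequest(blockId, source, destination);
  }

  /** Round trip through an in-memory buffer, for illustration. */
  static BlockCopyRequest roundTrip(BlockCopyRequest req) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      req.write(out);
    }
    return read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
  }
}
```

`writeUTF` prefixes each string with its length, so the reader can split the fields without delimiters. The OP_BLOCK_REPLACEMENT header in step 2 could follow the same pattern, with the block data streamed after it.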
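
The per-transfer Throttler described above might look roughly like this. It is an illustration under assumptions, not the patch's code. The node-wide transfer count is modelled as a shared `AtomicInteger`. The rate is measured over a window that restarts whenever the limit changes. `currentBandwidth()` is a hypothetical accessor added for inspection. Each transfer's limit is the node's rebalancing bandwidth divided by the number of concurrent transfers.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a per-transfer Throttler; constructor and fields are assumptions. */
class Throttler {
  private final long nodeBandwidth;            // bytes/sec for all rebalancing on this node
  private final AtomicInteger activeTransfers; // hypothetical node-wide transfer count
  private long bandwidth;                      // this transfer's share, bytes/sec
  private long windowStart;                    // start of the current measuring window (ms)
  private long windowBytes;                    // bytes reported in the current window

  Throttler(long nodeBandwidth, AtomicInteger activeTransfers) {
    this.nodeBandwidth = nodeBandwidth;
    this.activeTransfers = activeTransfers;
    this.bandwidth = share();
    this.windowStart = System.currentTimeMillis();
  }

  // Node bandwidth divided evenly among the current transfers.
  private long share() {
    return Math.max(1, nodeBandwidth / Math.max(1, activeTransfers.get()));
  }

  long currentBandwidth() {
    return bandwidth;
  }

  /** numOfBytes: bytes sent or received since the previous call. */
  void throttle(int numOfBytes) throws InterruptedException {
    windowBytes += numOfBytes;
    long elapsed = System.currentTimeMillis() - windowStart;
    // how long these bytes should have taken at the current limit
    long allowed = windowBytes * 1000 / bandwidth;
    if (allowed > elapsed) {
      Thread.sleep(allowed - elapsed); // too fast: slow down
      long newShare = share();
      if (newShare != bandwidth) {     // transfer count changed: adopt new limit
        bandwidth = newShare;
        windowStart = System.currentTimeMillis();
        windowBytes = 0;
      }
    }
  }
}
```

With 5MB/s and five active transfers, `share()` gives each transfer 1MB/s. When the count drops, a throttler picks up the larger share the next time it wakes from a sleep. This per-transfer design is what the shared-throttler suggestion in the review would replace.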

-- 
This message is automatically generated by JIRA.