Dear All,

I would like to propose merging the Storage Policy Satisfier (SPS) feature 
into trunk. We have been working on this feature for the last several months, 
and it has received contributions from several companies. All of the feature 
development happened smoothly and collaboratively in JIRAs.

The detailed design document is available in JIRA: 
Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf<https://issues.apache.org/jira/secure/attachment/12873642/Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf>
Test report attached to JIRA: 
HDFS-SPS-TestReport-20170708.pdf<https://issues.apache.org/jira/secure/attachment/12876256/HDFS-SPS-TestReport-20170708.pdf>

Short description of the feature:
   The Storage Policy Satisfier feature aims to let distributed HDFS 
applications schedule block movements easily.
   When a storage policy changes, the user can invoke the 
satisfyStoragePolicy API to trigger the block storage movements.
   Block movement tasks are assigned to Datanodes, and the movements happen 
in a distributed fashion.
   Block-level movement tracking has also been distributed to the Datanodes 
(DNs) to avoid load on the Namenode.
   A coordinator Datanode tracks all the blocks associated with a 
blockCollection and sends the consolidated final result to the Namenode.
   If the movement result is a failure, the Namenode re-schedules the block 
movements.
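
To make the flow above concrete, here is a toy sketch of the consolidate and 
re-schedule steps. It is not code from the HDFS-10285 branch: the class and 
method names (SpsRetrySketch, consolidate, satisfy) are hypothetical, the 
block mover is a plain callback, and retrying the whole block list with a 
fixed attempt limit is a simplifying assumption.

```java
import java.util.List;
import java.util.function.Predicate;

/** Toy model only; none of these names are real HDFS classes. */
class SpsRetrySketch {

    /**
     * Coordinator Datanode side (assumed behaviour): try every block under one
     * blockCollection and report a single consolidated result.
     */
    static boolean consolidate(List<Long> blockIds, Predicate<Long> moveBlock) {
        boolean allMoved = true;
        for (Long id : blockIds) {
            // Keep going after a failure so the final result covers all blocks.
            if (!moveBlock.test(id)) {
                allMoved = false;
            }
        }
        return allMoved;
    }

    /**
     * Namenode side (assumed behaviour): re-schedule the movements while the
     * consolidated result is a failure. Returns the number of attempts used,
     * or -1 if maxAttempts was exhausted.
     */
    static int satisfy(List<Long> blockIds, Predicate<Long> moveBlock, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (consolidate(blockIds, moveBlock)) {
                return attempt;
            }
        }
        return -1;
    }
}
```

The point of the sketch is that the Namenode only sees one result per 
blockCollection, while the per-block bookkeeping stays on the Datanode side.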

Development branch is: HDFS-10285
Number of JIRAs resolved: 38
Pending JIRAs: 4 (I don’t think they are blockers for the merge)

We have posted a combined patch for easy merge review. The Jenkins test 
results look good on the combined patch.
Quick stats on the combined patch:
  67 files changed, 7001 insertions(+), 45 deletions(-)
  Added/modified test cases: ~70


Thanks to all the contributors, namely Andrew Wang, Anoop Sam John, Du 
Jingcheng, Ewan Higgs, Jing Zhao, Kai Zheng, Rakesh R, Ramakrishna, Surendra 
Singh Lilhore, Uma Maheswara Rao G, Wei Zhou, Yuanbo Liu. Without these 
members' efforts, this feature might not have reached this state.

We will continue to work on the following future work items:

  1.  Presently the user has to set and satisfy the policy in separate RPC 
calls. The idea is to provide a hybrid API dfs#setStoragePolicy(src, policy) 
which does set and satisfy in one RPC call to the Namenode (Reference 
HDFS-11669).
  2.  Presently BlockStorageMovementCommand sends all the blocks under a 
trackID in a single heartbeat response. If there are many blocks under a given 
trackID (for example, a file that contains many blocks), that bulk information 
goes to the DN in a single network call and comes with a lot of overhead. One 
idea is to use smaller batches of BlockMovingInfo in the block storage 
movement command (Reference HDFS-11125).
  3.  Build a mechanism to throttle the number of concurrent moves at the 
Datanode.
  4.  Allow specifying an initial delay, in seconds, before the source file is 
scheduled for satisfying the storage policy. For example, in HBase the interval 
between archiving a file (moving it between different storages) and deleting 
it is not large. In that case it may not be necessary to schedule the 
satisfy-policy task immediately.
  5.  Cover SPS-related metrics.
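
As an illustration of items 2 and 3 only, here is one possible shape for 
batching and throttling. Nothing here is implemented in the branch or agreed 
in the JIRAs: SpsFutureWorkSketch, toBatches and MoveThrottle are hypothetical 
names, the batch size and move limit are made-up parameters, and the 
semaphore is just one assumed way to cap concurrent moves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

/** Illustrative only; not part of the HDFS-10285 branch. */
class SpsFutureWorkSketch {

    /** Item 2: split the block list of one trackID into batches of at most batchSize. */
    static <T> List<List<T>> toBatches(List<T> blocks, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i += batchSize) {
            int end = Math.min(i + batchSize, blocks.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(blocks.subList(i, end)));
        }
        return batches;
    }

    /** Item 3: cap the number of block moves running at the same time. */
    static final class MoveThrottle {
        private final Semaphore permits;

        MoveThrottle(int maxConcurrentMoves) {
            this.permits = new Semaphore(maxConcurrentMoves);
        }

        /** Returns false without blocking when the limit is already reached. */
        boolean tryStartMove() {
            return permits.tryAcquire();
        }

        /** Call once for every move that was started successfully. */
        void finishMove() {
            permits.release();
        }
    }
}
```

With batching, each heartbeat response could carry one batch instead of the 
whole list; with the throttle, a Datanode could defer a move when 
tryStartMove() returns false.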

So, I feel this branch is ready to be merged into trunk. Please provide your 
feedback. If there are no objections, I will proceed to a vote.

Regards,
Uma & Rakesh
