[ 
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-15294:
-----------------------------
    Description: 
This Jira introduces a new HDFS federation balance tool that balances data 
across different federation namespaces. It uses DistCp to copy data from the 
source path to the target path.

The process is:
 1. Use DistCp and snapshot diff to sync data between src and dst until they 
are identical.
 2. Update the mount table in the Router if RBF mode is specified.
 3. Handle the src data: move it to trash, delete it, or skip it.
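Step 1 can be illustrated with a minimal in-memory sketch. It assumes each namespace is a plain map from path to content, that a "snapshot" is just a copy of that map, and that dst starts empty; the class and method names (SnapshotDiffSyncSketch, syncUntilEqual, etc.) are hypothetical and not part of the patch. After an initial full copy, each round diffs the newest snapshot of src against the previous one and applies only those changes to dst, stopping once a round finds nothing new.

{code:java}
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of step 1: snapshot-diff sync until src and dst match. */
class SnapshotDiffSyncSketch {

  /** A "snapshot" here is simply a copy of path -> content. */
  static Map<String, String> snapshot(Map<String, String> fs) {
    return new HashMap<>(fs);
  }

  /** Changes from older to newer; a null value marks a deleted path. */
  static Map<String, String> diff(Map<String, String> older, Map<String, String> newer) {
    Map<String, String> changes = new HashMap<>();
    for (Map.Entry<String, String> e : newer.entrySet()) {
      if (!Objects.equals(older.get(e.getKey()), e.getValue())) {
        changes.put(e.getKey(), e.getValue());
      }
    }
    for (String path : older.keySet()) {
      if (!newer.containsKey(path)) {
        changes.put(path, null);
      }
    }
    return changes;
  }

  /** Applies a diff to dst, standing in for an incremental copy run. */
  static void apply(Map<String, String> dst, Map<String, String> changes) {
    for (Map.Entry<String, String> e : changes.entrySet()) {
      if (e.getValue() == null) {
        dst.remove(e.getKey());
      } else {
        dst.put(e.getKey(), e.getValue());
      }
    }
  }

  /**
   * Full copy from a first snapshot (assumes dst starts empty), then incremental
   * rounds until a round sees no changes. pendingWrites simulates client writes
   * that land on src between rounds. Returns the number of incremental rounds.
   */
  static int syncUntilEqual(Map<String, String> src, Map<String, String> dst,
                            Deque<Runnable> pendingWrites) {
    Map<String, String> last = snapshot(src);
    apply(dst, diff(new HashMap<>(), last));   // initial full copy
    int rounds = 0;
    while (true) {
      Runnable write = pendingWrites.poll();
      if (write != null) {
        write.run();                           // a write arriving during the copy
      }
      Map<String, String> current = snapshot(src);
      Map<String, String> changes = diff(last, current);
      if (changes.isEmpty()) {
        return rounds;                         // nothing changed since the last round
      }
      apply(dst, changes);                     // copy only what changed
      last = current;
      rounds++;
    }
  }
}
{code}

In this sketch the loop only ends when no write lands between two snapshots; under a steady stream of writes it would never terminate, so a real implementation needs some way to bound it.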

This  

The patch is too big to review, so I split it into 2 patches:

Phase 1 / The state machine (BalanceProcedureScheduler): the abstraction of 
the job and scheduler model.   <See HDFS-15340>
{code:java}
org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
org.apache.hadoop.hdfs.procedure.BalanceProcedure;
org.apache.hadoop.hdfs.procedure.BalanceJob;
org.apache.hadoop.hdfs.procedure.BalanceJournal;
org.apache.hadoop.hdfs.procedure.HDFSJournal;
{code}
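To make the Phase 1 model concrete, here is a minimal single-threaded sketch. It assumes a job is an ordered list of named procedures, that a procedure reports whether it finished (false meaning "retry"), and that the journal stores which procedure to resume from. Procedure, Journal and run are hypothetical simplifications; they do not reproduce the signatures of BalanceProcedure, BalanceJob or BalanceJournal in the patch, and concurrency and persistent storage are left out.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical, single-threaded sketch of a job/procedure/journal model. */
class ProcedureSchedulerSketch {

  /** A named step; the body returns true when done, false if it should be retried. */
  static final class Procedure {
    final String name;
    final BooleanSupplier body;

    Procedure(String name, BooleanSupplier body) {
      this.name = name;
      this.body = body;
    }
  }

  /** In-memory stand-in for a journal: per job, the procedure to resume from. */
  static final class Journal {
    private final Map<String, String> next = new HashMap<>();

    void save(String jobId, String procedureName) { next.put(jobId, procedureName); }

    String load(String jobId) { return next.get(jobId); }
  }

  static final String DONE = "<done>";

  /**
   * Runs the procedures in order, starting from the journaled position, retrying
   * each unfinished procedure up to maxRetries times. Returns the names executed.
   */
  static List<String> run(String jobId, List<Procedure> procedures,
                          Journal journal, int maxRetries) {
    List<String> executed = new ArrayList<>();
    String resumeFrom = journal.load(jobId);
    boolean reached = resumeFrom == null;       // fresh job: start at the first step
    for (Procedure p : procedures) {
      if (!reached && !p.name.equals(resumeFrom)) {
        continue;                               // finished before a restart
      }
      reached = true;
      journal.save(jobId, p.name);              // record progress before running
      int attempts = 1;
      while (!p.body.getAsBoolean()) {
        if (attempts++ > maxRetries) {
          throw new IllegalStateException(p.name + " did not finish after retries");
        }
      }
      executed.add(p.name);
    }
    journal.save(jobId, DONE);
    return executed;
  }
}
{code}

Because the position is saved before each step, the step that was in flight at a crash runs again after recovery, so procedures in a design like this should be safe to re-execute.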
Phase 2 / DistCpFedBalance: an implementation of BalanceJob.    <See 
HDFS-15346>
{code:java}
org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
org.apache.hadoop.tools.DistCpFedBalance;
org.apache.hadoop.tools.DistCpProcedure;
org.apache.hadoop.tools.FedBalance;
org.apache.hadoop.tools.FedBalanceConfigs;
org.apache.hadoop.tools.FedBalanceContext;
org.apache.hadoop.tools.TrashProcedure;
{code}
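The Phase 2 class names suggest that a balance job is a chain of a DistCp procedure, an optional mount-table procedure and a trash procedure. The sketch below shows how the options in the description (RBF mode, and what to do with the src data) could select that chain. It returns plain string labels rather than real procedure objects; the SrcOption enum and plan method are hypothetical, not taken from FedBalanceConfigs or FedBalance, and it assumes TrashProcedure covers both moving to trash and deleting.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: choosing the procedure chain from the balance options. */
class BalancePlanSketch {

  /** What to do with the src data once dst is in sync (step 3). */
  enum SrcOption { TRASH, DELETE, SKIP }

  static List<String> plan(boolean rbfMode, SrcOption srcOption) {
    List<String> chain = new ArrayList<>();
    chain.add("DistCpProcedure");                       // step 1 always runs
    if (rbfMode) {
      chain.add("MountTableProcedure");                 // step 2: only in RBF mode
    }
    if (srcOption != SrcOption.SKIP) {
      // step 3: assumed to handle both TRASH and DELETE; SKIP leaves src untouched
      chain.add("TrashProcedure(" + srcOption + ")");
    }
    return chain;
  }
}
{code}

With rbfMode set to false and SKIP, only the copy step remains, which matches the description treating steps 2 and 3 as optional.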

  was:
This jira introduces a new balance command 'fedbalance' that is run by the 
administrator. The process is:
 1. Use distcp and snapshot diff to sync data between src and dst until they 
are the same.
 2. Update the mount table in the Router.
 3. Move the src to trash.

 

The patch is too big to review, so I split it into 2 patches:

Phase 1 / The State Machine(BalanceProcedureScheduler): Including the 
abstraction of job and scheduler model.   <See HDFS-15340>
{code:java}
org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
org.apache.hadoop.hdfs.procedure.BalanceProcedure;
org.apache.hadoop.hdfs.procedure.BalanceJob;
org.apache.hadoop.hdfs.procedure.BalanceJournal;
org.apache.hadoop.hdfs.procedure.HDFSJournal;
{code}
Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob.    <See 
HDFS-15346>
{code:java}
org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
org.apache.hadoop.tools.DistCpFedBalance;
org.apache.hadoop.tools.DistCpProcedure;
org.apache.hadoop.tools.FedBalance;
org.apache.hadoop.tools.FedBalanceConfigs;
org.apache.hadoop.tools.FedBalanceContext;
org.apache.hadoop.tools.TrashProcedure;
{code}


> Federation balance tool
> -----------------------
>
>                 Key: HDFS-15294
>                 URL: https://issues.apache.org/jira/browse/HDFS-15294
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch, 
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch, 
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch, 
> HDFS-15294.007.patch, distcp-balance.pdf, distcp-balance.v2.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
