[ https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yiqun Lin updated HDFS-15294:
-----------------------------
    Description: 
This jira introduces a new HDFS federation balance tool to balance data across different federation namespaces. It uses DistCp to copy data from the source path to the target path.

The process is:
 1. Use distcp and snapshot diff to sync data between src and dst until they are the same.
 2. Update the mount table in the Router if RBF mode is specified.
 3. Handle the src data: move it to trash, delete it, or skip it.

The patch is too big to review, so I split it into 2 patches:

Phase 1 / The State Machine (BalanceProcedureScheduler): includes the abstraction of the job and scheduler model. <See HDFS-15340>
{code:java}
org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
org.apache.hadoop.hdfs.procedure.BalanceProcedure;
org.apache.hadoop.hdfs.procedure.BalanceJob;
org.apache.hadoop.hdfs.procedure.BalanceJournal;
org.apache.hadoop.hdfs.procedure.HDFSJournal;
{code}
Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob. <See HDFS-15346>
{code:java}
org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
org.apache.hadoop.tools.DistCpFedBalance;
org.apache.hadoop.tools.DistCpProcedure;
org.apache.hadoop.tools.FedBalance;
org.apache.hadoop.tools.FedBalanceConfigs;
org.apache.hadoop.tools.FedBalanceContext;
org.apache.hadoop.tools.TrashProcedure;
{code}

  was:
This jira introduces a new balance command 'fedbalance' that is run by the administrator. The process is:
 1. Use distcp and snapshot diff to sync data between src and dst until they are the same.
 2. Update mount table in Router.
 3. Delete the src to trash.

The patch is too big to review, so I split it into 2 patches:

Phase 1 / The State Machine(BalanceProcedureScheduler): Including the abstraction of job and scheduler model.
<See HDFS-15340>
{code:java}
org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
org.apache.hadoop.hdfs.procedure.BalanceProcedure;
org.apache.hadoop.hdfs.procedure.BalanceJob;
org.apache.hadoop.hdfs.procedure.BalanceJournal;
org.apache.hadoop.hdfs.procedure.HDFSJournal;
{code}
Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob. <See HDFS-15346>
{code:java}
org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
org.apache.hadoop.tools.DistCpFedBalance;
org.apache.hadoop.tools.DistCpProcedure;
org.apache.hadoop.tools.FedBalance;
org.apache.hadoop.tools.FedBalanceConfigs;
org.apache.hadoop.tools.FedBalanceContext;
org.apache.hadoop.tools.TrashProcedure;
{code}


> Federation balance tool
> -----------------------
>
>                 Key: HDFS-15294
>                 URL: https://issues.apache.org/jira/browse/HDFS-15294
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch,
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch,
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch,
> HDFS-15294.007.patch, distcp-balance.pdf, distcp-balance.v2.pdf
>
>
> This jira introduces a new HDFS federation balance tool to balance data
> across different federation namespaces. It uses DistCp to copy data from the
> source path to the target path.
> The process is:
>  1. Use distcp and snapshot diff to sync data between src and dst until they
> are the same.
>  2. Update the mount table in the Router if RBF mode is specified.
>  3. Handle the src data: move it to trash, delete it, or skip it.
> The patch is too big to review, so I split it into 2 patches:
> Phase 1 / The State Machine (BalanceProcedureScheduler): includes the
> abstraction of the job and scheduler model.
> <See HDFS-15340>
> {code:java}
> org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
> org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
> org.apache.hadoop.hdfs.procedure.BalanceProcedure;
> org.apache.hadoop.hdfs.procedure.BalanceJob;
> org.apache.hadoop.hdfs.procedure.BalanceJournal;
> org.apache.hadoop.hdfs.procedure.HDFSJournal;
> {code}
> Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob. <See HDFS-15346>
> {code:java}
> org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
> org.apache.hadoop.tools.DistCpFedBalance;
> org.apache.hadoop.tools.DistCpProcedure;
> org.apache.hadoop.tools.FedBalance;
> org.apache.hadoop.tools.FedBalanceConfigs;
> org.apache.hadoop.tools.FedBalanceContext;
> org.apache.hadoop.tools.TrashProcedure;
> {code}
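
For readers who want a concrete picture of the three-step process in the description, here is a minimal sketch of how such a job could be assembled as an ordered list of steps. It is only an illustration under stated assumptions: `FedBalanceSketch`, `Step`, `SourceAction` and `run` are hypothetical names invented for this example, not classes from the HDFS-15294 patches, and each step only returns a label instead of calling DistCp or the Router.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the three-step balance flow.
 * None of these names come from the actual HDFS-15294 patches.
 */
public class FedBalanceSketch {

  /** What to do with the source data once src and dst are in sync. */
  public enum SourceAction { TRASH, DELETE, SKIP }

  /** One step of the job; returns a short label describing what it did. */
  public interface Step {
    String execute();
  }

  /** Builds the ordered steps for one job and runs them one after another. */
  public static List<String> run(boolean rbfMode, SourceAction action) {
    List<Step> steps = new ArrayList<>();
    // Step 1: always sync src and dst (distcp + snapshot diff in the real tool).
    steps.add(() -> "sync src and dst");
    // Step 2: only update the Router mount table when RBF mode is requested.
    if (rbfMode) {
      steps.add(() -> "update mount table");
    }
    // Step 3: handle the source data according to the chosen action.
    steps.add(() -> "source: " + action);

    List<String> log = new ArrayList<>();
    for (Step step : steps) {
      log.add(step.execute());
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run(true, SourceAction.TRASH));
    System.out.println(run(false, SourceAction.SKIP));
  }
}
```

Keeping each step behind a small interface mirrors the split described in the issue, where a generic scheduler (Phase 1) runs procedures supplied by a concrete job (Phase 2). The real classes may be organized quite differently.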