[ https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127489#comment-17127489 ]
Yiqun Lin edited comment on HDFS-15346 at 6/7/20, 4:35 AM: ----------------------------------------------------------- Some more detailed review comments: *HdfsConstants.java* Can we rename DOT_SNAPSHOT_SEPARATOR_DIR to the more readable name DOT_SNAPSHOT_DIR_SEPARATOR? *DistCpFedBalance.java* # It would good to print the fed context that created from input options, so that we will know final options that we passed in. {noformat} +. // --> print fed balancer context + // Construct the balance job. + BalanceJob.Builder<BalanceProcedure> builder = new BalanceJob.Builder<>(); + DistCpProcedure dcp = + new DistCpProcedure(DISTCP_PROCEDURE, null, delayDuration, context); + builder.nextProcedure(dcp); {noformat} # We can replace this system out in LOG instance, {noformat} + for (BalanceJob job : jobs) { + if (!job.isJobDone()) { + unfinished++; + } + System.out.println(job); + } {noformat} *DistCpProcedure.java* # The message in IOException(src + " doesn't exist.") not correctly described, should be 'src + " should be the directory."' # For each stage change, can we add aN additional output log, like this: {noformat} + if (srcFs.exists(new Path(src, HdfsConstants.DOT_SNAPSHOT_DIR))) { + throw new IOException(src + " shouldn't enable snapshot."); + } LOG.info("Stage updated from {} to {}.", stage.name(), Stage.INIT_DISTCP.name()) + stage = Stage.INIT_DISTCP; + } {noformat} # Here we reset permission to 0, that means no any operation is allowed? Is this expected, why not is 400 (only allow read)? The comment said that 'cancelling the x permission of the source path.' makes me confused. {noformat} srcFs.setPermission(src, FsPermission.createImmutable((short) 0)); {noformat} # I prefer to throw IOException rather than doing delete operation in cleanUpBeforeInitDistcp. cleanUpBeforeInitDistcp is expected to be the final pre-check function before submitting ditcp job. And let admin users to check and do delete operation manually by themself. {noformat} + private void initialCheckBeforeInitDistcp() throws IOException { + if (dstFs.exists(dst)) { + throw IOException(....); + } + srcFs.allowSnapshot(src); + if (srcFs.exists(new Path(src, + HdfsConstants.DOT_SNAPSHOT_SEPARATOR_DIR + CURRENT_SNAPSHOT_NAME))) { throw IOException(....); + } {noformat} *FedBalanceConfigs.java* Can we move all keys from BalanceProcedureConfigKeys to this class? We don't need two duplicated Config class. One follow-up task I am thinking that we can have a separated config file something named fedbalance-default.xml for fedbalance tool, like ditcp-default.xml for distcp tool now. I don't prefer to add all tool config settings into hdfs-default.xml. *FedBalanceContext.java* Override the toString method in FedBalanceContext to help us know the input options that actually be used. *MountTableProcedure.java* The for loop can just break once we find the first source path that matched. {noformat} + for (MountTable result : results) { + if (mount.equals(result.getSourcePath())) { + existingEntry = result; break; + } + } {noformat} *TrashProcedure.java* {noformat} + /** + * Delete source path to trash. + */ + void moveToTrash() throws IOException { + Path src = context.getSrc(); + if (srcFs.exists(src)) { + switch (context.getTrashOpt()) { + case TRASH: + conf.setFloat(FS_TRASH_INTERVAL_KEY, 1); + if (!Trash.moveToAppropriateTrash(srcFs, src, conf)) { + throw new IOException("Failed move " + src + " to trash."); + } + break; + case DELETE: + if (!srcFs.delete(src, true)) { + throw new IOException("Failed delete " + src); + } + LOG.info("{} is deleted.", src); + break; + default: + break; + } + } + } {noformat} For above lines, two review comments: # Can we add SKIP option check as well and throw unexpected option error? {noformat} case SKIP: break; + default: + throw new IOException("Unexpected trash option=" + context.getTrashOpt()); + } {noformat} # FS_TRASH_INTERVAL_KEY defined with 1 is too small, that means we the trash will be deleted after 1 minute. Can you increased this to 60? Also please add necessary comment in trash option description to say the default trash behavior when trash is disabled in server side and client side value will be used. was (Author: linyiqun): Some more detailed review comments: *HdfsConstants.java* Can we rename DOT_SNAPSHOT_SEPARATOR_DIR to the more readable name DOT_SNAPSHOT_DIR_SEPARATOR? *DistCpFedBalance.java* # It would good to print the fed context that created from input options, so that we will know final options that we passed in. {noformat} +. // --> print fed balancer context + // Construct the balance job. + BalanceJob.Builder<BalanceProcedure> builder = new BalanceJob.Builder<>(); + DistCpProcedure dcp = + new DistCpProcedure(DISTCP_PROCEDURE, null, delayDuration, context); + builder.nextProcedure(dcp); {noformat} # We can replace this system out in LOG instance, {noformat} + for (BalanceJob job : jobs) { + if (!job.isJobDone()) { + unfinished++; + } + System.out.println(job); + } {noformat} *DistCpProcedure.java* # The message in IOException(src + " doesn't exist.") not correctly described, should be 'src + " should be the directory."' # For each stage change, can we add aN additional output log, like this: {noformat} + if (srcFs.exists(new Path(src, HdfsConstants.DOT_SNAPSHOT_DIR))) { + throw new IOException(src + " shouldn't enable snapshot."); + } LOG.info("Stage updated from {} to {}.", stage.name(), Stage.INIT_DISTCP.name()) + stage = Stage.INIT_DISTCP; + } {noformat} # Here we reset permission to 0, that means no any operation is allowed? Is this expected, why not is 400 (only allow read)? The comment said that 'cancelling the x permission of the source path.' makes me confused. {noformat} srcFs.setPermission(src, FsPermission.createImmutable((short) 0)); {noformat} # I prefer to throw IOException rather than doing delete operation in cleanUpBeforeInitDistcp. cleanUpBeforeInitDistcp is expected to be the final pre-check function before submit ditcp job. {noformat} + private void initialCheckBeforeInitDistcp() throws IOException { + if (dstFs.exists(dst)) { + throw IOException(....); + } + srcFs.allowSnapshot(src); + if (srcFs.exists(new Path(src, + HdfsConstants.DOT_SNAPSHOT_SEPARATOR_DIR + CURRENT_SNAPSHOT_NAME))) { throw IOException(....); + } {noformat} *FedBalanceConfigs.java* Can we move all keys from BalanceProcedureConfigKeys to this class? We don't need two duplicated Config class. One follow-up task I am thinking that we can have a separated config file something named fedbalance-default.xml for fedbalance tool, like ditcp-default.xml for distcp tool now. I don't prefer to add all tool config settings into hdfs-default.xml. *FedBalanceContext.java* Override the toString method in FedBalanceContext to help us know the input options that actually be used. *MountTableProcedure.java* The for loop can just break once we find the first source path that matched. {noformat} + for (MountTable result : results) { + if (mount.equals(result.getSourcePath())) { + existingEntry = result; break; + } + } {noformat} *TrashProcedure.java* {noformat} + /** + * Delete source path to trash. + */ + void moveToTrash() throws IOException { + Path src = context.getSrc(); + if (srcFs.exists(src)) { + switch (context.getTrashOpt()) { + case TRASH: + conf.setFloat(FS_TRASH_INTERVAL_KEY, 1); + if (!Trash.moveToAppropriateTrash(srcFs, src, conf)) { + throw new IOException("Failed move " + src + " to trash."); + } + break; + case DELETE: + if (!srcFs.delete(src, true)) { + throw new IOException("Failed delete " + src); + } + LOG.info("{} is deleted.", src); + break; + default: + break; + } + } + } {noformat} For above lines, two review comments: # Can we add SKIP option check as well and throw unexpected option error? {noformat} case SKIP: break; + default: + throw new IOException("Unexpected trash option=" + context.getTrashOpt()); + } {noformat} # FS_TRASH_INTERVAL_KEY defined with 1 is too small, that means we the trash will be deleted after 1 minute. Can you increased this to 60? Also please add necessary comment in trash option description to say the default trash behavior when trash is disabled in server side and client side value will be used. > RBF: DistCpFedBalance implementation > ------------------------------------ > > Key: HDFS-15346 > URL: https://issues.apache.org/jira/browse/HDFS-15346 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Jinglun > Assignee: Jinglun > Priority: Major > Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch, > HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch, > HDFS-15346.006.patch, HDFS-15346.007.patch > > > Patch in HDFS-15294 is too big to review so we split it into 2 patches. This > is the second one. Detail can be found at HDFS-15294. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org