[
https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127489#comment-17127489
]
Yiqun Lin edited comment on HDFS-15346 at 6/7/20, 4:35 AM:
-----------------------------------------------------------
Some more detailed review comments:
*HdfsConstants.java*
Can we rename DOT_SNAPSHOT_SEPARATOR_DIR to the more readable name
DOT_SNAPSHOT_DIR_SEPARATOR?
*DistCpFedBalance.java*
# It would good to print the fed context that created from input options, so
that we will know final options that we passed in.
{noformat}
+. // --> print fed balancer context
+ // Construct the balance job.
+ BalanceJob.Builder<BalanceProcedure> builder = new
BalanceJob.Builder<>();
+ DistCpProcedure dcp =
+ new DistCpProcedure(DISTCP_PROCEDURE, null, delayDuration, context);
+ builder.nextProcedure(dcp);
{noformat}
# We can replace this system out in LOG instance,
{noformat}
+ for (BalanceJob job : jobs) {
+ if (!job.isJobDone()) {
+ unfinished++;
+ }
+ System.out.println(job);
+ }
{noformat}
*DistCpProcedure.java*
# The message in IOException(src + " doesn't exist.") not correctly described,
should be 'src + " should be the directory."'
# For each stage change, can we add aN additional output log, like this:
{noformat}
+ if (srcFs.exists(new Path(src, HdfsConstants.DOT_SNAPSHOT_DIR))) {
+ throw new IOException(src + " shouldn't enable snapshot.");
+ }
LOG.info("Stage updated from {} to {}.", stage.name(),
Stage.INIT_DISTCP.name())
+ stage = Stage.INIT_DISTCP;
+ }
{noformat}
# Here we reset permission to 0, that means no any operation is allowed? Is
this expected, why not is 400 (only allow read)? The comment said that
'cancelling the x permission of the source path.' makes me confused.
{noformat}
srcFs.setPermission(src, FsPermission.createImmutable((short) 0));
{noformat}
# I prefer to throw IOException rather than doing delete operation in
cleanUpBeforeInitDistcp. cleanUpBeforeInitDistcp is expected to be the final
pre-check function before submitting ditcp job. And let admin users to check
and do delete operation manually by themself.
{noformat}
+ private void initialCheckBeforeInitDistcp() throws IOException {
+ if (dstFs.exists(dst)) {
+ throw IOException(....);
+ }
+ srcFs.allowSnapshot(src);
+ if (srcFs.exists(new Path(src,
+ HdfsConstants.DOT_SNAPSHOT_SEPARATOR_DIR + CURRENT_SNAPSHOT_NAME))) {
throw IOException(....);
+ }
{noformat}
*FedBalanceConfigs.java*
Can we move all keys from BalanceProcedureConfigKeys to this class? We don't
need two duplicated Config class. One follow-up task I am thinking that we can
have a separated config file something named fedbalance-default.xml for
fedbalance tool, like ditcp-default.xml for distcp tool now. I don't prefer to
add all tool config settings into hdfs-default.xml.
*FedBalanceContext.java*
Override the toString method in FedBalanceContext to help us know the input
options that actually be used.
*MountTableProcedure.java*
The for loop can just break once we find the first source path that matched.
{noformat}
+ for (MountTable result : results) {
+ if (mount.equals(result.getSourcePath())) {
+ existingEntry = result;
break;
+ }
+ }
{noformat}
*TrashProcedure.java*
{noformat}
+ /**
+ * Delete source path to trash.
+ */
+ void moveToTrash() throws IOException {
+ Path src = context.getSrc();
+ if (srcFs.exists(src)) {
+ switch (context.getTrashOpt()) {
+ case TRASH:
+ conf.setFloat(FS_TRASH_INTERVAL_KEY, 1);
+ if (!Trash.moveToAppropriateTrash(srcFs, src, conf)) {
+ throw new IOException("Failed move " + src + " to trash.");
+ }
+ break;
+ case DELETE:
+ if (!srcFs.delete(src, true)) {
+ throw new IOException("Failed delete " + src);
+ }
+ LOG.info("{} is deleted.", src);
+ break;
+ default:
+ break;
+ }
+ }
+ }
{noformat}
For above lines, two review comments:
# Can we add SKIP option check as well and throw unexpected option error?
{noformat}
case SKIP:
break;
+ default:
+ throw new IOException("Unexpected trash option=" +
context.getTrashOpt());
+ }
{noformat}
# FS_TRASH_INTERVAL_KEY defined with 1 is too small, that means we the trash
will be deleted after 1 minute. Can you increased this to 60? Also please add
necessary comment in trash option description to say the default trash behavior
when trash is disabled in server side and client side value will be used.
was (Author: linyiqun):
Some more detailed review comments:
*HdfsConstants.java*
Can we rename DOT_SNAPSHOT_SEPARATOR_DIR to the more readable name
DOT_SNAPSHOT_DIR_SEPARATOR?
*DistCpFedBalance.java*
# It would good to print the fed context that created from input options, so
that we will know final options that we passed in.
{noformat}
+. // --> print fed balancer context
+ // Construct the balance job.
+ BalanceJob.Builder<BalanceProcedure> builder = new
BalanceJob.Builder<>();
+ DistCpProcedure dcp =
+ new DistCpProcedure(DISTCP_PROCEDURE, null, delayDuration, context);
+ builder.nextProcedure(dcp);
{noformat}
# We can replace this system out in LOG instance,
{noformat}
+ for (BalanceJob job : jobs) {
+ if (!job.isJobDone()) {
+ unfinished++;
+ }
+ System.out.println(job);
+ }
{noformat}
*DistCpProcedure.java*
# The message in IOException(src + " doesn't exist.") not correctly described,
should be 'src + " should be the directory."'
# For each stage change, can we add aN additional output log, like this:
{noformat}
+ if (srcFs.exists(new Path(src, HdfsConstants.DOT_SNAPSHOT_DIR))) {
+ throw new IOException(src + " shouldn't enable snapshot.");
+ }
LOG.info("Stage updated from {} to {}.", stage.name(),
Stage.INIT_DISTCP.name())
+ stage = Stage.INIT_DISTCP;
+ }
{noformat}
# Here we reset permission to 0, that means no any operation is allowed? Is
this expected, why not is 400 (only allow read)? The comment said that
'cancelling the x permission of the source path.' makes me confused.
{noformat}
srcFs.setPermission(src, FsPermission.createImmutable((short) 0));
{noformat}
# I prefer to throw IOException rather than doing delete operation in
cleanUpBeforeInitDistcp. cleanUpBeforeInitDistcp is expected to be the final
pre-check function before submit ditcp job.
{noformat}
+ private void initialCheckBeforeInitDistcp() throws IOException {
+ if (dstFs.exists(dst)) {
+ throw IOException(....);
+ }
+ srcFs.allowSnapshot(src);
+ if (srcFs.exists(new Path(src,
+ HdfsConstants.DOT_SNAPSHOT_SEPARATOR_DIR + CURRENT_SNAPSHOT_NAME))) {
throw IOException(....);
+ }
{noformat}
*FedBalanceConfigs.java*
Can we move all keys from BalanceProcedureConfigKeys to this class? We don't
need two duplicated Config class. One follow-up task I am thinking that we can
have a separated config file something named fedbalance-default.xml for
fedbalance tool, like ditcp-default.xml for distcp tool now. I don't prefer to
add all tool config settings into hdfs-default.xml.
*FedBalanceContext.java*
Override the toString method in FedBalanceContext to help us know the input
options that actually be used.
*MountTableProcedure.java*
The for loop can just break once we find the first source path that matched.
{noformat}
+ for (MountTable result : results) {
+ if (mount.equals(result.getSourcePath())) {
+ existingEntry = result;
break;
+ }
+ }
{noformat}
*TrashProcedure.java*
{noformat}
+ /**
+ * Delete source path to trash.
+ */
+ void moveToTrash() throws IOException {
+ Path src = context.getSrc();
+ if (srcFs.exists(src)) {
+ switch (context.getTrashOpt()) {
+ case TRASH:
+ conf.setFloat(FS_TRASH_INTERVAL_KEY, 1);
+ if (!Trash.moveToAppropriateTrash(srcFs, src, conf)) {
+ throw new IOException("Failed move " + src + " to trash.");
+ }
+ break;
+ case DELETE:
+ if (!srcFs.delete(src, true)) {
+ throw new IOException("Failed delete " + src);
+ }
+ LOG.info("{} is deleted.", src);
+ break;
+ default:
+ break;
+ }
+ }
+ }
{noformat}
For above lines, two review comments:
# Can we add SKIP option check as well and throw unexpected option error?
{noformat}
case SKIP:
break;
+ default:
+ throw new IOException("Unexpected trash option=" +
context.getTrashOpt());
+ }
{noformat}
# FS_TRASH_INTERVAL_KEY defined with 1 is too small, that means we the trash
will be deleted after 1 minute. Can you increased this to 60? Also please add
necessary comment in trash option description to say the default trash behavior
when trash is disabled in server side and client side value will be used.
> RBF: DistCpFedBalance implementation
> ------------------------------------
>
> Key: HDFS-15346
> URL: https://issues.apache.org/jira/browse/HDFS-15346
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch,
> HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch,
> HDFS-15346.006.patch, HDFS-15346.007.patch
>
>
> Patch in HDFS-15294 is too big to review so we split it into 2 patches. This
> is the second one. Detail can be found at HDFS-15294.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]