[ 
https://issues.apache.org/jira/browse/HDFS-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332629#comment-15332629
 ] 

Jing Zhao commented on HDFS-10533:
----------------------------------

Thanks for working on this, [~liuml07]. And thanks for the comment, [~cnauroth].

I agree with Chris that we should try our best to keep the compatibility. Also 
+1 to add interface audience and stability annotations to DistCp related 
classes.

In the meanwhile, I think it's necessary to make the proposed change in trunk, 
even if it breaks the compatibility. The current creation/modification pattern 
of DistCpOptions may lead to bugs that are not easy to detect and can cause 
serious data loss. E.g., currently Falcon creates a DistCpOptions instance and 
directly use its setter APIs to set options like "useDiff" and "deleteMissing". 
If these two options are set together and {{validate}} is not called to check, 
all the files/directories that are not included in the diff report will be 
wrongly deleted from the target cluster. 

Looks like the current DistCp/DistcpOptions classes were developed to be a tool 
initially. Thus they sometimes do not provide good API design and usage 
pattern. However, with more options are added into DistCp, and more downstream 
projects start to use it as a library, it becomes urgent and critical for us to 
define clean APIs for distcp to avoid issues like the above example. And maybe 
the proposed change on DistCpOptions can be a good start.

> Make DistCpOptions class immutable
> ----------------------------------
>
>                 Key: HDFS-10533
>                 URL: https://issues.apache.org/jira/browse/HDFS-10533
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>
> Currently the {{DistCpOptions}} class encapsulates all DistCp options, which 
> may be set from command-line (via the {{OptionsParser}}) or may be set 
> manually (eg construct an instance and call setters). As there are multiple 
> option fields and more (e.g. [HDFS-9868], [HDFS-10314]) to add, validating 
> them can be cumbersome. Ideally, the {{DistCpOptions}} object should be 
> immutable. The benefits are:
> # {{DistCpOptions}} is simple and easier to use and share, plus it scales well
> # validation is automatic, e.g. manually constructed {{DistCpOptions}} gets 
> validated before usage
> # validation error message is well-defined which does not depend on the order 
> of setters
> This jira is to track the effort of making the {{DistCpOptions}} immutable by 
> using a Builder pattern for creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to