[ https://issues.apache.org/jira/browse/ACCUMULO-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459813#comment-13459813 ]
Keith Turner commented on ACCUMULO-769: --------------------------------------- I am not a big fan of the churn in the API this change would make. However I do agree with the goal of this ticket. This leaves me a bit conflicted on this issue. Passing in a Job removes the possibility that the user will do something wrong. With the new API in 1.4 the user can write the following code that is incorrect. {code:java} @Override public int run(String[] args) throws Exception { Configuration conf = getConf(); Job job = new Job(conf, "Test MR 1"); . . . job.setOutputFormatClass(AccumuloOutputFormat.class); AccumuloOutputFormat.setOutputInfo(conf, "root", "secret".getBytes(), false, "nodes"); //WRONG, modifying conf will not setup the job } {code} Instead the user needs to write the following with the new API {code:java} @Override public int run(String[] args) throws Exception { Configuration conf = getConf(); Job job = new Job(conf, "Test MR 1"); . . . job.setOutputFormatClass(AccumuloOutputFormat.class); AccumuloOutputFormat.setOutputInfo(job.getConfiguration(), "root", "secret".getBytes(), false, "nodes"); //CORRECT, modifying the job conf will setup the job } {code} If the user just passed in the job object they would not have to think about it and understand that job creates a copy of the config they passed. > MapReduce API should not use Configuration to set Job state at submission > time (ambiguous semantics) > ---------------------------------------------------------------------------------------------------- > > Key: ACCUMULO-769 > URL: https://issues.apache.org/jira/browse/ACCUMULO-769 > Project: Accumulo > Issue Type: Bug > Components: client > Affects Versions: 1.4.1, 1.4.0 > Reporter: Christopher Tubbs > Assignee: Billie Rinaldi > Priority: Minor > Fix For: 1.5.0 > > > ACCUMULO-267 made this change, but I think it was the wrong way to go about > it. > From the comments on ACCUMULO-267: > This is the wrong way to go about doing this fix. The reason why it took a > JobContext is so that it could accept a "Job" object. This was modeled after > the pattern Hadoop was using for FileOutputFormat, which is somewhat the > standard for conventions in configuring MR jobs. > While JobContext does specifically state that's what it's purpose is, it is a > base class, and Job extends JobContext, and includes a comment that describes > it as holding the state of the job at submission time. This API should really > be taking a "Job" object, rather than a "JobContext" object. Further, because > Job is the only JobContext that actually works as intended here, the change > from JobContext to Job does not require any deprecation, because Job will > still work, and any other JobContext that isn't a Job will still fail. (We > would have to deprecate the ones that were added in 1.4 that took a > Configuration object, though... because those were never "correct", if we are > going off of the conventions set by Hadoop's provided OutputFormats). > It is somewhat annoying to deprecate something in 1.5 that was added in > 1.4... especially since it allows people to go back to what they were doing > before. But, I think it might be worth it to be consistent with the > established conventions, and to clarify the semantics of the methods (we are, > after all, modifying the state of a job we are about to submit, and not just > an arbitrary configuration, which is used for all sorts of things). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira