[ 
https://issues.apache.org/jira/browse/ACCUMULO-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459813#comment-13459813
 ] 

Keith Turner commented on ACCUMULO-769:
---------------------------------------

I am not a big fan of the churn in the API this change would make.  However I 
do agree with the goal of this ticket.  This leaves me a bit conflicted on this 
issue.  Passing in a Job removes the possibility that the user will do 
something wrong.  With the new API in 1.4 the user can write the following code 
that is incorrect.  

{code:java}
  @Override
  public int run(String[] args) throws Exception {

    Configuration conf = getConf();
    Job job = new Job(conf, "Test MR 1");
      .
      .
      .
    job.setOutputFormatClass(AccumuloOutputFormat.class);
    AccumuloOutputFormat.setOutputInfo(conf, "root", "secret".getBytes(), 
false, "nodes");  //WRONG, modifying conf will not setup the job
  }
{code}

Instead the user needs to write the following with the new API

{code:java}
  @Override
  public int run(String[] args) throws Exception {

    Configuration conf = getConf();
    Job job = new Job(conf, "Test MR 1");
      .
      .
      .
    job.setOutputFormatClass(AccumuloOutputFormat.class);
    AccumuloOutputFormat.setOutputInfo(job.getConfiguration(), "root", 
"secret".getBytes(), false, "nodes");  //CORRECT, modifying the job conf will 
setup the job
  }
{code}

If the user just passed in the job object they would not have to think about it 
and understand that job creates a copy of the config they passed.    


                
> MapReduce API should not use Configuration to set Job state at submission 
> time (ambiguous semantics)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-769
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-769
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.1, 1.4.0
>            Reporter: Christopher Tubbs
>            Assignee: Billie Rinaldi
>            Priority: Minor
>             Fix For: 1.5.0
>
>
> ACCUMULO-267 made this change, but I think it was the wrong way to go about 
> it.
> From the comments on ACCUMULO-267:
> This is the wrong way to go about doing this fix. The reason why it took a 
> JobContext is so that it could accept a "Job" object. This was modeled after 
> the pattern Hadoop was using for FileOutputFormat, which is somewhat the 
> standard for conventions in configuring MR jobs.
> While JobContext does specifically state that's what it's purpose is, it is a 
> base class, and Job extends JobContext, and includes a comment that describes 
> it as holding the state of the job at submission time. This API should really 
> be taking a "Job" object, rather than a "JobContext" object. Further, because 
> Job is the only JobContext that actually works as intended here, the change 
> from JobContext to Job does not require any deprecation, because Job will 
> still work, and any other JobContext that isn't a Job will still fail. (We 
> would have to deprecate the ones that were added in 1.4 that took a 
> Configuration object, though... because those were never "correct", if we are 
> going off of the conventions set by Hadoop's provided OutputFormats).
> It is somewhat annoying to deprecate something in 1.5 that was added in 
> 1.4... especially since it allows people to go back to what they were doing 
> before. But, I think it might be worth it to be consistent with the 
> established conventions, and to clarify the semantics of the methods (we are, 
> after all, modifying the state of a job we are about to submit, and not just 
> an arbitrary configuration, which is used for all sorts of things).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to