On May 28, 2010, at 12:16 PM, Drew Farris wrote:

> -u...@m.a.org +...@m.a.org
> 
> It might be nice to add a few default flags to AbstractJob that map directly
> to -D arguments in Hadoop. For example, I could see having -i map to
> -Dmapred.input.dir, -o to -Dmapred.output.dir, -nr to -Dmapred.num.reducers,
> etc. I think it is great to be able to accept arbitrary -D arguments, but it
> would be nice to accept shorthand which also gets displayed in -h output.
> 

+1.  Think of the users...  Plus, we have a lot of docs already that use this.
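
Something like this is roughly all it would take (a sketch only; the class and
method names below are made up, not current AbstractJob API, and it covers just
the two properties Drew names):

  import org.apache.hadoop.conf.Configuration;

  // Sketch: translate the proposed shorthand flags into the -D properties
  // they stand for, so they can be real options that show up in -h output.
  public final class ShorthandFlags {

    private ShorthandFlags() { }

    /** -i <path>: equivalent to -Dmapred.input.dir=<path>. */
    public static void applyInput(Configuration conf, String path) {
      conf.set("mapred.input.dir", path);
    }

    /** -o <path>: equivalent to -Dmapred.output.dir=<path>. */
    public static void applyOutput(Configuration conf, String path) {
      conf.set("mapred.output.dir", path);
    }
  }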

> The -D options don't get included in -h output, and as a result it is
> unclear just how to specify input or output to someone who might not be too
> familiar with Hadoop conventions.

Besides, the Hadoop conventions are cumbersome.  Just because they do something 
in a non-obvious way doesn't mean we need to.

To some extent, as Hadoop gets easier to use, there is no reason anyone should 
even need to know we are using it.  I don't think we should tie our public 
interfaces (and the CLI is our primary public interface) to Hadoop.

> 
> From the API perspective, AbstractJob could provide no-arg methods like
> AbstractJob.buildInputOption() etc., where the class using the AbstractJob
> API need not be concerned with the precise letters, parameters, and
> description required for the option.
> 
> Tangentially related, I was wondering something about AbstractJob: with the
> advent of the parsedArgs map returned by AbstractJob.parseArguments, is there
> a need to pass Option arguments around anymore? Could AbstractJob maintain
> Option state, in a sense?
> 
> For example, from RecommenderJob:
> 
>    Option numRecommendationsOpt = AbstractJob.buildOption(
>        "numRecommendations", "n", "Number of recommendations per user", "10");
>    Option usersFileOpt = AbstractJob.buildOption(
>        "usersFile", "u", "File of users to recommend for", null);
>    Option booleanDataOpt = AbstractJob.buildOption(
>        "booleanData", "b", "Treat input as without pref values",
>        Boolean.FALSE.toString());
> 
>    Map<String,String> parsedArgs = AbstractJob.parseArguments(
>        args, numRecommendationsOpt, usersFileOpt, booleanDataOpt);
>    if (parsedArgs == null) {
>      return -1;
>    }
> 
> Could be changed to something like:
> 
> buildOption("numRecommendations", "n", "Number of recommendations per user",
> "10");
> buildOption("usersFile", "u", "File of users to recommend for", null);
> buildOption("booleanData", "b", "Treat input as without pref values",
> Boolean.FALSE.toString());
> Map<String,String> parsedArgs = parseArguments();
> 
> Providing a set of input validators that check the input before launching a
> job sounds like a pretty cool idea too.

Seems nice to me.
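
A rough sketch of the stateful version (simplified and hypothetical: the real
AbstractJob builds commons-cli Option objects rather than the bare strings
below, and keeps no such state today):

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  public abstract class AbstractJob {

    // one entry per buildOption() call: {name, shortName, description, default}
    private final List<String[]> optionDefs = new ArrayList<String[]>();

    protected void buildOption(String name, String shortName,
                               String description, String defaultValue) {
      optionDefs.add(new String[] {name, shortName, description, defaultValue});
    }

    protected Map<String,String> parseArguments(String[] args) {
      Map<String,String> parsed = new HashMap<String,String>();
      // seed defaults first, so flags the user omits still have values
      for (String[] def : optionDefs) {
        if (def[3] != null) {
          parsed.put(def[0], def[3]);
        }
      }
      // naive scan: accept --name <value> or -shortName <value>
      for (int i = 0; i < args.length - 1; i++) {
        for (String[] def : optionDefs) {
          if (args[i].equals("--" + def[0]) || args[i].equals("-" + def[1])) {
            parsed.put(def[0], args[i + 1]);
          }
        }
      }
      return parsed;  // the real version would return null on a parse failure
    }
  }

The input validators could hang off the same state: say, a validate() hook
that runs over the parsed map before the job launches.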

> 
> On Fri, May 28, 2010 at 10:55 AM, Sean Owen <sro...@gmail.com> wrote:
> 
>> Does it help to note this is Hadoop's flag? It seemed more standard, and
>> therefore possibly more intuitive, for those already using Hadoop. We were
>> starting to reinvent many flags this way, so it seemed better not to thunk
>> them for no gain.
>> 
>> On May 28, 2010 6:06 AM, "Grant Ingersoll" <gsing...@apache.org> wrote:
>> 
>> I just saw that too, and it seems like a loss to me.  We did a lot of work
>> to be consistent on this, and we have a lot of documentation out there that
>> uses it.  -Dmapred.input.dir is so much less intuitive than -i or --input.
>> 
>> -Grant
>> 
>> 
>> On May 27, 2010, at 9:04 PM, Jake Mannix wrote:
>> 
>>> Is that right? I think the mahout shell script ...
>> 

