On May 28, 2010, at 12:16 PM, Drew Farris wrote:

> It might be nice to add a few default flags to AbstractJob that map directly
> to -D arguments in Hadoop. For example, I could see having -i map to
> -Dmapred.input.dir, -o to -Dmapred.output.dir, -nr to -Dmapred.num.reducers,
> etc. I think it is great to be able to accept arbitrary -D arguments, but it
> would be nice to accept shorthand which also gets displayed in -h output.
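The shorthand flags proposed above could be implemented as a small argument rewrite that expands them into Hadoop's -D form. The following is an illustrative sketch only: the class and method names are hypothetical, and note that the standard Hadoop key for the number of reducers is mapred.reduce.tasks rather than mapred.num.reducers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShorthandFlags {

  // Assumed shorthand -> Hadoop property mapping, per the flags
  // proposed in the thread (reducer count uses the standard
  // pre-0.21 Hadoop key mapred.reduce.tasks).
  private static final Map<String, String> SHORTHAND = new HashMap<>();
  static {
    SHORTHAND.put("-i", "mapred.input.dir");
    SHORTHAND.put("-o", "mapred.output.dir");
    SHORTHAND.put("-nr", "mapred.reduce.tasks");
  }

  // Rewrites each shorthand flag and its following value into a
  // single -Dkey=value argument, leaving all other args untouched.
  public static String[] expand(String[] args) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      String prop = SHORTHAND.get(args[i]);
      if (prop != null && i + 1 < args.length) {
        out.add("-D" + prop + '=' + args[++i]);
      } else {
        out.add(args[i]);
      }
    }
    return out.toArray(new String[0]);
  }
}
```

With this, `-i in -o out` would expand to `-Dmapred.input.dir=in -Dmapred.output.dir=out`, so the shorthand can be documented in -h output while the job still receives standard Hadoop properties.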
+1. Think of the users... Plus, we have a lot of docs already that use this.

> The -D options don't get included in -h, and as a result it is unclear just
> how to specify input or output to someone who might not be too familiar
> with Hadoop conventions.

Besides, the Hadoop conventions are cumbersome. Just because they do something in a non-obvious way doesn't mean we need to. To some extent, as Hadoop gets easier to use, there is no reason why anyone need even know we are using Hadoop. I don't think we should tie our public interfaces (and the CLI is our primary public interface) to Hadoop.

> From the API perspective, AbstractJob could provide no-arg methods like
> AbstractJob.buildInputOption() etc., where the class using the AbstractJob
> API need not be concerned with the precise letters, parameters, and
> description required for the option.
>
> Tangentially related, I was wondering something about AbstractJob: with the
> advent of the parsedArgs map returned by AbstractJob.parseArguments, is there
> a need to pass Option arguments around anymore? Could AbstractJob maintain
> Options state, in a sense?
> For example, from RecommenderJob:
>
>   Option numReccomendationsOpt = AbstractJob.buildOption("numRecommendations", "n",
>       "Number of recommendations per user", "10");
>   Option usersFileOpt = AbstractJob.buildOption("usersFile", "u",
>       "File of users to recommend for", null);
>   Option booleanDataOpt = AbstractJob.buildOption("booleanData", "b",
>       "Treat input as without pref values", Boolean.FALSE.toString());
>
>   Map<String,String> parsedArgs = AbstractJob.parseArguments(
>       args, numReccomendationsOpt, usersFileOpt, booleanDataOpt);
>   if (parsedArgs == null) {
>     return -1;
>   }
>
> could be changed to something like:
>
>   buildOption("numRecommendations", "n", "Number of recommendations per user", "10");
>   buildOption("usersFile", "u", "File of users to recommend for", null);
>   buildOption("booleanData", "b", "Treat input as without pref values",
>       Boolean.FALSE.toString());
>   Map<String,String> parsedArgs = parseArguments();
>
> Providing a set of input validators that check the input before launching a
> job sounds like a pretty cool idea too.

Seems nice to me.

> On Fri, May 28, 2010 at 10:55 AM, Sean Owen <sro...@gmail.com> wrote:
>
>> Does it help to note this is Hadoop's flag? It seemed more standard, and
>> therefore possibly more intuitive for some already using Hadoop. We were
>> starting to reinvent many flags this way, so it seemed better not to thunk
>> them with no gain.
>>
>> On May 28, 2010 6:06 AM, "Grant Ingersoll" <gsing...@apache.org> wrote:
>>
>> I just saw that too, and it seems like a loss to me. We did a lot of work
>> to be consistent on this and have a lot of documentation out there with it
>> in it. -Dmapred.input.dir is so much less intuitive than -i or --input.
>>
>> -Grant
>>
>> On May 27, 2010, at 9:04 PM, Jake Mannix wrote:
>>
>>> Is that right? I think the mahout shell script ...
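The stateful AbstractJob change proposed above, where buildOption() accumulates option state so parseArguments() no longer needs Option arguments passed in, could look roughly like the sketch below. All names and parsing behavior here are illustrative assumptions, not Mahout's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AbstractJobSketch {

  // Holds one registered option; a minimal stand-in for commons-cli's Option.
  private static final class Opt {
    final String name;
    final String shortName;
    final String defaultValue;
    Opt(String name, String shortName, String defaultValue) {
      this.name = name;
      this.shortName = shortName;
      this.defaultValue = defaultValue;
    }
  }

  private final List<Opt> options = new ArrayList<>();

  // Registers an option internally; callers never see an Option object.
  public void buildOption(String name, String shortName,
                          String description, String defaultValue) {
    options.add(new Opt(name, shortName, defaultValue));
  }

  // Parses args against all registered options, applying defaults first,
  // then overwriting with any values supplied on the command line.
  public Map<String, String> parseArguments(String[] args) {
    Map<String, String> parsed = new HashMap<>();
    for (Opt o : options) {
      if (o.defaultValue != null) {
        parsed.put("--" + o.name, o.defaultValue);
      }
    }
    for (int i = 0; i + 1 < args.length; i++) {
      for (Opt o : options) {
        if (args[i].equals("--" + o.name) || args[i].equals("-" + o.shortName)) {
          parsed.put("--" + o.name, args[i + 1]);
        }
      }
    }
    return parsed;
  }
}
```

A subclass would then call buildOption(...) several times and finish with a single parseArguments(args) call, matching the shorter RecommenderJob example quoted above.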