+1 for a separate hadoop-tools module. However, if a tool is broken at release time and no one comes forward to fix it, it should be removed (unlike the contrib modules, where build and test failures were tolerated).
- milind

On 9/7/11 11:27 AM, "Mahadev Konar" <maha...@hortonworks.com> wrote:

>I like the idea of having tools as a separate module and I don't think
>that it will be a dumping ground unless we choose to make it one.
>
>+1 for hadoop tools module under trunk.
>
>thanks
>mahadev
>
>On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <t...@cloudera.com>
>wrote:
>> Agreed, we should not have a dumping ground. IMO, what would go into
>> hadoop-tools (i.e. distcp, streaming, and someone could argue for
>> FsShell as well) are effectively hadoop CLI utilities. Having them in a
>> separate module rather than in the core modules (common, hdfs,
>> mapreduce) does not mean that they are secondary things, just
>> modularization. Also it will help to get those tools to use public
>> interfaces of the core modules, and when we finally have a clean
>> hadoop-client layer, those tools should only depend on that.
>>
>> Finally, the fact that tools would end up under trunk/hadoop-tools does
>> not prevent the packaging from HDFS and MAPREDUCE from bundling the
>> same/different tools.
>>
>> +1 for hadoop-tools/ (not binding)
>>
>> Thanks.
>>
>>
>> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <eric...@gmail.com> wrote:
>>
>>> Mapreduce and HDFS are distinct functions of Hadoop. They are loosely
>>> coupled. If we have a tools aggregator module, it will not have as
>>> clearly distinct a function as other Hadoop modules. Hence, it is
>>> possible for a tool to depend on both HDFS and map reduce. If
>>> something breaks in the tools module, it is unclear which subproject
>>> is responsible for maintaining the tools' function. Therefore, it is
>>> safer to send tools to incubator or apache extra rather than deposit
>>> the utility tools in a tools subcategory. There are many short-lived
>>> projects that attempt to associate themselves with Hadoop but are not
>>> maintained. It would be better to spin off those utility
>>> projects than use Hadoop as a dumping ground.
>>>
>>> In the previous discussion about removing contrib, most people were in
>>> favor of doing so, and only a few contrib owners were reluctant to
>>> remove contrib. Fewer people have participated in restoring the
>>> functionality of broken contrib projects. History speaks for itself.
>>> -1 (non-binding) for hadoop-tools.
>>>
>>> regards,
>>> Eric
>>>
>>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <t...@cloudera.com>
>>> wrote:
>>> > Eric,
>>> >
>>> > Personally I'm fine either way.
>>> >
>>> > Still, I fail to see why generic/categorized tools increase/reduce
>>> > the risk of dead code and how they make packaging and deployment
>>> > more difficult/easier.
>>> >
>>> > Would you please explain this?
>>> >
>>> > Thanks.
>>> >
>>> > Alejandro
>>> >
>>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <eric...@gmail.com> wrote:
>>> >
>>> >> Option #2, proposed by Amareshwari, seems like a better proposal. We
>>> >> don't want to repeat the history of contrib with hadoop-tools.
>>> >> Having a generic module like hadoop-tools increases the risk of
>>> >> accumulating dead code. It would be better to categorize the hdfs-
>>> >> or mapreduce-specific tools in their respective subcategories. It is
>>> >> also easier to manage from a package/deployment perspective.
>>> >>
>>> >> regards,
>>> >> Eric
>>> >>
>>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
>>> >>
>>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <a...@apache.org>
>>> >> > wrote:
>>> >> >>
>>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
>>> >> >>> We still need to answer Amareshwari's question (2) she asked
>>> >> >>> some time back about the automated code compilation and test
>>> >> >>> execution of the tools module.
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>>>> My #1 question is if tools is basically contrib reborn. If
>>> >> >>>>> not, what makes it different?
>>> >> >>
>>> >> >>
>>> >> >> I'm still waiting for this answer as well.
>>> >> >>
>>> >> >> Until then, I would be pretty much against a tools module.
>>> >> >> Changing the name of the dumping ground doesn't make it any less
>>> >> >> of a dumping ground.
>>> >> >
>>> >> > IMO if the tools module only gets stuff like distcp that's
>>> >> > maintained then it's not contrib; if it contains all the stuff from
>>> >> > the current MR contrib then tools is just a re-labeling of contrib.
>>> >> > Given that this proposal only covers moving distcp to tools, it
>>> >> > doesn't sound like contrib to me.
>>> >> >
>>> >> > Thanks,
>>> >> > Eli
>>> >>
>>> >>
>>> >
>>> >>
>