I have a feeling this discussion should get moved to common-dev or even to general.
My #1 question is if tools is basically contrib reborn. If not, what makes it different?

On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:

> Some questions on making hadoop-tools top level under trunk,
>
> 1. Should the patches for tools be created against Hadoop Common?
> 2. What will happen to the tools test automation? Will it run as part of
> Hadoop Common tests?
> 3. Will it introduce a dependency from MapReduce to Common? Or is this taken
> care in Mavenization?
>
>
> Thanks
> Amareshwari
>
> On 8/26/11 10:17 PM, "Alejandro Abdelnur" <t...@cloudera.com> wrote:
>
> Please, don't add more Mavenization work on us (eventually I want to go back
> to coding)
>
> Given that Hadoop is already Mavenized, the patch should be Mavenized.
>
> What will have to be done extra (besides Mavenizing distcp) is to create a
> hadoop-tools module at root level and within it a hadoop-distcp module.
>
> The hadoop-tools POM will look pretty much like the hadoop-common-project
> POM.
>
> The hadoop-distcp POM should follow the hadoop-common POM patterns.
>
> Thanks.
>
> Alejandro
>
> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> amar...@yahoo-inc.com> wrote:
>
>> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
>> tasks. Since DistCp code is ready to be committed, it need not wait for the
>> Tools separation from MR/HDFS.
>> I would say it can go into contrib as the patch is now, and when the tools
>> restructuring happens it would be just an svn mv. If there are no issues
>> with this proposal I can commit the code tomorrow.
>>
>> Thanks
>> Amareshwari
>>
>> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
>>
>> I agree with Mithun. They are related but this goes beyond distcpv2 and
>> should not block distcpv2 from going in. It would be very nice, however, to
>> get the layout settled soon so that we all know where to find something when
>> we want to work on it.
>>
>> Also +1 for Alejandro's I also prefer to keep tools at the trunk level.
>>
>> Even though HDFS, Common, and Mapreduce and perhaps soon tools are separate
>> modules right now, there is still tight coupling between the different
>> pieces, especially with tests. IMO until we can reduce that coupling we
>> should treat building and testing Hadoop as a single project instead of
>> trying to keep them separate.
>>
>> --Bobby
>>
>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <mithun.radhakrish...@yahoo.com>
>> wrote:
>>
>> Would it be acceptable if retooling of tools/ were taken up separately? It
>> sounds to me like this might be a distinct (albeit related) task.
>>
>> Mithun
>>
>>
>> ________________________________
>> From: Giridharan Kesavan <gkesa...@hortonworks.com>
>> To: mapreduce-dev@hadoop.apache.org
>> Sent: Friday, August 26, 2011 12:04 PM
>> Subject: Re: DistCpV2 in 0.23
>>
>> +1 to Alejandro's
>>
>> I prefer to keep the hadoop-tools at trunk level.
>>
>> -Giri
>>
>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <t...@cloudera.com>
>> wrote:
>>> I'd suggest putting hadoop-tools either at trunk/ level or having a
>>> tools aggregator module for hdfs and other for common.
>>>
>>> I personally would prefer at trunk/.
>>>
>>> Thanks.
>>>
>>> Alejandro
>>>
>>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
>>> amar...@yahoo-inc.com> wrote:
>>>
>>>> Agree. It should be separate maven module (and patch puts it as separate
>>>> maven module now). And top level for hadoop tools is nice to have, but it
>>>> becomes hard to maintain until patch automation tests run the tests under
>>>> tools. Currently we see many times the changes in HDFS affecting RAID tests
>>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>>>>
>>>> I propose we can have something like the following:
>>>>
>>>> trunk/
>>>>  - hadoop-mapreduce
>>>>    - hadoop-mr-client
>>>>    - hadoop-yarn
>>>>    - hadoop-tools
>>>>      - hadoop-streaming
>>>>      - hadoop-archives
>>>>      - hadoop-distcp
>>>>
>>>> Thoughts?
>>>>
>>>> @Eli and @JD, we did not replace old legacy distcp because this is really a
>>>> complete rewrite and did not want to remove it until users are familiarized
>>>> with new one.
>>>>
>>>> On 8/26/11 12:51 AM, "Todd Lipcon" <t...@cloudera.com> wrote:
>>>>
>>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
>>>> in there as well - ie tools that are downstream of MR and/or HDFS.
>>>>
>>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <maha...@hortonworks.com>
>>>> wrote:
>>>>> +1 for a separate module in hadoop-mapreduce-project. I think
>>>>> hadoop-mapreduce-client might not be right place for it. We might have
>>>>> to pick a new maven module under hadoop-mapreduce-project that could
>>>>> host streaming/distcp/hadoop archives.
>>>>>
>>>>> thanks
>>>>> mahadev
>>>>>
>>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <t...@cloudera.com>
>>>>> wrote:
>>>>>> Agree, it should be a separate maven module.
>>>>>>
>>>>>> And it should be under hadoop-mapreduce-client, right?
>>>>>>
>>>>>> And now that we are in the topic, the same should go for streaming, no?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Alejandro
>>>>>>
>>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <t...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <e...@cloudera.com>
>>>>>>> wrote:
>>>>>>>> Nice work! I definitely think this should go in 23 and 20x.
>>>>>>>>
>>>>>>>> Agree with JD that it should be in the core code, not contrib. If
>>>>>>>> it's going to be maintained then we should put it in the core code.
>>>>>>>
>>>>>>> Now that we're all mavenized, though, a separate maven module and
>>>>>>> artifact does make sense IMO - ie "hadoop jar
>>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>>>>>>>
>>>>>>> -Todd
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>
>> --
>> -Giri
>>