+1

On Sep 6, 2011, at 12:13 AM, Amareshwari Sri Ramadasu wrote:
> + Copying common-dev.
>
> On 9/6/11 10:58 AM, "Mithun Radhakrishnan" <mithun.radhakrish...@yahoo.com>
> wrote:
>
> I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm
> hoping that's going to be acceptable to this forum. This way, moving it out
> to a separate source tree should be easier.
>
> It would be nice to have clarity on how tools will be dealt with. It'd be
> convenient to have distcp in trunk. (It's tiny and useful.) On the other hand,
> that might open the door to adding too much and complicating the
> build/release. I'd appreciate advice on which way is best.
>
> In the meantime, I'll align the distcpv2 pom.xml with the Mavenized version
> of things, as per Tucu's suggestions.
>
> Mithun
>
>
> ________________________________
> From: Vinod Kumar Vavilapalli <vino...@hortonworks.com>
> To: mapreduce-dev@hadoop.apache.org
> Cc: "common-...@hadoop.apache.org" <common-...@hadoop.apache.org>; Mithun Radhakrishnan <mithun.radhakrish...@yahoo.com>
> Sent: Tuesday, August 30, 2011 6:13 PM
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> As long as hadoop-tools is in some directory at some depth under trunk,
> the release of hadoop-tools is tied to the release of core.
>
> So we actually have these two options instead:
> (1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
> -- Sources at tools/trunk/hadoop-distcp
> -- Each tool will work with a specific version of Hadoop core.
> -- Releases can really be separate.
> (2) Same source tree: trunk/
> -- Sources at either (1.1) trunk/hadoop-tools or (1.2)
> trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
> -- Given that the release isn't decoupled anyway, either will work. (1.2) is
> preferable if building mapreduce also builds the tools.
>
> +Vinod
>
>
> On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
> amar...@yahoo-inc.com> wrote:
>
>> Copying common-dev.
>>
>> Summarizing the discussion below: what should the tools layout be after
>> Mavenization?
>>
>> Option #1: Have hadoop-tools at the top level, i.e.
>> trunk/
>>   hadoop-tools/
>>     hadoop-distcp/
>> Pros:
>>   Cleaner layout.
>>   In the future, tools could be released separately from Hadoop releases.
>>
>> Cons: Difficult to maintain.
>>
>> Option #2: Keep a tools aggregator module under MapReduce/HDFS/Common for
>> the tools that depend on MapReduce/HDFS/Common, respectively.
>> For example:
>> hadoop-mapreduce-project/
>>   hadoop-mr-tools/
>>     hadoop-distcp/
>>
>> Pros: Easy to maintain.
>> Cons: Still has tight coupling with related projects.
>>
>> Personally, I'm fine with either of the above options. I'm looking for
>> suggestions and hoping to reach a consensus on this.
>>
>> Thanks
>> Amareshwari
>>
>> On 8/30/11 12:10 AM, "Allen Wittenauer" <a...@apache.org> wrote:
>>
>> I have a feeling this discussion should get moved to common-dev or even to
>> general.
>>
>> My #1 question is whether tools is basically contrib reborn. If not, what
>> makes it different?
>>
>> On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
>>
>>> Some questions on making hadoop-tools top level under trunk:
>>>
>>> 1. Should the patches for tools be created against Hadoop Common?
>>> 2. What will happen to the tools test automation? Will it run as part of
>>> Hadoop Common tests?
>>> 3. Will it introduce a dependency from MapReduce to Common? Or is this
>>> taken care of in Mavenization?
>>>
>>>
>>> Thanks
>>> Amareshwari
>>>
>>> On 8/26/11 10:17 PM, "Alejandro Abdelnur" <t...@cloudera.com> wrote:
>>>
>>> Please, don't add more Mavenization work on us (eventually I want to go
>>> back to coding).
>>>
>>> Given that Hadoop is already Mavenized, the patch should be Mavenized.
>>>
>>> What will have to be done extra (besides Mavenizing distcp) is to create
>>> a hadoop-tools module at root level and within it a hadoop-distcp module.
>>>
>>> The hadoop-tools POM will look pretty much like the hadoop-common-project
>>> POM.
>>>
>>> The hadoop-distcp POM should follow the hadoop-common POM patterns.
>>>
>>> Thanks.
>>>
>>> Alejandro
>>>
>>> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
>>> amar...@yahoo-inc.com> wrote:
>>>
>>>> Agree with Mithun and Robert. DistCp and Tools restructuring are
>>>> separate tasks. Since the DistCp code is ready to be committed, it need
>>>> not wait for the Tools separation from MR/HDFS.
>>>> I would say it can go into contrib as the patch is now, and when the
>>>> tools restructuring happens it would be just an svn mv. If there are no
>>>> issues with this proposal, I can commit the code tomorrow.
>>>>
>>>> Thanks
>>>> Amareshwari
>>>>
>>>> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
>>>>
>>>> I agree with Mithun. They are related, but this goes beyond distcpv2 and
>>>> should not block distcpv2 from going in. It would be very nice, however,
>>>> to get the layout settled soon so that we all know where to find
>>>> something when we want to work on it.
>>>>
>>>> Also +1 for Alejandro's suggestion; I also prefer to keep tools at the
>>>> trunk level.
>>>>
>>>> Even though HDFS, Common, MapReduce, and perhaps soon tools are separate
>>>> modules right now, there is still tight coupling between the different
>>>> pieces, especially with tests. IMO, until we can reduce that coupling, we
>>>> should treat building and testing Hadoop as a single project instead of
>>>> trying to keep them separate.
>>>>
>>>> --Bobby
>>>>
>>>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
>>>> mithun.radhakrish...@yahoo.com>
>>>> wrote:
>>>>
>>>> Would it be acceptable if the retooling of tools/ were taken up
>>>> separately? It sounds to me like this might be a distinct (albeit
>>>> related) task.
>>>>
>>>> Mithun
>>>>
>>>>
>>>> ________________________________
>>>> From: Giridharan Kesavan <gkesa...@hortonworks.com>
>>>> To: mapreduce-dev@hadoop.apache.org
>>>> Sent: Friday, August 26, 2011 12:04 PM
>>>> Subject: Re: DistCpV2 in 0.23
>>>>
>>>> +1 to Alejandro's suggestion.
>>>>
>>>> I prefer to keep hadoop-tools at the trunk level.
>>>>
>>>> -Giri
>>>>
>>>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <t...@cloudera.com>
>>>> wrote:
>>>>> I'd suggest putting hadoop-tools either at the trunk/ level or having a
>>>>> tools aggregator module for hdfs and another for common.
>>>>>
>>>>> I personally would prefer trunk/.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Alejandro
>>>>>
>>>>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
>>>>> amar...@yahoo-inc.com> wrote:
>>>>>
>>>>>> Agree. It should be a separate Maven module (and the patch puts it in a
>>>>>> separate Maven module now). A top level for hadoop-tools is nice to
>>>>>> have, but it becomes hard to maintain until the patch automation runs
>>>>>> the tests under tools. Currently we often see changes in HDFS affecting
>>>>>> RAID tests in MapReduce. So, I'm fine putting the tools under
>>>>>> hadoop-mapreduce.
>>>>>>
>>>>>> I propose we can have something like the following:
>>>>>>
>>>>>> trunk/
>>>>>>  - hadoop-mapreduce
>>>>>>    - hadoop-mr-client
>>>>>>    - hadoop-yarn
>>>>>>    - hadoop-tools
>>>>>>      - hadoop-streaming
>>>>>>      - hadoop-archives
>>>>>>      - hadoop-distcp
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> @Eli and @JD, we did not replace the old legacy distcp because this is
>>>>>> really a complete rewrite, and we did not want to remove it until users
>>>>>> are familiar with the new one.
>>>>>>
>>>>>> On 8/26/11 12:51 AM, "Todd Lipcon" <t...@cloudera.com> wrote:
>>>>>>
>>>>>> Maybe a separate top level for hadoop-tools? Stuff like RAID could go
>>>>>> in there as well, i.e. tools that are downstream of MR and/or HDFS.
>>>>>>
>>>>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
>>>>>> maha...@hortonworks.com>
>>>>>> wrote:
>>>>>>> +1 for a separate module in hadoop-mapreduce-project. I think
>>>>>>> hadoop-mapreduce-client might not be the right place for it. We might
>>>>>>> have to pick a new Maven module under hadoop-mapreduce-project that
>>>>>>> could host streaming/distcp/hadoop archives.
>>>>>>>
>>>>>>> thanks
>>>>>>> mahadev
>>>>>>>
>>>>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
>>>>>>> t...@cloudera.com>
>>>>>>> wrote:
>>>>>>>> Agree, it should be a separate Maven module.
>>>>>>>>
>>>>>>>> And it should be under hadoop-mapreduce-client, right?
>>>>>>>>
>>>>>>>> And now that we are on the topic, the same should go for streaming,
>>>>>>>> no?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Alejandro
>>>>>>>>
>>>>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <t...@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <e...@cloudera.com>
>>>>>>>>> wrote:
>>>>>>>>>> Nice work! I definitely think this should go in 23 and 20x.
>>>>>>>>>>
>>>>>>>>>> Agree with JD that it should be in the core code, not contrib. If
>>>>>>>>>> it's going to be maintained, then we should put it in the core
>>>>>>>>>> code.
>>>>>>>>>
>>>>>>>>> Now that we're all Mavenized, though, a separate Maven module and
>>>>>>>>> artifact does make sense IMO, i.e. "hadoop jar
>>>>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp".
>>>>>>>>>
>>>>>>>>> -Todd
>>>>>>>>> --
>>>>>>>>> Todd Lipcon
>>>>>>>>> Software Engineer, Cloudera
>>>>>>
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>
>>>> --
>>>> -Giri