+1

On Sep 6, 2011, at 12:13 AM, Amareshwari Sri Ramadasu wrote:

> + Copying common dev.
> 
> On 9/6/11 10:58 AM, "Mithun Radhakrishnan" <mithun.radhakrish...@yahoo.com> 
> wrote:
> 
> I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm 
> hoping that's going to be acceptable to this forum. This way, moving it out 
> to a separate source tree should be easier.
> 
> It would be nice to have clarity on how tools will be dealt with. It'd be 
> convenient to have distcp in trunk. (It's tiny and useful.) On the other hand, 
> that might be opening doors to adding too much, and complicating the 
> build/release. I'd appreciate advice on which way is best.
> 
> In the meantime, I'll align the distcpv2 pom.xml with the maven-ized version 
> of things, as per Tucu's suggestions.
> 
> Mithun
> 
> 
> ________________________________
> From: Vinod Kumar Vavilapalli <vino...@hortonworks.com>
> To: mapreduce-dev@hadoop.apache.org
> Cc: "common-...@hadoop.apache.org" <common-...@hadoop.apache.org>; Mithun 
> Radhakrishnan <mithun.radhakrish...@yahoo.com>
> Sent: Tuesday, August 30, 2011 6:13 PM
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
> 
> As long as hadoop-tools is in some directory at some depth under trunk,
> release of the hadoop-tools is tied to the release of core.
> 
> So we actually have these two options instead:
> (1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
>    -- Sources at tools/trunk/hadoop-distcp
>    -- Each tool will work with a specific version of Hadoop core.
>    -- Releases can really be separate
> (2) Same source tree: trunk/
>    -- Sources at either (2.1) trunk/hadoop-tools or (2.2)
> trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
>    -- Given that the release isn't decoupled anyway, either will work. (2.2) is
> preferable if building mapreduce builds the tools also.
> 
> +Vinod
> 
> 
> On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
> amar...@yahoo-inc.com> wrote:
> 
>> Copying common-dev.
>> 
>> Summarizing the below discussion: What should be the tools layout after
>> mavenization?
>> 
>> Option #1: Have hadoop-tools at top level, i.e.
>> trunk/
>>  hadoop-tools/
>>      hadoop-distcp/
>> Pros:
>> Cleaner layout.
>> In the future, tools could be released separately from Hadoop releases.
>> 
>> Cons: Difficult to maintain
>> 
>> Option #2: Keep a tools aggregator module under each of MapReduce/HDFS/Common
>> for the tools that depend on that project.
>> For example:
>> hadoop-mapreduce-project/
>>  hadoop-mr-tools/
>>     hadoop-distcp/
>> 
>> Pros: Easy to maintain
>> Cons: Still has tight coupling with related projects.
>> 
>> Personally, I'm fine with any of the above options. Looking for suggestions
>> and reaching a consensus on this.
>> 
>> Thanks
>> Amareshwari
>> 
>> On 8/30/11 12:10 AM, "Allen Wittenauer" <a...@apache.org> wrote:
>> 
>> 
>> 
>> I have a feeling this discussion should get moved to common-dev or even to
>> general.
>> 
>> My #1 question is if tools is basically contrib reborn.  If not, what makes
>> it different?
>> 
>> On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
>> 
>>> Some questions on making hadoop-tools top level under trunk,
>>> 
>>> 1.  Should the patches for tools be created against Hadoop Common?
>>> 2.  What will happen to the tools test automation? Will it run as part of
>>> Hadoop Common tests?
>>> 3.  Will it introduce a dependency from MapReduce to Common? Or is this
>>> taken care of in Mavenization?
>>> 
>>> 
>>> Thanks
>>> Amareshwari
>>> 
>>> On 8/26/11 10:17 PM, "Alejandro Abdelnur" <t...@cloudera.com> wrote:
>>> 
>>> Please, don't add more Mavenization work on us (eventually I want to go
>>> back to coding)
>>> 
>>> Given that Hadoop is already Mavenized, the patch should be Mavenized.
>>> 
>>> What will have to be done extra (besides Mavenizing distcp) is to create a
>>> hadoop-tools module at root level and within it a hadoop-distcp module.
>>> 
>>> The hadoop-tools POM will look pretty much like the hadoop-common-project
>>> POM.
>>> 
>>> The hadoop-distcp POM should follow the hadoop-common POM patterns.
>>> 
>>> Thanks.
>>> 
>>> Alejandro
>>> 
>>> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
>>> amar...@yahoo-inc.com> wrote:
>>> 
>>>> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
>>>> tasks. Since DistCp code is ready to be committed, it need not wait for the
>>>> Tools separation from MR/HDFS.
>>>> I would say it can go into contrib as the patch is now, and when the tools
>>>> restructuring happens it would be just an svn mv.  If there are no issues
>>>> with this proposal I can commit the code tomorrow.
>>>> 
>>>> Thanks
>>>> Amareshwari
>>>> 
>>>> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
>>>> 
>>>> I agree with Mithun.  They are related but this goes beyond distcpv2 and
>>>> should not block distcpv2 from going in.  It would be very nice, however, to
>>>> get the layout settled soon so that we all know where to find something when
>>>> we want to work on it.
>>>> 
>>>> Also +1 for Alejandro's suggestion. I also prefer to keep tools at the trunk level.
>>>> 
>>>> Even though HDFS, Common, and Mapreduce and perhaps soon tools are separate
>>>> modules right now, there is still tight coupling between the different
>>>> pieces, especially with tests.  IMO until we can reduce that coupling we
>>>> should treat building and testing Hadoop as a single project instead of
>>>> trying to keep them separate.
>>>> 
>>>> --Bobby
>>>> 
>>>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <mithun.radhakrish...@yahoo.com>
>>>> wrote:
>>>> 
>>>> Would it be acceptable if retooling of tools/ were taken up separately? It
>>>> sounds to me like this might be a distinct (albeit related) task.
>>>> 
>>>> Mithun
>>>> 
>>>> 
>>>> ________________________________
>>>> From: Giridharan Kesavan <gkesa...@hortonworks.com>
>>>> To: mapreduce-dev@hadoop.apache.org
>>>> Sent: Friday, August 26, 2011 12:04 PM
>>>> Subject: Re: DistCpV2 in 0.23
>>>> 
>>>> +1 to Alejandro's
>>>> 
>>>> I prefer to keep the hadoop-tools at trunk level.
>>>> 
>>>> -Giri
>>>> 
>>>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <t...@cloudera.com>
>>>> wrote:
>>>>> I'd suggest putting hadoop-tools either at trunk/ level or having a tools
>>>>> aggregator module for hdfs and another for common.
>>>>> 
>>>>> I personally would prefer at trunk/.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> Alejandro
>>>>> 
>>>>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
>>>>> amar...@yahoo-inc.com> wrote:
>>>>> 
>>>>>> Agree. It should be a separate maven module (and the patch puts it as a
>>>>>> separate maven module now). A top level for hadoop tools is nice to have,
>>>>>> but it becomes hard to maintain until the patch automation runs the tests
>>>>>> under tools. Currently we often see changes in HDFS affecting RAID tests
>>>>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>>>>>> 
>>>>>> I propose we can have something like the following:
>>>>>> 
>>>>>> trunk/
>>>>>> - hadoop-mapreduce
>>>>>>    - hadoop-mr-client
>>>>>>    - hadoop-yarn
>>>>>>    - hadoop-tools
>>>>>>        - hadoop-streaming
>>>>>>        - hadoop-archives
>>>>>>        - hadoop-distcp
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> @Eli and @JD, we did not replace the old legacy distcp because this is
>>>>>> really a complete rewrite, and we did not want to remove it until users
>>>>>> are familiar with the new one.
>>>>>> 
>>>>>> On 8/26/11 12:51 AM, "Todd Lipcon" <t...@cloudera.com> wrote:
>>>>>> 
>>>>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
>>>>>> in there as well - ie tools that are downstream of MR and/or HDFS.
>>>>>> 
>>>>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <maha...@hortonworks.com>
>>>>>> wrote:
>>>>>>> +1 for a separate module in hadoop-mapreduce-project. I think
>>>>>>> hadoop-mapreduce-client might not be the right place for it. We might have
>>>>>>> to pick a new maven module under hadoop-mapreduce-project that could
>>>>>>> host streaming/distcp/hadoop archives.
>>>>>>> 
>>>>>>> thanks
>>>>>>> mahadev
>>>>>>> 
>>>>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <t...@cloudera.com>
>>>>>>> wrote:
>>>>>>>> Agree, it should be a separate maven module.
>>>>>>>> 
>>>>>>>> And it should be under hadoop-mapreduce-client, right?
>>>>>>>> 
>>>>>>>> And now that we are on the topic, the same should go for streaming, no?
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> Alejandro
>>>>>>>> 
>>>>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>>>> 
>>>>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <e...@cloudera.com> wrote:
>>>>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
>>>>>>>>>> 
>>>>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
>>>>>>>>>> it's going to be maintained then we should put it in the core code.
>>>>>>>>> 
>>>>>>>>> Now that we're all mavenized, though, a separate maven module and
>>>>>>>>> artifact does make sense IMO - ie "hadoop jar
>>>>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>>>>>>>>> 
>>>>>>>>> -Todd
>>>>>>>>> --
>>>>>>>>> Todd Lipcon
>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> -Giri
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
> 
