Copying common-dev.

Summarizing the below discussion: What should be the tools layout after 
mavenization?

Option #1: Have hadoop-tools at the top level, i.e.
trunk/
   hadoop-tools/
       hadoop-distcp/
Pros:
 Cleaner layout.
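 In future, tools could be released separately from Hadoop releases.

Cons: Difficult to maintain until patch test automation also runs the tests
under tools (today, changes in HDFS often affect RAID tests in MapReduce).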

Option #2: Keep a tools aggregator module under each of MapReduce/HDFS/Common, 
holding the tools that depend on that project.
For example:
hadoop-mapreduce-project/
   hadoop-mr-tools/
      hadoop-distcp/

Pros: Easy to maintain
Cons: Still has tight coupling with related projects.
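
For illustration, the Option #1 aggregator could be a small POM along the
lines below. This is only a sketch: the parent coordinates and relativePath
are assumptions, not taken from the actual hadoop-common-project POM, and the
version is simply the 0.23.0-SNAPSHOT mentioned later in this thread.

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <!-- Assumed parent; adjust to whatever the root build really uses. -->
      <parent>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-project</artifactId>
        <version>0.23.0-SNAPSHOT</version>
        <relativePath>../hadoop-project</relativePath>
      </parent>
      <artifactId>hadoop-tools</artifactId>
      <!-- "pom" packaging: this module only aggregates its children. -->
      <packaging>pom</packaging>
      <modules>
        <module>hadoop-distcp</module>
      </modules>
    </project>

Under Option #2 the same kind of aggregator would sit inside
hadoop-mapreduce-project/ as hadoop-mr-tools instead of at trunk/.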

Personally, I'm fine with either of the above options. I'm looking for 
suggestions so that we can reach a consensus on this.

Thanks
Amareshwari

On 8/30/11 12:10 AM, "Allen Wittenauer" <a...@apache.org> wrote:



I have a feeling this discussion should get moved to common-dev or even to 
general.

My #1 question is if tools is basically contrib reborn.  If not, what makes it 
different?

On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:

> Some questions on making hadoop-tools top level under trunk,
>
> 1.  Should the patches for tools be created against Hadoop Common?
> 2.  What will happen to the tools test automation? Will it run as part of 
> Hadoop Common tests?
> 3.  Will it introduce a dependency from MapReduce to Common? Or is this taken 
> care of in Mavenization?
>
>
> Thanks
> Amareshwari
>
> On 8/26/11 10:17 PM, "Alejandro Abdelnur" <t...@cloudera.com> wrote:
>
> Please, don't add more Mavenization work on us (eventually I want to go back
> to coding)
>
> Given that Hadoop is already Mavenized, the patch should be Mavenized.
>
> What will have to be done extra (besides Mavenizing distcp) is to create a
> hadoop-tools module at root level and within it a hadoop-distcp module.
>
> The hadoop-tools POM will look pretty much like the hadoop-common-project
> POM.
>
> The hadoop-distcp POM should follow the hadoop-common POM patterns.
>
> Thanks.
>
> Alejandro
>
> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> amar...@yahoo-inc.com> wrote:
>
>> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
>> tasks. Since DistCp code is ready to be committed, it need not wait for the
>> Tools separation from MR/HDFS.
>> I would say it can go into contrib as the patch is now, and when the tools
>> restructuring happens it would be just an svn mv.  If there are no issues
>> with this proposal I can commit the code tomorrow.
>>
>> Thanks
>> Amareshwari
>>
>> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
>>
>> I agree with Mithun.  They are related but this goes beyond distcpv2 and
>> should not block distcpv2 from going in.  It would be very nice, however, to
>> get the layout settled soon so that we all know where to find something when
>> we want to work on it.
>>
>> Also +1 for Alejandro's suggestion. I also prefer to keep tools at the trunk level.
>>
>> Even though HDFS, Common, and Mapreduce and perhaps soon tools are separate
>> modules right now, there is still tight coupling between the different
>> pieces, especially with tests.  IMO until we can reduce that coupling we
>> should treat building and testing Hadoop as a single project instead of
>> trying to keep them separate.
>>
>> --Bobby
>>
>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <mithun.radhakrish...@yahoo.com>
>> wrote:
>>
>> Would it be acceptable if retooling of tools/ were taken up separately? It
>> sounds to me like this might be a distinct (albeit related) task.
>>
>> Mithun
>>
>>
>> ________________________________
>> From: Giridharan Kesavan <gkesa...@hortonworks.com>
>> To: mapreduce-dev@hadoop.apache.org
>> Sent: Friday, August 26, 2011 12:04 PM
>> Subject: Re: DistCpV2 in 0.23
>>
>> +1 to Alejandro's
>>
>> I prefer to keep the hadoop-tools at trunk level.
>>
>> -Giri
>>
>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <t...@cloudera.com>
>> wrote:
>>> I'd suggest putting hadoop-tools either at trunk/ level or having a tools
>>> aggregator module for hdfs and another for common.
>>>
>>> I personally would prefer trunk/.
>>>
>>> Thanks.
>>>
>>> Alejandro
>>>
>>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
>>> amar...@yahoo-inc.com> wrote:
>>>
>>>> Agree. It should be a separate maven module (and the patch puts it as a
>>>> separate maven module now). A top level for hadoop tools is nice to have,
>>>> but it becomes hard to maintain until patch automation runs the tests
>>>> under tools. Currently we often see changes in HDFS affecting RAID tests
>>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>>>>
>>>> I propose we can have something like the following:
>>>>
>>>> trunk/
>>>> - hadoop-mapreduce
>>>>     - hadoop-mr-client
>>>>     - hadoop-yarn
>>>>     - hadoop-tools
>>>>         - hadoop-streaming
>>>>         - hadoop-archives
>>>>         - hadoop-distcp
>>>>
>>>> Thoughts?
>>>>
>>>> @Eli and @JD, we did not replace the old legacy distcp because this is
>>>> really a complete rewrite, and we did not want to remove it until users
>>>> are familiar with the new one.
>>>>
>>>> On 8/26/11 12:51 AM, "Todd Lipcon" <t...@cloudera.com> wrote:
>>>>
>>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
>>>> in there as well - ie tools that are downstream of MR and/or HDFS.
>>>>
>>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <maha...@hortonworks.com>
>>>> wrote:
>>>>> +1 for a separate module in hadoop-mapreduce-project. I think
>>>>> hadoop-mapreduce-client might not be the right place for it. We might have
>>>>> to pick a new maven module under hadoop-mapreduce-project that could
>>>>> host streaming/distcp/hadoop archives.
>>>>>
>>>>> thanks
>>>>> mahadev
>>>>>
>>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <t...@cloudera.com>
>>>>> wrote:
>>>>>> Agree, it should be a separate maven module.
>>>>>>
>>>>>> And it should be under hadoop-mapreduce-client, right?
>>>>>>
>>>>>> And now that we are on the topic, the same should go for streaming, no?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Alejandro
>>>>>>
>>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>>
>>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <e...@cloudera.com> wrote:
>>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
>>>>>>>>
>>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
>>>>>>>> it's going to be maintained then we should put it in the core code.
>>>>>>>
>>>>>>> Now that we're all mavenized, though, a separate maven module and
>>>>>>> artifact does make sense IMO - ie "hadoop jar
>>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>>>>>>>
>>>>>>> -Todd
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> -Giri
>>
>>
>>
>

