[jira] [Commented] (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees

2011-06-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051382#comment-13051382
 ] 

Todd Lipcon commented on MAPREDUCE-1638:


Could the toFullPropertyName method be moved to MRJobConfig and then used by 
both QueueManager and JobSubmitter? Duplicating it seems ugly.
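
A minimal sketch of the refactoring suggested above, for illustration only. The
class names below are hypothetical stand-ins (SharedJobConfigKeys for
MRJobConfig, QueueSide and SubmitSide for QueueManager and JobSubmitter), and
the "mapred.queue." prefix and the property names are assumptions rather than
values taken from the patch. The point is only that both callers go through
one shared helper instead of each keeping its own copy.

```java
// Hypothetical stand-in for MRJobConfig: a single home for the helper.
final class SharedJobConfigKeys {
    // Assumed prefix for per-queue properties; the real constant may differ.
    static final String QUEUE_CONF_PROPERTY_NAME_PREFIX = "mapred.queue.";

    private SharedJobConfigKeys() {}

    // Builds the fully qualified configuration key for a queue property.
    static String toFullPropertyName(String queue, String property) {
        return QUEUE_CONF_PROPERTY_NAME_PREFIX + queue + "." + property;
    }
}

// Hypothetical server-side caller (stands in for QueueManager).
final class QueueSide {
    static String submitAclKey(String queue) {
        return SharedJobConfigKeys.toFullPropertyName(queue, "acl-submit-job");
    }
}

// Hypothetical client-side caller (stands in for JobSubmitter).
final class SubmitSide {
    static String adminAclKey(String queue) {
        return SharedJobConfigKeys.toFullPropertyName(queue, "acl-administer-jobs");
    }
}
```

With the helper in a class that the client already depends on, the client no
longer needs to reach into server code just to build a property name.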

 Divide MapReduce into API and implementation source trees
 -

 Key: MAPREDUCE-1638
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1638
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build, client
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1638.patch, MAPREDUCE-1638.patch, 
 MAPREDUCE-1638.sh


 I think it makes sense to separate the MapReduce source into public API and 
 implementation trees. The public API could be broken further into kernel and 
 library trees.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees

2011-06-16 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050755#comment-13050755
 ] 

Owen O'Malley commented on MAPREDUCE-1638:
--

I can understand splitting up the client and server jars, but splitting up the 
API and implementation only makes sense if you have different implementations 
and a test suite to test them.

Cleaning up the dependencies is a good thing, especially removing dependencies 
from the client on the server code.





[jira] [Commented] (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees

2011-06-16 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050765#comment-13050765
 ] 

Tom White commented on MAPREDUCE-1638:
--

Yes, these patches are about removing client dependencies on the server.





[jira] Commented: (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees

2010-04-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852427#action_12852427
 ] 

Steve Loughran commented on MAPREDUCE-1638:
---

I know of people who are looking at alternate execution engines to the JT, so 
having that bit of the API decoupled from the implementation would be good. I 
don't (yet) see the need for separate JARs, though; it only complicates things 
and creates new problems.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees

2010-03-30 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851653#action_12851653
 ] 

Tom White commented on MAPREDUCE-1638:
--

bq. After considering it, I don't think that we should separate the api from 
the impls. In general, that makes more sense if you have multiple 
implementations of the api.

In some sense we already have multiple implementations of the API, if you 
consider the full distributed implementation and LocalJobRunner as different 
implementations. (Also MRUnit is a partial implementation.)

bq. I'm also worried that there would be a circular dependence between the two 
jars.

You're right that there are currently many circular dependencies, which are 
hard to break. Perhaps we can improve this in the future, but it's a bigger 
project (a project Jigsaw - http://openjdk.java.net/projects/jigsaw/ - for 
MapReduce?). 

bq. I agree that we need to make the line stronger, but maybe it would be 
better to move the implementations into new packages?

This would be a good step. For the moment, I think that a combination of 
MAPREDUCE-1623 and HADOOP-6658 is enough to define the public user-facing API. 




[jira] Commented: (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees

2010-03-26 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850303#action_12850303
 ] 

Tom White commented on MAPREDUCE-1638:
--

To flesh this out a little more: roughly speaking, the API tree would contain 
the public part of o.a.h.mapred, which holds the (deprecated) user API, as well 
as o.a.h.mapreduce. The library tree would contain o.a.h.mapred.lib and 
o.a.h.mapreduce.lib (and subpackages). The implementation tree would contain 
everything else (although there may be exceptions: classes that are considered 
part of the public API and should go in the API tree).

This change would mark a very clear boundary between the public user-facing API 
and the internal implementation. Having separate source trees is a common 
approach in many projects. The use of annotations introduced in HADOOP-5073 
doesn't provide such a clear demarcation (since you can't conditionally compile 
according to the presence of an annotation), but is still useful for more 
fine-grained distinctions.
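
To illustrate the limitation described above, here is a minimal,
self-contained sketch. PrivateAudience is a hypothetical marker loosely
modelled on InterfaceAudience.Private, and InternalScheduler and UserApi are
made-up classes, not Hadoop code. The client call compiles despite the marker,
and the marker only becomes visible to a separate audit step (done here with
reflection).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical audience marker; it documents intent but enforces nothing.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface PrivateAudience {}

// Hypothetical implementation class that happens to be accessible.
@PrivateAudience
class InternalScheduler {
    static String describe() {
        return "internal";
    }
}

// Hypothetical user-facing class with no marker.
class UserApi {
    static String describe() {
        return "api";
    }
}

class AudienceCheck {
    // An after-the-fact audit: reports whether a class carries the marker.
    static boolean isPrivate(Class<?> type) {
        return type.isAnnotationPresent(PrivateAudience.class);
    }

    // "Client" code: the compiler accepts this call even though the
    // target class is marked as private to the implementation.
    static String clientCode() {
        return InternalScheduler.describe();
    }
}
```

If InternalScheduler lived in a separate implementation tree that was not on
the client's compile classpath, clientCode() would fail to compile, which is
the stronger boundary the proposal is after.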

Note that this change would not introduce an incompatible change, since classes 
would be moved between trees and remain in the same packages.

I see the following advantages:
* If we created separate JARs for each tree, clients could compile against the 
API and library JARs without inadvertently introducing dependencies on 
implementation classes that happen to be public. Even if a class is marked as 
InterfaceAudience.Private, it is easy to accidentally have the IDE pick it up. 
* It makes MAPREDUCE-1478 (shipping modified libraries) easy to implement.
* We can enforce that the kernel (user API) and implementation don't depend on 
the libraries. (MAPREDUCE-1453)
* It helps enforce compatibility. From a review standpoint it would be easier 
to see if a patch modifies a public API, since it is in its own tree. 
Similarly, we would publish javadoc for API and library, and generate JDiff 
against them. Currently JDiff output is very large due to a large number of 
false positives, so it is difficult to see real incompatibility problems. 
(HADOOP-6658 helps here, but the approach described in this issue solves the 
other problems listed above too.)

Thoughts?




[jira] Commented: (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees

2010-03-26 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850462#action_12850462
 ] 

Owen O'Malley commented on MAPREDUCE-1638:
--

I support splitting the libraries out to their own source tree. That both 
supports important use cases (MAPREDUCE-1478) and enforces good style 
(MAPREDUCE-1453).

After considering it, I don't think that we should separate the api from the 
impls. In general, that makes more sense if you have multiple implementations 
of the api. I'm also worried that there would be a circular dependence between 
the two jars.

I agree that we need to make the line stronger, but maybe it would be better to 
move the implementations into new packages?

Leave all of the API classes alone, and move the implementation classes as 
follows:

org.apache.hadoop.mapred.* -> org.apache.hadoop.mapreduce.impl.*
org.apache.hadoop.mapreduce.* -> org.apache.hadoop.mapreduce.impl.*
