[jira] [Commented] (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees
[ https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051382#comment-13051382 ]

Todd Lipcon commented on MAPREDUCE-1638:

Could the toFullPropertyName method be moved to MRJobConfig and then used by both QueueManager and JobSubmitter? Duplicating it seems ugly.

Divide MapReduce into API and implementation source trees

Key: MAPREDUCE-1638
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1638
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: build, client
Reporter: Tom White
Assignee: Tom White
Attachments: MAPREDUCE-1638.patch, MAPREDUCE-1638.patch, MAPREDUCE-1638.sh

I think it makes sense to separate the MapReduce source into public API and implementation trees. The public API could be broken further into kernel and library trees.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
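Todd Lipcon's suggestion to keep a single copy of toFullPropertyName could look roughly like the sketch below. Everything in it is illustrative: the class names (SharedJobConfigKeys, QueueManagerSketch, JobSubmitterSketch), the "mapred.queue." prefix, and the "acl-submit-job" property are assumptions made for the example, not the contents of the attached patch or the real MRJobConfig.

```java
// Hypothetical stand-in for the shared location (MRJobConfig in the comment).
// The prefix below is an assumption for illustration only.
final class SharedJobConfigKeys {
    static final String QUEUE_CONF_PROPERTY_NAME_PREFIX = "mapred.queue.";

    private SharedJobConfigKeys() {
        // Constants and static helpers only; not meant to be instantiated.
    }

    /** Builds the full property name for a per-queue setting. */
    static String toFullPropertyName(String queue, String property) {
        return QUEUE_CONF_PROPERTY_NAME_PREFIX + queue + "." + property;
    }
}

// Hypothetical server-side caller: delegates instead of keeping its own copy.
final class QueueManagerSketch {
    String aclKeyFor(String queue) {
        return SharedJobConfigKeys.toFullPropertyName(queue, "acl-submit-job");
    }
}

// Hypothetical client-side caller: uses the same helper, so the two
// sides cannot drift apart.
final class JobSubmitterSketch {
    String aclKeyFor(String queue) {
        return SharedJobConfigKeys.toFullPropertyName(queue, "acl-submit-job");
    }
}

final class SharedHelperDemo {
    public static void main(String[] args) {
        System.out.println(new QueueManagerSketch().aclKeyFor("default"));
        System.out.println(new JobSubmitterSketch().aclKeyFor("default"));
    }
}
```

Placing the helper in a class that both client and server already depend on avoids the duplication without making the client depend on server code.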
[jira] [Commented] (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees
[ https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050755#comment-13050755 ]

Owen O'Malley commented on MAPREDUCE-1638:

I can understand splitting up the client and server jars, but splitting up the API and implementation only makes sense if you have different implementations and a test suite to test them. Cleaning up the dependencies is a good thing, especially removing dependencies from the client on the server code.
[jira] [Commented] (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees
[ https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050765#comment-13050765 ]

Tom White commented on MAPREDUCE-1638:

Yes, these patches are about removing client dependencies on the server.
[jira] Commented: (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees
[ https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852427#action_12852427 ]

Steve Loughran commented on MAPREDUCE-1638:

I know of people who are looking at alternate execution engines to the JT, so having that bit of the API decoupled from the implementation would be good. I don't (yet) see the need for separate JARs though, it only complicates things and creates new problems.
[jira] Commented: (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees
[ https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851653#action_12851653 ]

Tom White commented on MAPREDUCE-1638:

bq. After considering it, I don't think that we should separate the api from the impls. In general, that makes more sense if you have multiple implementations of the api.

In some sense we already have multiple implementations of the API, if you consider the full distributed implementation and LocalJobRunner as different implementations. (Also MRUnit is a partial implementation.)

bq. I'm also worried that there would be a circular dependence between the two jars.

You're right that there are currently many circular dependencies, which are hard to break. Perhaps we can improve this in the future, but it's a bigger project (a project Jigsaw - http://openjdk.java.net/projects/jigsaw/ - for MapReduce?).

bq. I agree that we need to make the line stronger, but maybe it would be better to move the implementations into new packages?

This would be a good step. For the moment, I think that a combination of MAPREDUCE-1623 and HADOOP-6658 is enough to define the public user-facing API.
[jira] Commented: (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees
[ https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850303#action_12850303 ]

Tom White commented on MAPREDUCE-1638:

To flesh this out a little more: roughly speaking, the API tree would contain the public part of o.a.h.mapred that contains the (deprecated) user API, as well as o.a.h.mapreduce. The library tree would contain o.a.h.mapred.lib and o.a.h.mapreduce.lib (and subpackages). The implementation tree would contain everything else (although there may be exceptions - classes that are considered a part of the public API and should go in the API tree).

This change would mark a very clear boundary between the public user-facing API and the internal implementation. Having separate source trees is a common approach in many projects. The use of annotations introduced in HADOOP-5073 doesn't provide such a clear demarcation (since you can't conditionally compile according to the presence of an annotation), but is still useful for more fine-grained distinctions.

Note that this change would not introduce an incompatible change, since classes would be moved between trees and remain in the same packages.

I see the following advantages:

* If we created separate JARs for each tree, clients could compile against the API and library JARs without inadvertently introducing dependencies on implementation classes that happen to be public. Even if the class is marked as InterfaceAudience.Private, it is easy to accidentally have the IDE pick it up.
* It makes MAPREDUCE-1478 (shipping modified libraries) easy to implement.
* We can enforce that the kernel (user API) and implementation don't depend on the libraries. (MAPREDUCE-1453)
* It helps enforce compatibility. From a review standpoint it would be easier to see if a patch modifies a public API, since it is in its own tree. Similarly, we would publish javadoc for API and library, and generate JDiff against them. Currently JDiff output is very large due to a large number of false positives, so it is difficult to see real incompatibility problems. (HADOOP-6658 helps here, but the approach described in this issue solves the other problems listed above too.)

Thoughts?
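The three-way split described in this comment can be sketched as a small classifier over fully qualified class names. This is only an illustration of the rule of thumb, not part of any patch: the SourceTree enum, the SourceTreeClassifier class, and the isPublicApi flag are hypothetical, and the flag stands in for the human judgment about which classes count as user-facing, which the package name alone cannot supply.

```java
// Hypothetical names throughout; this only mirrors the rough rule in the comment.
enum SourceTree { API, LIBRARY, IMPLEMENTATION }

final class SourceTreeClassifier {
    private static final String MAPRED = "org.apache.hadoop.mapred.";
    private static final String MAPREDUCE = "org.apache.hadoop.mapreduce.";

    private SourceTreeClassifier() {
        // Static helper only.
    }

    /**
     * Picks a source tree for a class.
     *
     * @param className   fully qualified class name
     * @param isPublicApi caller's judgment that the class is user-facing
     *                    (an assumption of this sketch; the thread leaves
     *                    the exact list of exceptions open)
     */
    static SourceTree classify(String className, boolean isPublicApi) {
        // o.a.h.mapred.lib and o.a.h.mapreduce.lib, including subpackages.
        if (className.startsWith(MAPRED + "lib.")
                || className.startsWith(MAPREDUCE + "lib.")) {
            return SourceTree.LIBRARY;
        }
        boolean inUserPackage =
                className.startsWith(MAPRED) || className.startsWith(MAPREDUCE);
        // Public, user-facing classes in the two user packages form the API tree.
        if (inUserPackage && isPublicApi) {
            return SourceTree.API;
        }
        // Everything else is implementation.
        return SourceTree.IMPLEMENTATION;
    }
}

final class SourceTreeDemo {
    public static void main(String[] args) {
        System.out.println(SourceTreeClassifier.classify(
                "org.apache.hadoop.mapreduce.Job", true));
        System.out.println(SourceTreeClassifier.classify(
                "org.apache.hadoop.mapreduce.lib.input.TextInputFormat", true));
        System.out.println(SourceTreeClassifier.classify(
                "org.apache.hadoop.mapred.JobTracker", false));
    }
}
```

Because classes stay in their packages and only move between trees, a classifier like this changes where a file lives in the build, not its fully qualified name.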
[jira] Commented: (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees
[ https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850462#action_12850462 ]

Owen O'Malley commented on MAPREDUCE-1638:

I support splitting the libraries out to their own source tree. That both supports important use cases (MAPREDUCE-1478) and enforces good style (MAPREDUCE-1453).

After considering it, I don't think that we should separate the api from the impls. In general, that makes more sense if you have multiple implementations of the api. I'm also worried that there would be a circular dependence between the two jars.

I agree that we need to make the line stronger, but maybe it would be better to move the implementations into new packages? Leave all of the API classes alone and for the implementation classes move them as:

org.apache.hadoop.mapred.* -> org.apache.hadoop.mapreduce.impl.*
org.apache.hadoop.mapreduce.* -> org.apache.hadoop.mapreduce.impl.*
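To make the proposed package move concrete, the mapping in this comment can be read as a prefix rewrite on fully qualified class names. The sketch below is one possible reading, not an agreed design: ImplPackageMapper and the isApiClass flag are hypothetical, and how subpackages or name clashes between the two source packages would be handled is not addressed in the thread.

```java
// Hypothetical helper illustrating the proposed rename; not actual Hadoop code.
final class ImplPackageMapper {
    private static final String OLD_MAPRED = "org.apache.hadoop.mapred.";
    private static final String OLD_MAPREDUCE = "org.apache.hadoop.mapreduce.";
    private static final String IMPL = "org.apache.hadoop.mapreduce.impl.";

    private ImplPackageMapper() {
        // Static helper only.
    }

    /**
     * Returns the class name after the proposed move.
     *
     * @param className  fully qualified class name
     * @param isApiClass caller's judgment that the class is part of the API
     *                   (an assumption of this sketch)
     */
    static String relocate(String className, boolean isApiClass) {
        if (isApiClass) {
            return className; // API classes are left alone.
        }
        if (className.startsWith(IMPL)) {
            return className; // Already in the implementation package.
        }
        if (className.startsWith(OLD_MAPRED)) {
            return IMPL + className.substring(OLD_MAPRED.length());
        }
        if (className.startsWith(OLD_MAPREDUCE)) {
            return IMPL + className.substring(OLD_MAPREDUCE.length());
        }
        return className; // Outside the two packages: unchanged.
    }
}

final class ImplPackageMapperDemo {
    public static void main(String[] args) {
        System.out.println(ImplPackageMapper.relocate(
                "org.apache.hadoop.mapred.JobTracker", false));
        System.out.println(ImplPackageMapper.relocate(
                "org.apache.hadoop.mapreduce.Job", true));
    }
}
```

Unlike the separate-source-tree proposal, this approach changes fully qualified names for implementation classes, so any code that referenced them directly would need to be updated.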