[ 
https://issues.apache.org/jira/browse/HCATALOG-520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467381#comment-13467381
 ] 

Travis Crawford commented on HCATALOG-520:
------------------------------------------

Simply by removing the jars we obviously don't need I was able to reduce our 
deps by ~20 jars.

Currently {{hcatalog-core}} depends on {{hive-hbase-handler}}, which I don't 
think it actually needs. Dropping it sheds a LOT of dependencies.

To gain visibility into what our transitive dependencies are, I ported over 
something I do for a different project. The idea is that you check in the list 
of jars you expect to depend on, and the build fails if that list changes. 
A tool is provided to easily update the list, and the error message is very 
clear. Your dependencies are a critical part of the project, and you should be 
aware of what they are and when they change.
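
The check described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the commit linked below: the function names, the one-jar-per-line list format, and the assumption that the build resolves jars into a single directory are all hypothetical.

```python
from pathlib import Path


def read_expected(list_path):
    """Load the checked-in jar list: one file name per line, '#' for comments."""
    text = Path(list_path).read_text()
    names = (line.strip() for line in text.splitlines())
    return {n for n in names if n and not n.startswith("#")}


def resolved_jars(lib_dir):
    """Collect the names of the jars the build actually resolved (assumed layout)."""
    return {p.name for p in Path(lib_dir).glob("*.jar")}


def diff_jars(expected, actual):
    """Return (added, removed) jar names, each sorted for stable output."""
    return sorted(actual - expected), sorted(expected - actual)


def check(list_path, lib_dir, update=False):
    """Return 0 if the resolved jars match the checked-in list, 1 otherwise.

    With update=True, rewrite the list from lib_dir instead of failing.
    """
    actual = resolved_jars(lib_dir)
    if update:
        Path(list_path).write_text("".join(name + "\n" for name in sorted(actual)))
        return 0
    added, removed = diff_jars(read_expected(list_path), actual)
    if not added and not removed:
        return 0
    # Spell out exactly what changed so the failure is easy to act on.
    print("Dependency list changed; review it, then re-run with update=True:")
    for name in added:
        print("  + " + name)
    for name in removed:
        print("  - " + name)
    return 1
```

A build step could call {{check("expected-jars.txt", "build/lib")}} (both paths hypothetical) and fail on a non-zero result. Because the list is checked in, any dependency change shows up in code review as a diff to that file.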

Preview for anyone who's interested:
https://github.com/traviscrawford/hcatalog/commit/a2a4e085b1b528f9e00765255dadae550f735513
                
> Simplify HCatalog dependencies
> ------------------------------
>
>                 Key: HCATALOG-520
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-520
>             Project: HCatalog
>          Issue Type: Improvement
>            Reporter: Travis Crawford
>            Assignee: Travis Crawford
>
> Looking through the hcatalog-core dependencies I believe we have an 
> opportunity to trim them down. A major goal of HCatalog is to be a dependency 
> of other processing tools, and we can make that more attractive by invading 
> their classpath as little as possible.
> I believe the following look good (minus hive-exec which is a fat jar, but 
> that's a separate issue):
> {code}
>     <dependency org="org.apache.hadoop" name="hadoop-tools" rev="${hadoop20.version}" conf="default->*"/>
>     <dependency org="org.apache.hive" name="hive-builtins" rev="${hive.version}"/>
>     <dependency org="org.apache.hive" name="hive-metastore" rev="${hive.version}"/>
>     <dependency org="org.apache.hive" name="hive-common" rev="${hive.version}"/>
>     <dependency org="org.apache.hive" name="hive-exec" rev="${hive.version}"/>
>     <dependency org="org.apache.hive" name="hive-cli" rev="${hive.version}"/>
>     <dependency org="org.apache.hive" name="hive-hbase-handler" rev="${hive.version}">
>       <exclude org="org.apache.maven.plugins"/>
>       <exclude org="org.jruby"/>
>     </dependency>
> {code}
> The following are where I believe we can make improvements:
> {code}
> <dependency org="org.apache.pig" name="pig" rev="${pig.version}" conf="default->*"/>
> {code}
> Pig is still used by the hcatalog-core tests, but the dependency has not yet 
> been moved to the test target. A major goal of switching to subprojects was 
> to stop forcing processing frameworks as dependencies on people using HCat, 
> so this should move to the test target (some core tests use pig for 
> convenience).
> {code}
> <dependency org="javax.management.j2ee" name="management-api" rev="${javax-mgmt.version}"/>
> {code}
> Does anyone know why management-api is needed? I'm not familiar with this and 
> don't see any usages from a quick grep. It's something JMS-related, and maybe 
> was needed by hcatalog-server-extensions at some point? If tests pass without 
> this I think we should remove it.
> {code}
> <dependency org="org.codehaus.jackson" name="jackson-mapper-asl" rev="${jackson.version}"/>
> <dependency org="org.codehaus.jackson" name="jackson-core-asl" rev="${jackson.version}"/>
> {code}
> The HCatalog build requests jackson 1.7.3, while hive-exec depends on 1.8.8. 
> Any objection to using the versions provided by Hive?
> {code}
> <dependency org="org.apache.thrift" name="libfb303" rev="${fb303.version}"/>
> {code}
> I don't believe this is required because hive-metastore depends on libfb303.
> {code}
> <dependency org="commons-dbcp" name="commons-dbcp" rev="${commons-dbcp.version}">
>   <exclude module="commons-pool"/>
>   <exclude org="org.apache.geronimo.specs" module="geronimo-jta_1.1_spec"/>
> </dependency>
> {code}
> hive-metastore depends on commons-dbcp and I don't believe we need to 
> explicitly depend on this.
> {code}
> <dependency org="com.google.guava" name="guava" rev="${guava.version}"/>
> {code}
> hive-exec also depends on guava 11.0.2, so I don't believe we need to depend 
> on this explicitly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira