[ https://issues.apache.org/jira/browse/SPARK-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994990#comment-13994990 ]
Sean Owen commented on SPARK-1802:
----------------------------------

[~pwendell] You can see my start on it here:

https://github.com/srowen/spark/commits/SPARK-1802
https://github.com/srowen/spark/commit/a856604cfc67cb58146ada01fda6dbbb2515fa00

This resolves the new issues you note in your diff.

Next issue is that hive-exec, quite awfully, includes a copy of all of its transitive dependencies in its artifact. See https://issues.apache.org/jira/browse/HIVE-5733 and note the warnings you'll get during assembly:

{code}
[WARNING] hive-exec-0.12.0.jar, libthrift-0.9.0.jar define 153 overlappping classes:
[WARNING]   - org.apache.thrift.transport.TSaslTransport$SaslResponse
...
{code}

hive-exec is in fact used in this module. Aside from actual surgery on the artifact with the shade plugin, you can't control the dependencies as a result. This may be simply "the best that can be done" right now. If it has worked, it has worked.

Am I right that the datanucleus JARs *are* meant to be in the assembly, only for the Hive build?
https://github.com/apache/spark/pull/688
https://github.com/apache/spark/pull/610

That's good if so since that's what your diff shows.

Finally, while we're here, I note that there are still a few JAR conflicts that turn up when you build the assembly *without* Hive. (I'm going to ignore conflicts in examples; these can be cleaned up but aren't really a big deal given its nature.) We could touch those up too.

This is in the normal build (and I know how to zap most of this problem):

{code}
[WARNING] commons-beanutils-core-1.8.0.jar, commons-beanutils-1.7.0.jar define 82 overlappping classes:
{code}

These turn up in the Hadoop 2.x + YARN build:

{code}
[WARNING] servlet-api-2.5.jar, javax.servlet-3.0.0.v201112011016.jar define 42 overlappping classes:
...
[WARNING] jcl-over-slf4j-1.7.5.jar, commons-logging-1.1.3.jar define 6 overlappping classes:
...
[WARNING] activation-1.1.jar, javax.activation-1.1.0.v201105071233.jar define 17 overlappping classes:
...
[WARNING] servlet-api-2.5.jar, javax.servlet-3.0.0.v201112011016.jar define 42 overlappping classes:
{code}

These should be easy to track down. Shall I?

> Audit dependency graph when Spark is built with -Phive
> ------------------------------------------------------
>
>                 Key: SPARK-1802
>                 URL: https://issues.apache.org/jira/browse/SPARK-1802
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Patrick Wendell
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> I'd like to have binary release for 1.0 include Hive support. Since this
> isn't enabled by default in the build I don't think it's as well tested, so
> we should dig around a bit and decide if we need to e.g. add any excludes.
> {code}
> $ mvn install -Phive -DskipTests && mvn dependency:build-classpath -pl assembly | grep -v INFO | tr ":" "\n" | awk ' { FS="/"; print ( $(NF) ); }' | sort > without_hive.txt
> $ mvn install -Phive -DskipTests && mvn dependency:build-classpath -Phive -pl assembly | grep -v INFO | tr ":" "\n" | awk ' { FS="/"; print ( $(NF) ); }' | sort > with_hive.txt
> $ diff without_hive.txt with_hive.txt
> < antlr-2.7.7.jar
> < antlr-3.4.jar
> < antlr-runtime-3.4.jar
> 10,14d6
> < avro-1.7.4.jar
> < avro-ipc-1.7.4.jar
> < avro-ipc-1.7.4-tests.jar
> < avro-mapred-1.7.4.jar
> < bonecp-0.7.1.RELEASE.jar
> 22d13
> < commons-cli-1.2.jar
> 25d15
> < commons-compress-1.4.1.jar
> 33,34d22
> < commons-logging-1.1.1.jar
> < commons-logging-api-1.0.4.jar
> 38d25
> < commons-pool-1.5.4.jar
> 46,49d32
> < datanucleus-api-jdo-3.2.1.jar
> < datanucleus-core-3.2.2.jar
> < datanucleus-rdbms-3.2.1.jar
> < derby-10.4.2.0.jar
> 53,57d35
> < hive-common-0.12.0.jar
> < hive-exec-0.12.0.jar
> < hive-metastore-0.12.0.jar
> < hive-serde-0.12.0.jar
> < hive-shims-0.12.0.jar
> 60,61d37
> < httpclient-4.1.3.jar
> < httpcore-4.1.3.jar
> 68d43
> < JavaEWAH-0.3.2.jar
> 73d47
> < javolution-5.5.1.jar
> 76d49
> < jdo-api-3.0.1.jar
> 78d50
> < jetty-6.1.26.jar
> 87d58
> < jetty-util-6.1.26.jar
> 93d63
> < json-20090211.jar
> 98d67
> < jta-1.1.jar
> 103,104d71
> < libfb303-0.9.0.jar
> < libthrift-0.9.0.jar
> 112d78
> < mockito-all-1.8.5.jar
> 136d101
> < servlet-api-2.5-20081211.jar
> 139d103
> < snappy-0.2.jar
> 144d107
> < spark-hive_2.10-1.0.0.jar
> 151d113
> < ST4-4.0.4.jar
> 153d114
> < stringtemplate-3.2.1.jar
> 156d116
> < velocity-1.7.jar
> 158d117
> < xz-1.0.jar
> {code}
> Some initial investigation suggests we may need to take some precaution
> surrounding (a) jetty and (b) servlet-api.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
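
For anyone who wants to check pairs of JARs for this kind of conflict outside of an assembly build, here is a rough sketch of the same comparison the warnings above report. It is not how the shade plugin does it; it only assumes that a JAR is a zip archive and that two JARs conflict when they contain the same `.class` entry name. The function names `class_entries` and `overlapping_classes` are made up for illustration.

```python
import zipfile
from itertools import combinations


def class_entries(jar_path):
    """Return the set of .class entry names stored in one JAR (a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def overlapping_classes(jar_paths):
    """Map each pair of JAR paths to the sorted class entries both of them contain.

    Hypothetical helper: pairs with no shared entries are left out of the result.
    """
    entries = {path: class_entries(path) for path in jar_paths}
    overlaps = {}
    for left, right in combinations(sorted(entries), 2):
        shared = entries[left] & entries[right]
        if shared:
            overlaps[(left, right)] = sorted(shared)
    return overlaps
```

Calling `overlapping_classes` with, say, the paths to the hive-exec and libthrift JARs from a local repository would return the shared entries for that pair, which can then be weighed against the excludes under discussion. Only entry names are compared, so the sketch says nothing about whether the duplicated classes are byte-identical.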