Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178057319

--- Diff: hadoop-cloud/pom.xml ---
@@ -177,6 +214,188 @@
             </exclusion>
           </exclusions>
         </dependency>
+        <!--
+          the AWS module pulls in jackson; its transitive dependencies can create
+          intra-jackson-module version problems.
+        -->
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-aws</artifactId>
+          <version>${hadoop.version}</version>
+          <scope>${hadoop.deps.scope}</scope>
+          <exclusions>
+            <exclusion>
+              <groupId>org.apache.hadoop</groupId>
+              <artifactId>hadoop-common</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>commons-logging</groupId>
+              <artifactId>commons-logging</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>org.codehaus.jackson</groupId>
+              <artifactId>jackson-mapper-asl</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>org.codehaus.jackson</groupId>
+              <artifactId>jackson-core-asl</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>com.fasterxml.jackson.core</groupId>
+              <artifactId>jackson-core</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>com.fasterxml.jackson.core</groupId>
+              <artifactId>jackson-databind</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>com.fasterxml.jackson.core</groupId>
+              <artifactId>jackson-annotations</artifactId>
+            </exclusion>
+          </exclusions>
+        </dependency>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-openstack</artifactId>
+          <version>${hadoop.version}</version>
+          <scope>${hadoop.deps.scope}</scope>
+          <exclusions>
+            <exclusion>
+              <groupId>org.apache.hadoop</groupId>
+              <artifactId>hadoop-common</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>commons-logging</groupId>
+              <artifactId>commons-logging</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>junit</groupId>
+              <artifactId>junit</artifactId>
+            </exclusion>
+            <exclusion>
+              <groupId>org.mockito</groupId>
+              <artifactId>mockito-all</artifactId>
+            </exclusion>
+          </exclusions>
+        </dependency>
+
+        <!--
+          Add joda time to ensure that anything downstream which doesn't pull in spark-hive
+          gets the correct joda time artifact, so doesn't have auth failures on later Java 8 JVMs
+        -->
+        <dependency>
+          <groupId>joda-time</groupId>
+          <artifactId>joda-time</artifactId>
+          <scope>${hadoop.deps.scope}</scope>
+        </dependency>
+        <!-- explicitly declare the jackson artifacts desired -->
+        <dependency>
+          <groupId>com.fasterxml.jackson.core</groupId>
+          <artifactId>jackson-databind</artifactId>
+          <scope>${hadoop.deps.scope}</scope>
+        </dependency>
+        <dependency>
+          <groupId>com.fasterxml.jackson.core</groupId>
+          <artifactId>jackson-annotations</artifactId>
+          <scope>${hadoop.deps.scope}</scope>
+        </dependency>
+        <dependency>
+          <groupId>com.fasterxml.jackson.dataformat</groupId>
+          <artifactId>jackson-dataformat-cbor</artifactId>
+          <version>${fasterxml.jackson.version}</version>
+        </dependency>
+        <!--Explicit declaration to force in Spark version into transitive dependencies -->
+        <dependency>
+          <groupId>org.apache.httpcomponents</groupId>
+          <artifactId>httpclient</artifactId>
+          <scope>${hadoop.deps.scope}</scope>
+        </dependency>
+        <!--Explicit declaration to force in Spark version into transitive dependencies -->
+        <dependency>
+          <groupId>org.apache.httpcomponents</groupId>
+          <artifactId>httpcore</artifactId>
+          <scope>${hadoop.deps.scope}</scope>
+        </dependency>
+      </dependencies>
+    </profile>
+
+    <!--
+      Hadoop 3 simplifies the classpath, and adds a new committer base class which
+      enables store-specific committers.
+    -->
+    <profile>
+      <id>hadoop-3</id>
+      <properties>
+        <extra.source.dir>src/hadoop-3/main/scala</extra.source.dir>
+        <extra.testsource.dir>src/hadoop-3/test/scala</extra.testsource.dir>
+      </properties>
+
+      <build>
+        <plugins>
+          <!-- Include a source dir depending on the Scala version -->
+          <plugin>
+            <groupId>org.codehaus.mojo</groupId>
+            <artifactId>build-helper-maven-plugin</artifactId>
+            <executions>
+              <execution>
+                <id>add-scala-sources</id>
+                <phase>generate-sources</phase>
+                <goals>
+                  <goal>add-source</goal>
+                </goals>
+                <configuration>
+                  <sources>
+                    <source>${extra.source.dir}</source>
+                  </sources>
+                </configuration>
+              </execution>
+              <execution>
+                <id>add-scala-test-sources</id>
+                <phase>generate-test-sources</phase>
+                <goals>
+                  <goal>add-test-source</goal>
+                </goals>
+                <configuration>
+                  <sources>
+                    <source>${extra.testsource.dir}</source>
+                  </sources>
+                </configuration>
+              </execution>
+            </executions>
+          </plugin>
+        </plugins>
+
+      </build>
+      <dependencies>
+
+        <!--
+          There's now a hadoop-cloud-storage which transitively pulls in the store JARs,
+          but it still needs some selective exclusion across versions, especially 3.0.x.
--- End diff --

Excluding hadoop-client means there's no need to worry about any of the stuff explicitly excluded from hadoop-client in the Spark root pom (asm/asm, jackson, etc.).

Hadoop 3.0.1 declares hadoop-client as a compile-time dependency of [hadoop-cloud-storage](https://github.com/apache/hadoop/blob/branch-3.0.1/hadoop-cloud-storage-project/hadoop-cloud-storage/pom.xml). From 3.0.2 onwards it has been cut down to `provided`, and `azure-datalake` was added as a dependency ([commit 3c03672e](https://github.com/apache/hadoop/commit/3c03672e876ddbd6a6425ea1a056ad13adc309ea)), so it's complete w.r.t. the ASF connectors.
There's also a fix for the AWS shaded SDK to exclude netty ([HADOOP-15264](https://github.com/apache/hadoop/commit/e015e009897e481edc79f4ba72e2c53610b178a3)), because of [aws-sdk-java/issues/1488](https://github.com/aws/aws-sdk-java/issues/1488).

The individual hadoop cloud modules (hadoop-aws, hadoop-azure, ...) have also downgraded hadoop-client to `provided`, so if you pull in any of those, you only get the extra artifacts needed to connect to the relevant cloud endpoint, and are expected to pull in the same hadoop-client version elsewhere for things to work.

Here's the dependency list for spark-hadoop-cloud and 3.0.2-SNAPSHOT; 3.1 will be the same unless there's a last-minute update to one of the external SDKs or jetty.

```
[INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.0.2-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  \- com.aliyun.oss:aliyun-sdk-oss:jar:2.8.3:compile
[INFO] |  |     \- org.jdom:jdom:jar:1.1:compile
[INFO] |  +- org.apache.hadoop:hadoop-aws:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.271:compile
[INFO] |  +- org.apache.hadoop:hadoop-azure:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  +- com.microsoft.azure:azure-storage:jar:5.4.0:compile
[INFO] |  |  |  \- com.microsoft.azure:azure-keyvault-core:jar:0.8.0:compile
[INFO] |  |  \- org.eclipse.jetty:jetty-util-ajax:jar:9.3.19.v20170502:compile
[INFO] |  +- org.apache.hadoop:hadoop-azure-datalake:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.5:compile
[INFO] |  \- org.apache.hadoop:hadoop-openstack:jar:3.0.2-SNAPSHOT:compile
```

Given that Hadoop 3.0.2+ downgrades hadoop-client to `provided`, and that is the minimum version this patch will build against, the exclusion is mostly superfluous: it is there to block regressions rather than to actually keep hadoop-client out.
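
For illustration, the kind of guard being described would look roughly like the fragment below. This is a sketch only, not text copied from the patch: it assumes the `hadoop.version` and `hadoop.deps.scope` properties that the other Hadoop dependencies in the diff use, and takes the artifact coordinates from the dependency list above.

```xml
<!-- Illustrative sketch; property names are assumed to match those used elsewhere in the diff -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-cloud-storage</artifactId>
  <version>${hadoop.version}</version>
  <scope>${hadoop.deps.scope}</scope>
  <exclusions>
    <!--
      Redundant on Hadoop 3.0.2+, where hadoop-client is already "provided";
      kept so a future change upstream cannot pull it back in unnoticed.
    -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With an exclusion like this in place, hadoop-client has to come from wherever the rest of the build declares it, which is the "same hadoop-client version elsewhere" expectation noted above.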