Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/20923#discussion_r178060744

--- Diff: hadoop-cloud/pom.xml ---
@@ -141,13 +93,98 @@
       <artifactId>httpcore</artifactId>
       <scope>${hadoop.deps.scope}</scope>
     </dependency>
+  </dependencies>
   <profiles>
+    <!-- this inner profile is the default one and includes openstack and aws -->
+    <profile>
+      <id>hadoop-2.6</id>
+      <activation>
+        <activeByDefault>true</activeByDefault>
--- End diff --

Hmmm. There's another option, which is to leave all of those in the standard list; you then get a few extra dependencies which aren't needed for the 3.x line:

```
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1:compile *
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.6.7:compile *
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7:compile *
[INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.6.7:compile *
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.4:compile
[INFO] |  +- commons-logging:commons-logging:jar:1.2:compile
[INFO] |  \- commons-codec:commons-codec:jar:1.10:compile
[INFO] +- org.apache.httpcomponents:httpcore:jar:4.4.8:compile
[INFO] +- org.apache.hadoop:hadoop-aws:jar:3.0.2-SNAPSHOT:compile
[INFO] |  \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.271:compile
[INFO] +- org.apache.hadoop:hadoop-openstack:jar:3.0.2-SNAPSHOT:compile
[INFO] +- joda-time:joda-time:jar:2.9.3:compile *
[INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.0.2-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  \- com.aliyun.oss:aliyun-sdk-oss:jar:2.8.3:compile
[INFO] |  |     \- org.jdom:jdom:jar:1.1:compile
[INFO] |  +- org.apache.hadoop:hadoop-azure:jar:3.0.2-SNAPSHOT:compile
[INFO] |  |  +- com.microsoft.azure:azure-storage:jar:5.4.0:compile
[INFO] |  |  |  \- com.microsoft.azure:azure-keyvault-core:jar:0.8.0:compile
[INFO] |  |  \- org.eclipse.jetty:jetty-util-ajax:jar:9.3.19.v20170502:compile
[INFO] |  \- org.apache.hadoop:hadoop-azure-datalake:jar:3.0.2-SNAPSHOT:compile
[INFO] |     \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.5:compile
```

`jackson-dataformat-cbor` is the funny one: this is the sole declaration of it within Spark, and with the shaded AWS JAR it's not needed at all. The rest all make their way into the Spark assembly through other routes.

What do you think? Leave them as the default and not worry about it? It would remove the duplication in the 2.7 profile and, apart from the extraneous dependencies on hadoop-3 builds, it is harmless.
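
To make the alternative concrete, here is a rough sketch of how the POM might be laid out. It is illustrative only: the profile id `hadoop-3.x` is a placeholder, the dependency list is abbreviated, and the `${hadoop.version}` and `${fasterxml.jackson.version}` properties are assumed to be defined in the parent POM. Only `${hadoop.deps.scope}` is taken from the diff above.

```xml
<!-- Sketch only: profile id, property names and the exact dependency list are assumptions -->
<dependencies>
  <!-- Declared once, unconditionally, so no Hadoop 2.x profile has to repeat them -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>${hadoop.version}</version>
    <scope>${hadoop.deps.scope}</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-openstack</artifactId>
    <version>${hadoop.version}</version>
    <scope>${hadoop.deps.scope}</scope>
  </dependency>
  <!-- Extraneous on hadoop-3 builds (the shaded AWS JAR bundles it), but harmless -->
  <dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-cbor</artifactId>
    <version>${fasterxml.jackson.version}</version>
  </dependency>
</dependencies>

<profiles>
  <!-- Placeholder id: the hadoop-3 profile only adds what the common list lacks -->
  <profile>
    <id>hadoop-3.x</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-cloud-storage</artifactId>
        <version>${hadoop.version}</version>
        <scope>${hadoop.deps.scope}</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

The trade-off in this layout is the one described above: a single shared list and no per-profile duplication, at the cost of a few unneeded artifacts on hadoop-3 builds.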