Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20923#discussion_r178060744
  
    --- Diff: hadoop-cloud/pom.xml ---
    @@ -141,13 +93,98 @@
           <artifactId>httpcore</artifactId>
           <scope>${hadoop.deps.scope}</scope>
         </dependency>
    +
       </dependencies>
     
       <profiles>
     
    +    <!-- this inner profile is the default one and includes openstack and aws -->
    +    <profile>
    +      <id>hadoop-2.6</id>
    +      <activation>
    +        <activeByDefault>true</activeByDefault>
    --- End diff --
    
    Hmmm. There's another option: leave all of those in the standard list and accept a few extra dependencies which aren't needed for the 3.x line:
    
    ```
    [INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1:compile  *
    [INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.6.7:compile  *
    [INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7:compile  *
    [INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.6.7:compile  *
    [INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.4:compile
    [INFO] |  +- commons-logging:commons-logging:jar:1.2:compile
    [INFO] |  \- commons-codec:commons-codec:jar:1.10:compile
    [INFO] +- org.apache.httpcomponents:httpcore:jar:4.4.8:compile
    [INFO] +- org.apache.hadoop:hadoop-aws:jar:3.0.2-SNAPSHOT:compile
    [INFO] |  \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.271:compile
    [INFO] +- org.apache.hadoop:hadoop-openstack:jar:3.0.2-SNAPSHOT:compile
    [INFO] +- joda-time:joda-time:jar:2.9.3:compile  *
    [INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.0.2-SNAPSHOT:compile
    [INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.0.2-SNAPSHOT:compile
    [INFO] |  |  \- com.aliyun.oss:aliyun-sdk-oss:jar:2.8.3:compile
    [INFO] |  |     \- org.jdom:jdom:jar:1.1:compile
    [INFO] |  +- org.apache.hadoop:hadoop-azure:jar:3.0.2-SNAPSHOT:compile
    [INFO] |  |  +- com.microsoft.azure:azure-storage:jar:5.4.0:compile
    [INFO] |  |  |  \- com.microsoft.azure:azure-keyvault-core:jar:0.8.0:compile
    [INFO] |  |  \- org.eclipse.jetty:jetty-util-ajax:jar:9.3.19.v20170502:compile
    [INFO] |  \- org.apache.hadoop:hadoop-azure-datalake:jar:3.0.2-SNAPSHOT:compile
    [INFO] |     \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.5:compile
    ```
    
    `jackson-dataformat-cbor` is the odd one out: this is its sole declaration within Spark, and with the shaded AWS JAR it's not needed at all. The rest all make their way into the Spark assembly through other routes.
    
    What do you think? Leave them as the default and not worry about it? It would remove the duplication in the 2.7 profile and, apart from the extraneous dependencies on hadoop-3 builds, be harmless.
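    
    A rough sketch of what that option might look like, not the actual patch: the extra entries sit in the top-level `<dependencies>` of `hadoop-cloud/pom.xml` instead of being repeated per profile. It assumes the starred entries in the tree are the ones in question and that versions are managed in the parent POM; only two of them are shown.
    
    ```xml
    <!-- Hypothetical fragment: declared once for every Hadoop profile.
         Versions are assumed to come from the parent POM's dependencyManagement. -->
    <dependencies>
      <!-- Unneeded on hadoop-3 builds, where the shaded AWS JAR is used. -->
      <dependency>
        <groupId>com.fasterxml.jackson.dataformat</groupId>
        <artifactId>jackson-dataformat-cbor</artifactId>
        <scope>${hadoop.deps.scope}</scope>
      </dependency>
      <dependency>
        <groupId>joda-time</groupId>
        <artifactId>joda-time</artifactId>
        <scope>${hadoop.deps.scope}</scope>
      </dependency>
      <!-- The remaining starred Jackson artifacts would follow the same pattern. -->
    </dependencies>
    ```
    
    With the shared list in place, the profiles would only need to carry what genuinely differs between Hadoop versions.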

