[GitHub] spark pull request #20923: [SPARK-23807][BUILD][WIP] Add Hadoop 3 profile wi...

steveloughran Thu, 29 Mar 2018 06:47:45 -0700

Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20923#discussion_r178060744
  
    --- Diff: hadoop-cloud/pom.xml ---
    @@ -141,13 +93,98 @@
           <artifactId>httpcore</artifactId>
           <scope>${hadoop.deps.scope}</scope>
         </dependency>
    +
       </dependencies>
     
       <profiles>
     
    +    <!-- this inner profile is the default one and includes openstack and aws -->
    +    <profile>
    +      <id>hadoop-2.6</id>
    +      <activation>
    +        <activeByDefault>true</activeByDefault>
    --- End diff --
    
    Hmmm. There's another option, which is to leave all of those in the standard list; you then get a few extra dependencies which aren't needed for the 3.x line:
    
    ```
    [INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1:compile  *
    [INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.6.7:compile  *
    [INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7:compile  *
    [INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.6.7:compile  *
    [INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.4:compile
    [INFO] |  +- commons-logging:commons-logging:jar:1.2:compile
    [INFO] |  \- commons-codec:commons-codec:jar:1.10:compile
    [INFO] +- org.apache.httpcomponents:httpcore:jar:4.4.8:compile
    [INFO] +- org.apache.hadoop:hadoop-aws:jar:3.0.2-SNAPSHOT:compile
    [INFO] |  \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.271:compile
    [INFO] +- org.apache.hadoop:hadoop-openstack:jar:3.0.2-SNAPSHOT:compile
    [INFO] +- joda-time:joda-time:jar:2.9.3:compile  *
    [INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.0.2-SNAPSHOT:compile
    [INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.0.2-SNAPSHOT:compile
    [INFO] |  |  \- com.aliyun.oss:aliyun-sdk-oss:jar:2.8.3:compile
    [INFO] |  |     \- org.jdom:jdom:jar:1.1:compile
    [INFO] |  +- org.apache.hadoop:hadoop-azure:jar:3.0.2-SNAPSHOT:compile
    [INFO] |  |  +- com.microsoft.azure:azure-storage:jar:5.4.0:compile
    [INFO] |  |  |  \- com.microsoft.azure:azure-keyvault-core:jar:0.8.0:compile
    [INFO] |  |  \- org.eclipse.jetty:jetty-util-ajax:jar:9.3.19.v20170502:compile
    [INFO] |  \- org.apache.hadoop:hadoop-azure-datalake:jar:3.0.2-SNAPSHOT:compile
    [INFO] |     \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.5:compile
    ```
    
    The `jackson-dataformat-cbor` dependency is the odd one: this is its sole declaration within Spark, and with the shaded AWS JAR it isn't needed at all.
    The rest all make their way into the Spark assembly through other routes.
    
    What do you think? Leave them as the default and not worry about it? That would remove the duplication in the 2.7 profile and, apart from the extraneous dependencies on hadoop-3 builds, it would be harmless.
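    
    A rough sketch of that alternative, purely as an illustration: the profile id `hadoop-3` and the `${hadoop.version}` property are assumptions here, not what the PR defines, and only two of the shared dependencies are shown. The common entries stay in the top-level list, and the Hadoop 3 profile only adds what is new in that line:
    
    ```xml
    <!-- Fragment of hadoop-cloud/pom.xml; a sketch, not the PR's actual layout -->
    <dependencies>
      <!-- Declared once for every Hadoop version instead of per profile -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>${hadoop.version}</version> <!-- assumed property name -->
        <scope>${hadoop.deps.scope}</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-openstack</artifactId>
        <version>${hadoop.version}</version>
        <scope>${hadoop.deps.scope}</scope>
      </dependency>
      <!-- the jackson, httpclient, httpcore and joda-time entries would stay here too -->
    </dependencies>
    
    <profiles>
      <profile>
        <id>hadoop-3</id> <!-- hypothetical profile id -->
        <dependencies>
          <!-- Only the Hadoop 3 specific addition lives in the profile -->
          <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-cloud-storage</artifactId>
            <version>${hadoop.version}</version>
            <scope>${hadoop.deps.scope}</scope>
          </dependency>
        </dependencies>
      </profile>
    </profiles>
    ```
    
    The trade-off is the one described above: less duplication across profiles, at the cost of a few dependencies being declared on hadoop-3 builds that do not strictly need them.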



