[ 
https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175664#comment-15175664
 ] 

Steve Loughran commented on SPARK-7481:
---------------------------------------

One issue here is that Hadoop 2.6's hadoop-aws pulls in the whole AWS toolkit, 
which is pretty weighty, for s3a ... which isn't something I'd use in 2.6 
anyway.

Hadoop 2.7 moved to the (link-time-incompatible) amazon-s3 JAR, and also adds 
hadoop-azure with some wasb JAR. And in Hadoop 2.7 onwards, s3a is the one I 
would want to use in preference to s3n. 

What might work is a Hadoop 2.6 profile which explicitly adds hadoop-aws, then 
excludes the Amazon SDK:
{code}
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <scope>compile</scope>
    </dependency>
{code}

This would automatically pick up the {{aws-java-sdk-s3}} JAR on a 2.7+ build, 
because it's not excluded by name. Then there's fun if you try to add 
the {{aws-java-sdk-s3}} JAR needed for Hadoop 2.6 to the classpath, as it won't 
link, which makes me think that excluding {{aws-java-sdk-s3}} would be safer. 
The Hadoop code to talk to s3a and s3n would be there, s3n would work as 
well/badly as it always does, and for s3a you'd need to add the right AWS JAR 
for your Hadoop version.
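
A minimal sketch of what such a profile could look like, not a tested configuration. The profile id is hypothetical, {{hadoop.version}} is assumed to be a property defined by the enclosing POM, and excluding both SDK artifacts by name is one reading of the suggestion above:
{code}
    <profile>
      <!-- hypothetical profile id, for illustration only -->
      <id>hadoop-2.6-object-stores</id>
      <dependencies>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-aws</artifactId>
          <!-- assumes the build already defines hadoop.version -->
          <version>${hadoop.version}</version>
          <exclusions>
            <!-- the full SDK that hadoop-aws pulls in on 2.6 -->
            <exclusion>
              <groupId>com.amazonaws</groupId>
              <artifactId>aws-java-sdk</artifactId>
            </exclusion>
            <!-- the S3-only artifact, excluded by name as well -->
            <exclusion>
              <groupId>com.amazonaws</groupId>
              <artifactId>aws-java-sdk-s3</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
      </dependencies>
    </profile>
{code}
With both exclusions in place, no AWS SDK would come in transitively, so anyone wanting s3a would have to put the SDK JAR matching their Hadoop version on the classpath themselves.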

> Add Hadoop 2.6+ profile to pull in object store FS accessors
> ------------------------------------------------------------
>
>                 Key: SPARK-7481
>                 URL: https://issues.apache.org/jira/browse/SPARK-7481
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.3.1
>            Reporter: Steve Loughran
>
> To keep the s3n classpath right, and to add s3a, swift & azure, the dependencies 
> of spark in a 2.6+ profile need to add the relevant object store packages 
> (hadoop-aws, hadoop-openstack, hadoop-azure).
> This adds more stuff to the client bundle, but will mean a single spark 
> package can talk to all of the stores.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
