[ 
https://issues.apache.org/jira/browse/HADOOP-19083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955560#comment-17955560
 ] 

Yaniv Kunda commented on HADOOP-19083:
--------------------------------------

[~ste...@apache.org] I understand the incentive to remove it from the tar is 
related to size limits, but was separating the Maven dependency on the AWS SDK 
bundle also considered?
Using a bundle is very convenient in a hadoop deployment, but for users using 
hadoop as a library (including many open-source projects, e.g. spark) this 
imposes the deployment size "penalty" on them, plus it almost always causes 
dependency issues, since all the 3rd party libraries AWS includes in the 
bundle are shaded without being relocated.
The workaround forces these users to exclude the bundle from hadoop and 
manually add the missing dependencies, which is error-prone as these are 
usually required at runtime.
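
A minimal sketch of that workaround in a downstream pom.xml, for illustration 
only. It assumes hadoop's S3A support comes in through 
`org.apache.hadoop:hadoop-aws` and that the bundle is 
`software.amazon.awssdk:bundle`; the version properties are placeholders, and 
the list of re-added SDK modules is a guess that is likely incomplete:

```xml
<dependencies>
  <!-- Assumption: hadoop-aws is the module that declares the bundle dependency -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>${hadoop.version}</version> <!-- placeholder property -->
    <exclusions>
      <!-- Drop the large bundle with its unrelocated 3rd party classes -->
      <exclusion>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>bundle</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- Manually re-added SDK modules. Illustrative and probably not the full
       set: anything missing here only fails at runtime. -->
  <dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>${aws.sdk.version}</version> <!-- placeholder property -->
  </dependency>
  <dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3-transfer-manager</artifactId>
    <version>${aws.sdk.version}</version>
  </dependency>
</dependencies>
```

Nothing at compile time tells the user which modules are missing, so the list 
has to be found by trial and error and rechecked on every hadoop upgrade.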

It should be pretty easy to replace hadoop's dependency on the bundle with the 
necessary minimum (probably `s3-transfer-manager` et al.). That would:
1) let Maven take care of transitive dependencies when hadoop is used as a library
2) mean instructing users to download the bundle when deploying hadoop as a service

If it makes sense, I can open a separate ticket and work on it.


> provide hadoop binary tarball without aws v2 sdk
> ------------------------------------------------
>
>                 Key: HADOOP-19083
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19083
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build, fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.1
>
>
> Have the default hadoop binary .tar.gz exclude the aws v2 sdk by default. 
> This SDK brings the total size of the distribution to about 1 GB.
> Proposed
> * add a profile to include the aws sdk in the dist module
> * document it for local building
> * for release builds, we modify our release ant builds to generate modified 
> x86 and arm64 releases without the file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

