[ https://issues.apache.org/jira/browse/HADOOP-19083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955560#comment-17955560 ]
Yaniv Kunda commented on HADOOP-19083:
--------------------------------------

[~ste...@apache.org] I understand that the incentive to remove it from the tarball is related to size limits, but was it also considered to separate the Maven dependency on AWS' SDK bundle?

Using a bundle is very convenient in a Hadoop deployment, but for users who use Hadoop as a library (including many open-source projects, e.g. Spark), it imposes the deployment-size "penalty" on their own artifacts. It also almost always causes dependency issues, since all the third-party libraries AWS includes in the bundle are shaded without being relocated.

The workaround forces these users to exclude the bundle from Hadoop and manually add the missing dependencies, which is error-prone, as these are usually required at runtime.

It should be fairly easy to replace Hadoop's dependency on the bundle with the necessary minimum (probably `s3-transfer-manager` et al.). With that change:
1) Maven would take care of transitive dependencies when Hadoop is used as a library;
2) users deploying Hadoop as a service would be instructed to download the bundle.

If it makes sense, I can open a separate ticket and work on it.

> provide hadoop binary tarball without aws v2 sdk
> ------------------------------------------------
>
> Key: HADOOP-19083
> URL: https://issues.apache.org/jira/browse/HADOOP-19083
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: build, fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.1
>
>
> Have the default hadoop binary .tar.gz exclude the aws v2 sdk by default.
> This SDK brings the total size of the distribution to about 1 GB.
> Proposed
> * add a profile to include the aws sdk in the dist module
> * document it for local building
> * for release builds, we modify our release ant builds to generate modified
> x86 and arm64 releases without the file.
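One way a downstream project might confirm the kind of conflict the comment describes (third-party classes in the bundle that were shaded but not relocated) is to ask the class loader for every classpath location that provides a given class: if both the bundle's copy and the project's own copy of a library are present, the same class name shows up more than once. The sketch below is a hypothetical diagnostic, not part of Hadoop or the AWS SDK. The default class names it checks (Netty's `io.netty.channel.Channel` and Jackson's `com.fasterxml.jackson.core.JsonFactory`) are assumptions chosen for illustration; whether a given bundle version actually contains them has to be checked against that jar.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical diagnostic: reports classes that more than one classpath
 * entry provides, e.g. an unrelocated copy inside a shaded bundle jar plus
 * the project's own copy of the same library.
 */
public final class DuplicateClassCheck {

    private DuplicateClassCheck() {
    }

    /** Returns every classpath location that provides the given class. */
    static List<URL> locationsOf(ClassLoader loader, String className) throws IOException {
        // "a.b.C" becomes the resource path "a/b/C.class".
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = DuplicateClassCheck.class.getClassLoader();
        // Assumed example classes; pass other fully qualified names as arguments.
        String[] classNames = args.length > 0
                ? args
                : new String[] {"io.netty.channel.Channel", "com.fasterxml.jackson.core.JsonFactory"};
        for (String name : classNames) {
            List<URL> hits = locationsOf(loader, name);
            if (hits.size() > 1) {
                System.out.println("DUPLICATE " + name + " (" + hits.size() + " copies):");
                for (URL url : hits) {
                    System.out.println("  " + url);
                }
            } else if (hits.isEmpty()) {
                System.out.println("absent    " + name);
            } else {
                System.out.println("ok        " + name + " -> " + hits.get(0));
            }
        }
    }
}
```

Run it with the application's full classpath, e.g. `java -cp <app classpath> DuplicateClassCheck io.netty.channel.Channel`. A `DUPLICATE` line means two classpath entries ship the same class, and which copy actually gets loaded then depends on classpath order rather than on Maven's dependency mediation.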
-- This message was sent by Atlassian Jira (v8.20.10#820010)