steveloughran commented on code in PR #8094:
URL: https://github.com/apache/hadoop/pull/8094#discussion_r2543140022

##########
LICENSE-binary:
##########
@@ -536,3 +549,8 @@ Public Domain
 -------------
 aopalliance:aopalliance:1.0
+

Review Comment:
   cut


##########
hadoop-cloud-storage-project/hadoop-cloud-storage-dist/pom.xml:
##########
@@ -0,0 +1,281 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    https://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.hadoop</groupId>
+    <artifactId>hadoop-project</artifactId>
+    <version>3.4.3-SNAPSHOT</version>
+    <relativePath>../../hadoop-project</relativePath>
+  </parent>
+  <artifactId>hadoop-cloud-storage-dist</artifactId>
+  <version>3.4.3-SNAPSHOT</version>
+  <packaging>jar</packaging>
+
+  <description>Apache Hadoop Cloud Storage Distribution</description>
+  <name>Apache Hadoop Cloud Storage Distribution</name>
+
+  <!--
+  This pulls in all the artifacts to copy into common/lib and so put into
+  the Hadoop distro and onto the classpath.
+
+  The assembly file /hadoop-assemblies/src/main/resources/assemblies/hadoop-cloud-storage.xml
+  is processed to define the layout and to add extra files alongside
+  the Jars.
+
+  By default, while hadoop-* artifacts are all included, dependencies
+  are omitted for all cloud connectors except hadoop-azure and
+  possibly hadoop-gcp and hadoop-tos modules.
+  For hadoop-aws the AWS SDK bundle.jar omitted, but everything else is included.
+
+  * This keeps binary release size below the limit of apache distributions
+  * Reduces download and size overhead in docker usage.
+  * Reduces the CVE attack surface
+  * Reduces the risk of classpath conflict.
+
+  To produce a build with the specific desired dependencies, the build must be executed
+  with the relevant profile of ${module}-package.
+
+  For example, a build with the hadoop-aws and hadoop-azure-datalake dependencies,
+  build with -Dhadoop-aws-package -Dhadoop-azure-datalake-package
+
+  Available package profiles:
+    hadoop-aws-package

Review Comment:
   restore hadoop-aliyun-package docs


##########
BUILDING.txt:
##########
@@ -385,6 +385,49 @@ Create a local staging version of the website (in /tmp/hadoop-site)
 Note that the site needs to be built in a second pass after other artifacts.
+----------------------------------------------------------------------------------
+Including Cloud Connector Dependencies in Distributions:
+
+Hadoop distributions include the hadoop modules needed to work with data and services
+on cloud infrastructure
+
+However, dependencies are omitted for all cloud connectors except hadoop-azure
+(abfs:// and wasb://) and possibly hadoop-gcp (gs://) and hadoop-tos (tos://).
+For the latter two modules, it depends on shading options.
+
+For hadoop-aws the AWS SDK bundle.jar is omitted, but everything else is included.
+
+Excluding the extra binaries:
+* Keeps release artifact size below the limit of the ASF distribution network.
+* Reduces download and size overhead in docker usage.
+* Reduces the CVE attack surface and audit-related complaints about those same CVEs.
+* Reduces the risk of classpath conflict.
+
+To produce a build with the specific desired dependencies, the build must be executed
+with the relevant profile of ${module}-package alongside the -Pdist profile.
+
+For example, a build with the hadoop-aws and hadoop-azure-datalake dependencies,
+run with
+
+  mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package
+
+Available package profiles:
+  hadoop-aws-package

Review Comment:
   restore hadoop-aliyun-package
