[ https://issues.apache.org/jira/browse/HADOOP-19811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057692#comment-18057692 ]
ASF GitHub Bot commented on HADOOP-19811:
-----------------------------------------
cnauroth commented on PR #8243:
URL: https://github.com/apache/hadoop/pull/8243#issuecomment-3880796276
> what about _not shading these_?
>
> BTW, what's the size of the distribution tar on 3.5.0 as gcs and cos both
> bundle a lot...even after stripping out aws bundle I worry we're at risk of
> crossing that 1 GB threshold

I'm getting ~500 MB for the distribution tarball. See below for a breakdown
of the heaviest hitters. hadoop-gcp is one of the top ones. Within hadoop-gcp,
the biggest contributors are Protobuf, Guava, gRPC and the Netty native
binaries.

From our experience working on gs:// in the previous Google-owned repo, we
found that we needed full shading to guarantee compatibility with user
workloads. The biggest challenge is alignment on the Protobuf and Guava
versions, as the underlying GCS SDK moves those along at its own pace.
```
# share/hadoop directories, size descending
> du -b -s share/hadoop/* | sort -nr | head
187592533 share/hadoop/yarn
156284851 share/hadoop/common
105883478 share/hadoop/client
83782079 share/hadoop/hdfs
29008259 share/hadoop/mapreduce
22844157 share/hadoop/tools
```
```
# Biggest jar sizes, descending
> for x in $(find . -name '*.jar'); do du -b $x; done | sort -nr | head
55992250 ./share/hadoop/client/hadoop-client-minicluster-3.5.0-SNAPSHOT.jar
35393191 ./share/hadoop/common/lib/hadoop-gcp-3.5.0-SNAPSHOT.jar
34558357 ./share/hadoop/yarn/timelineservice/lib/hbase-shaded-client-byo-hadoop-2.6.3-hadoop3.jar
30080918 ./share/hadoop/client/hadoop-client-runtime-3.5.0-SNAPSHOT.jar
19810310 ./share/hadoop/client/hadoop-client-api-3.5.0-SNAPSHOT.jar
9204801 ./share/hadoop/tools/lib/kafka-clients-3.9.0.jar
8451859 ./share/hadoop/common/lib/bcprov-jdk18on-1.82.jar
6687870 ./share/hadoop/tools/lib/zstd-jni-1.5.6-4.jar
6557826 ./share/hadoop/hdfs/hadoop-hdfs-3.5.0-SNAPSHOT-tests.jar
6459241 ./share/hadoop/hdfs/hadoop-hdfs-3.5.0-SNAPSHOT.jar
```
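The `for x in $(find ...)` loop above splits on whitespace in paths. A minimal sketch of a variant that avoids this, assuming GNU `find` and GNU `du` (the `-b` flag is a GNU extension); the function name `largest_jars` is hypothetical and was not part of the original measurement:

```shell
# Hypothetical helper (not from the original comment): print the N largest
# jars under a directory, with sizes in bytes, largest first.
# Assumes GNU find and GNU du.
largest_jars() {
  dir="${1:-.}"      # directory to scan, defaults to the current directory
  count="${2:-10}"   # number of entries to print, defaults to 10
  # -exec ... {} + passes file names to du directly, so paths containing
  # spaces are not split the way they are in a $(find ...) loop.
  find "$dir" -type f -name '*.jar' -exec du -b {} + | sort -nr | head -n "$count"
}
```

For example, `largest_jars share/hadoop 10` run from the unpacked distribution root should produce a listing in the same format as the one above.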
```
# Within hadoop-gcp, Protobuf and Guava are the biggest contributors
> du -b -s com/google/cloud/hadoop/repackaged/ossgcs/com/google/* | sort -nr
5792042 com/google/cloud/hadoop/repackaged/ossgcs/com/google/protobuf
5082272 com/google/cloud/hadoop/repackaged/ossgcs/com/google/common
4900043 com/google/cloud/hadoop/repackaged/ossgcs/com/google/cloud
4559447 com/google/cloud/hadoop/repackaged/ossgcs/com/google/api
4302946 com/google/cloud/hadoop/repackaged/ossgcs/com/google/storage
1042388 com/google/cloud/hadoop/repackaged/ossgcs/com/google/monitoring
859338 com/google/cloud/hadoop/repackaged/ossgcs/com/google/auth
727606 com/google/cloud/hadoop/repackaged/ossgcs/com/google/rpc
589946 com/google/cloud/hadoop/repackaged/ossgcs/com/google/gson
499635 com/google/cloud/hadoop/repackaged/ossgcs/com/google/longrunning
451096 com/google/cloud/hadoop/repackaged/ossgcs/com/google/iam
285057 com/google/cloud/hadoop/repackaged/ossgcs/com/google/re2j
85889 com/google/cloud/hadoop/repackaged/ossgcs/com/google/type
```
```
# Also gRPC
> du -b -s com/google/cloud/hadoop/repackaged/ossgcs/io/* | sort -nr
63209125 com/google/cloud/hadoop/repackaged/ossgcs/io/grpc
1857154 com/google/cloud/hadoop/repackaged/ossgcs/io/opentelemetry
948329 com/google/cloud/hadoop/repackaged/ossgcs/io/opencensus
16788 com/google/cloud/hadoop/repackaged/ossgcs/io/perfmark
```
```
# Also the Netty native binaries
> du -b -s META-INF/native/* | sort -nr
2867712 META-INF/native/com_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_windows_x86_64.dll
2684992 META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_osx_x86_64.jnilib
2684104 META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_linux_x86_64.so
2437120 META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_osx_aarch_64.jnilib
2420512 META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_tcnative_linux_aarch_64.so
108608 META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_transport_native_epoll_aarch_64.so
99422 META-INF/native/libcom_google_cloud_hadoop_repackaged_gcs_io_grpc_netty_shaded_netty_transport_native_epoll_x86_64.so
```
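Since the point of full shading is that nothing third-party stays at its original package name, one rough way to spot classes that escaped relocation is to filter the jar's entry list against the relocation prefix seen in the listings above. This is only a sketch: the function name `unrelocated_classes` is hypothetical, and the usage line assumes a JDK `jar` tool on the PATH.

```shell
# Hypothetical helper (not part of Hadoop): read jar entry names on stdin and
# print every .class entry that does NOT sit under the given relocation prefix.
unrelocated_classes() {
  prefix="$1"   # e.g. com/google/cloud/hadoop/repackaged/ossgcs
  # index($0, p) == 1 means the entry starts with the literal prefix, so
  # "!= 1" keeps only the classes that live outside of it.
  awk -v p="$prefix/" '/\.class$/ && index($0, p) != 1'
}

# Usage (assumes `jar` from a JDK is available):
#   jar tf share/hadoop/common/lib/hadoop-gcp-3.5.0-SNAPSHOT.jar \
#     | unrelocated_classes com/google/cloud/hadoop/repackaged/ossgcs
```

The module's own classes are also outside the prefix and will show up in the output, so the result still needs a manual read; any third-party package in the list, such as `io/opentelemetry`, is a candidate for a missing relocation.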
> hadoop-gcp does not relocate shaded OpenTelemetry dependencies
> --------------------------------------------------------------
>
> Key: HADOOP-19811
> URL: https://issues.apache.org/jira/browse/HADOOP-19811
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/gcs
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Priority: Blocker
> Labels: pull-request-available
>
> The hadoop-gcp sub-module intends to produce a jar with a completely shaded
> and relocated set of internal dependencies. Currently, the OpenTelemetry
> classes are not relocated, which creates a risk of version conflicts with
> downstream projects. Thank you to [~chengpan] for reporting this.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)