I don't really understand how Iceberg and the Hadoop libraries can coexist in a
deployment.
The latest Spark (3.5.1) base image contains the hadoop-client*-3.3.4.jar. The
AWS v2 SDK is only supported from hadoop*-3.4.0.jar onward.
The Iceberg AWS integration documentation states that the AWS v2 SDK is used.
Swapping out the iceberg-aws-bundle for the very latest AWS-provided SDK
('software.amazon.awssdk:bundle:2.25.23') produces an incompatibility from a
slightly different code path:
java.lang.NoSuchMethodError: 'void
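
Stepping back, the two SDKs should in principle be able to coexist on one
classpath, since v1 lives under com.amazonaws.* and v2 under
software.amazon.awssdk.*. A hedged sketch of the combination I'd expect to
work (the version numbers are my assumptions, not tested):

    org.apache.hadoop:hadoop-aws:3.3.4                        (s3a://, built against the v1 SDK)
    com.amazonaws:aws-java-sdk-bundle:1.12.262                (v1 SDK that hadoop-aws 3.3.4 declares)
    org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0   (Iceberg for Spark 3.5)
    org.apache.iceberg:iceberg-aws-bundle:1.5.0               (v2 SDK for Iceberg's S3FileIO)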
[sorry; replying all this time]
With hadoop-*-3.3.6 in place of the 3.4.0 below I get
java.lang.NoClassDefFoundError: com/amazonaws/AmazonClientException
I think that the below iceberg-aws-bundle version supplies the v2 SDK.
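
That class is the tell: com.amazonaws.AmazonClientException belongs to the v1
SDK, which hadoop-aws 3.3.x is still built against, and the iceberg-aws-bundle
only packages v2 (software.amazon.awssdk.*). A sketch of the extra dependency
that should plug the gap (1.12.367 is my reading of the hadoop-aws 3.3.6 POM,
unverified):

    org.apache.hadoop:hadoop-aws:3.3.6
    com.amazonaws:aws-java-sdk-bundle:1.12.367   (v1 SDK for hadoop-aws 3.3.x)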
Dan
From: Aaron Grubb
Sent: 03
Downgrade to hadoop-*:3.3.x. Hadoop 3.4.x is based on the AWS SDK v2 and should
probably be considered breaking for tools that were built against < 3.4.0 while
using AWS.
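
If it helps, a minimal Scala sketch of the wiring under that downgrade (the
catalog name "demo", the bucket and the app name are placeholders; untested).
The catalog's io-impl points Iceberg at S3FileIO, which uses the v2 SDK from
iceberg-aws-bundle, while s3a:// reads keep going through hadoop-aws 3.3.x and
the v1 SDK:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("iceberg-plus-s3a")
      // Iceberg catalog; "demo" and the warehouse location are placeholders.
      .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.demo.type", "hadoop")
      .config("spark.sql.catalog.demo.warehouse", "s3://my-bucket/warehouse")
      // S3FileIO routes Iceberg's own I/O through the v2 SDK from
      // iceberg-aws-bundle, independent of the v1 SDK used for s3a://.
      .config("spark.sql.catalog.demo.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
      .getOrCreate()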
From: Oxlade, Dan
Sent: Wednesday, April 3, 2024 2:41:11 PM
To: user@spark.apache.org
Subject:
Hi all,
I've struggled with this for quite some time.
My requirement is to read a Parquet file from S3 into a DataFrame, then append
it to an existing Iceberg table.
To read the Parquet I need the hadoop-aws dependency for s3a://; to write to
Iceberg I need the Iceberg dependency.
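
A minimal sketch of that flow in Scala (assuming a session configured like the
one sketched earlier in the thread, and an existing Iceberg table
demo.db.events; all names and paths are placeholders):

    // Read the Parquet through s3a:// (served by hadoop-aws and the v1 SDK).
    val df = spark.read.parquet("s3a://my-bucket/input/data.parquet")

    // Append to the existing Iceberg table (I/O via S3FileIO and the v2 SDK).
    df.writeTo("demo.db.events").append()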