Hi Awasthi,
S3A supports setting credentials at the S3 bucket level.

Ref:
https://docs.cloudera.com/runtime/7.2.0/cloud-data-access/topics/cr-cda-configuring-per-bucket-settings.html

I am not sure if S3FileIO supports this feature.
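
If you go through S3A (HadoopFileIO) instead, Hadoop's per-bucket
properties would look roughly like this (untested sketch; "source-bucket"
and "target-bucket" are placeholder bucket names):

  val spark = SparkSession.builder().master("local[*]")
    // fs.s3a.bucket.<bucket>.<option> overrides the base fs.s3a.<option>
    .config("spark.hadoop.fs.s3a.bucket.source-bucket.access.key", "SOURCE_KEY")
    .config("spark.hadoop.fs.s3a.bucket.source-bucket.secret.key", "SOURCE_SECRET")
    .config("spark.hadoop.fs.s3a.bucket.target-bucket.access.key", "TARGET_KEY")
    .config("spark.hadoop.fs.s3a.bucket.target-bucket.secret.key", "TARGET_SECRET")
    .getOrCreate()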

Thanks
Pani



On Mon, Apr 22, 2024 at 2:01 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Hi Awasthi,
>
> How about configuring two catalogs in Spark? One points to the source
> data, and another points to the target. You can configure different
> credentials in that case.
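>
> A rough sketch (untested; the catalog names and credentials below are
> placeholders; s3.access-key-id / s3.secret-access-key are the S3FileIO
> properties):
>
>   val spark = SparkSession.builder().master("local[*]")
>     .config("spark.sql.extensions",
>       "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
>     // source catalog reads with the source credentials
>     .config("spark.sql.catalog.source_cat",
>       "org.apache.iceberg.spark.SparkCatalog")
>     .config("spark.sql.catalog.source_cat.catalog-impl",
>       "org.apache.iceberg.aws.glue.GlueCatalog")
>     .config("spark.sql.catalog.source_cat.io-impl",
>       "org.apache.iceberg.aws.s3.S3FileIO")
>     .config("spark.sql.catalog.source_cat.s3.access-key-id", "SOURCE_KEY")
>     .config("spark.sql.catalog.source_cat.s3.secret-access-key", "SOURCE_SECRET")
>     // target catalog writes with different credentials
>     .config("spark.sql.catalog.target_cat",
>       "org.apache.iceberg.spark.SparkCatalog")
>     .config("spark.sql.catalog.target_cat.catalog-impl",
>       "org.apache.iceberg.aws.glue.GlueCatalog")
>     .config("spark.sql.catalog.target_cat.io-impl",
>       "org.apache.iceberg.aws.s3.S3FileIO")
>     .config("spark.sql.catalog.target_cat.s3.access-key-id", "TARGET_KEY")
>     .config("spark.sql.catalog.target_cat.s3.secret-access-key", "TARGET_SECRET")
>     .getOrCreate()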
>
>
> Yufei
>
>
> On Mon, Apr 22, 2024 at 8:49 AM Awasthi, Somesh
> <soawas...@informatica.com.invalid> wrote:
>
>> Hi Jack/Dev Team,
>>
>>
>>
>> We want to pass one set of credentials for reading source data from S3
>> and a separate set of credentials for writing target data to S3 using
>> the Glue catalog, but we are unable to set credentials at the bucket
>> level and have not been able to get help from any forum.
>>
>> Could you please look into this and help us, or point us to the right
>> forum to get this resolved?
>>
>>
>>
>> Currently we are using the following two approaches to set S3
>> credentials in code.
>>
>>
>>
>> *Approach 1: setting S3 credentials through system properties.*
>>
>>
>>
>> val spark = SparkSession.builder().master("local[*]")
>>   .config("spark.sql.defaultCatalog", "AwsDataCatalog")
>>   .config("spark.sql.extensions",
>>     "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
>>   .config("spark.sql.catalog.AwsDataCatalog",
>>     "org.apache.iceberg.spark.SparkCatalog")
>>   .config("spark.sql.catalog.AwsDataCatalog.catalog-impl",
>>     "org.apache.iceberg.aws.glue.GlueCatalog")
>>   .config("spark.sql.catalog.AwsDataCatalog.io-impl",
>>     "org.apache.iceberg.aws.s3.S3FileIO")
>>   .config("spark.sql.catalog.AwsDataCatalog.s3.use-arn-region-enabled",
>>     "true")
>>   .config("spark.sql.catalog.AwsDataCatalog.s3.access-points.xxx",
>>     "arn:aws:s3:us-west-2:xxxxx")
>>   .config("spark.sql.catalog.AwsDataCatalog.s3.access-points.xxxx",
>>     "arn:aws:s3:ap-south-1:xxxxx")
>>   .getOrCreate()
>>
>> System.setProperty("aws.region", "XXXXXXXXXXXX")
>> System.setProperty("aws.accessKeyId", "XXXXXXXXXXXXXXXXx")
>> System.setProperty("aws.secretAccessKey", "XXXXXXXXXXXXXXXXXXx")
>>
>>
>>
>> *Approach 2: a custom credentials provider set through Spark.*
>>
>>
>>
>> val spark = SparkSession.builder().master("local[*]")
>>   .config("spark.sql.defaultCatalog", "AwsDataCatalog")
>>   .config("spark.sql.extensions",
>>     "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
>>   .config("spark.sql.catalog.AwsDataCatalog",
>>     "org.apache.iceberg.spark.SparkCatalog")
>>   .config("spark.sql.catalog.AwsDataCatalog.catalog-impl",
>>     "org.apache.iceberg.aws.glue.GlueCatalog")
>>   .config("spark.sql.catalog.AwsDataCatalog.io-impl",
>>     "org.apache.iceberg.aws.s3.S3FileIO")
>>   .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider",
>>     "CustomAwsClientFactory")
>>   .config("spark.sql.catalog.AwsDataCatalog.client.region", "xxxx")
>>   .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider.accessKeyId",
>>     "XXXXXXXXXXXXXxxx")
>>   .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider.secretAccessKey",
>>     "XXXXXXXXXXXXXXXXXXXXx")
>>   .getOrCreate()
>>
>> *Problem: we want to pass separate credentials for reading source data
>> from S3 and separate credentials for writing target data to S3 using
>> the Glue catalog.*
>>
>>
>>
>> *Expected solution:*
>>
>> spark.hadoop.fs.s3a.access.key: <YOURACCESSKEY>
>> spark.hadoop.fs.s3a.secret.key: <YOURSECRETKEY>
>>
>> .config("spark.hadoop.fs.s3a.access.key", "XXXXXXXXXXXXXXxxx")
>> .config("spark.hadoop.fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXXxx")
>>
>>
>> *TLP consumed:* iceberg-spark-runtime-3.5_2.12-1.5.0 +
>> iceberg-aws-bundle-1.5.0. Should these be enough in terms of
>> dependencies? We are currently following the official documentation to
>> integrate Iceberg with Spark, using the Glue catalog:
>> https://iceberg.apache.org/docs/nightly/spark-configuration/
>>
>>
>> Could you please confirm whether it is possible to pass credentials at
>> the bucket level, or whether this is a limitation on the Iceberg side?
>>
>>
>>
>> Thanks,
>>
>> Somesh.
>>
