Hi Awasthi,

S3A supports setting credentials at the S3 bucket level; ref:
https://docs.cloudera.com/runtime/7.2.0/cloud-data-access/topics/cr-cda-configuring-per-bucket-settings.html
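For reference, the per-bucket pattern looks roughly like the sketch below. Note it applies to the Hadoop s3a:// connector, not to Iceberg's S3FileIO; the bucket names and key placeholders are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: S3A per-bucket credentials (Hadoop s3a:// connector only).
// "source-bucket" and "target-bucket" are placeholder bucket names.
val spark = SparkSession.builder().master("local[*]")
  // Used only for paths under s3a://source-bucket/
  .config("spark.hadoop.fs.s3a.bucket.source-bucket.access.key", "<SOURCE_ACCESS_KEY>")
  .config("spark.hadoop.fs.s3a.bucket.source-bucket.secret.key", "<SOURCE_SECRET_KEY>")
  // Used only for paths under s3a://target-bucket/
  .config("spark.hadoop.fs.s3a.bucket.target-bucket.access.key", "<TARGET_ACCESS_KEY>")
  .config("spark.hadoop.fs.s3a.bucket.target-bucket.secret.key", "<TARGET_SECRET_KEY>")
  .getOrCreate()
```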
I am not sure if S3FileIO supports this feature.

Thanks,
Pani

On Mon, Apr 22, 2024 at 2:01 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Hi Awasthi,
>
> How about configuring two catalogs in Spark? One points to the source
> data, and the other points to the target. You can configure different
> credentials in that case.
>
> Yufei
>
> On Mon, Apr 22, 2024 at 8:49 AM Awasthi, Somesh
> <soawas...@informatica.com.invalid> wrote:
>
>> Hi Jack/Dev Team,
>>
>> We want to pass separate credentials for reading source data from S3 and
>> separate credentials for writing target data to S3 using the Glue
>> catalog, but we are unable to set credentials at the bucket level and
>> have not been able to get help from any forum.
>>
>> Could you please check and help me as soon as possible, or guide me to
>> the right forum to get this resolved?
>>
>> Currently we are following the two approaches below to set S3
>> credentials through code.
>>
>> Approach 1. Setting S3 credentials through system properties:
>>
>> val spark = SparkSession.builder().master("local[*]")
>>   .config("spark.sql.defaultCatalog", "AwsDataCatalog")
>>   .config("spark.sql.extensions",
>>     "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
>>   .config("spark.sql.catalog.AwsDataCatalog",
>>     "org.apache.iceberg.spark.SparkCatalog")
>>   .config("spark.sql.catalog.AwsDataCatalog.catalog-impl",
>>     "org.apache.iceberg.aws.glue.GlueCatalog")
>>   .config("spark.sql.catalog.AwsDataCatalog.io-impl",
>>     "org.apache.iceberg.aws.s3.S3FileIO")
>>   .config("spark.sql.catalog.AwsDataCatalog.s3.use-arn-region-enabled",
>>     "true")
>>   .config("spark.sql.catalog.AwsDataCatalog.s3.access-points.xxx",
>>     "arn:aws:s3:us-west-2:xxxxx")
>>   .config("spark.sql.catalog.AwsDataCatalog.s3.access-points.xxxx",
>>     "arn:aws:s3:ap-south-1:xxxxx")
>>   .getOrCreate();
>>
>> System.setProperty("aws.region", "XXXXXXXXXXXX");
>> System.setProperty("aws.accessKeyId", "XXXXXXXXXXXXXXXXx")
>> System.setProperty("aws.secretAccessKey", "XXXXXXXXXXXXXXXXXXx")
>>
>> Approach 2. A CustomCredentialProvider to set S3 credentials through
>> Spark:
>>
>> val spark = SparkSession.builder().master("local[*]")
>>   .config("spark.sql.defaultCatalog", "AwsDataCatalog")
>>   .config("spark.sql.extensions",
>>     "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
>>   .config("spark.sql.catalog.AwsDataCatalog",
>>     "org.apache.iceberg.spark.SparkCatalog")
>>   .config("spark.sql.catalog.AwsDataCatalog.catalog-impl",
>>     "org.apache.iceberg.aws.glue.GlueCatalog")
>>   .config("spark.sql.catalog.AwsDataCatalog.io-impl",
>>     "org.apache.iceberg.aws.s3.S3FileIO")
>>   .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider",
>>     "CustomAwsClientFactory")
>>   .config("spark.sql.catalog.AwsDataCatalog.client.region", "xxxx")
>>   .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider.accessKeyId",
>>     "XXXXXXXXXXXXXxxx")
>>   .config("spark.sql.catalog.AwsDataCatalog.client.credentials-provider.secretAccessKey",
>>     "XXXXXXXXXXXXXXXXXXXXx")
>>   .getOrCreate();
>>
>> Problem: We want to pass separate credentials for reading source data
>> from S3 and separate credentials for writing target data to S3 using
>> the Glue catalog.
>>
>> Expected solution:
>>
>> spark.hadoop.fs.s3a.access.key: <YOURACCESSKEY>
>> spark.hadoop.fs.s3a.secret.key: <YOURSECRETKEY>
>>
>> .config("spark.hadoop.fs.s3a.access.key", "XXXXXXXXXXXXXXxxx")
>> .config("spark.hadoop.fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXXxx")
>>
>> Dependencies consumed: should iceberg-spark-runtime-3.5_2.12-1.5.0 +
>> iceberg-aws-bundle-1.5.0 be enough in terms of dependencies? Currently
>> we are following the official website to integrate Iceberg with Spark
>> (https://iceberg.apache.org/docs/nightly/spark-configuration/), using
>> the Glue catalog.
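Approach 2 above names a CustomAwsClientFactory. If the intent is a custom client factory, one way to sketch it is by implementing Iceberg's `org.apache.iceberg.aws.AwsClientFactory` interface and registering it via the catalog's `client.factory` property. This is only a sketch under that assumption; the property names `custom.access-key-id` / `custom.secret-access-key` are invented for illustration:

```scala
import java.util.{Map => JMap}
import org.apache.iceberg.aws.AwsClientFactory
import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, StaticCredentialsProvider}
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.glue.GlueClient
import software.amazon.awssdk.services.kms.KmsClient
import software.amazon.awssdk.services.s3.S3Client

// Sketch: a custom AwsClientFactory that pulls keys from catalog properties.
// "custom.access-key-id" / "custom.secret-access-key" are hypothetical names.
class CustomAwsClientFactory extends AwsClientFactory {
  private var accessKey: String = _
  private var secretKey: String = _

  // Iceberg passes all catalog properties to the factory on startup.
  override def initialize(properties: JMap[String, String]): Unit = {
    accessKey = properties.get("custom.access-key-id")
    secretKey = properties.get("custom.secret-access-key")
  }

  private def provider: StaticCredentialsProvider =
    StaticCredentialsProvider.create(AwsBasicCredentials.create(accessKey, secretKey))

  override def s3(): S3Client = S3Client.builder().credentialsProvider(provider).build()
  override def glue(): GlueClient = GlueClient.builder().credentialsProvider(provider).build()
  override def kms(): KmsClient = KmsClient.builder().credentialsProvider(provider).build()
  override def dynamo(): DynamoDbClient = DynamoDbClient.builder().credentialsProvider(provider).build()
}
```

It would then be registered with the fully-qualified class name, e.g. `.config("spark.sql.catalog.AwsDataCatalog.client.factory", "com.example.CustomAwsClientFactory")`, where the package is a placeholder.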
>>
>> Could you please help me understand whether it is possible to pass
>> credentials at the bucket level, or whether this is a limitation on the
>> Iceberg side.
>>
>> Thanks,
>> Somesh.
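Yufei's two-catalog suggestion above could be sketched as follows. The catalog names (`src`, `dst`) and key placeholders are assumptions; `s3.access-key-id` and `s3.secret-access-key` are the S3FileIO credential properties documented for the Iceberg AWS integration:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the two-catalog approach: one Glue catalog per credential set.
// Catalog names and key placeholders below are illustrative only.
val spark = SparkSession.builder().master("local[*]")
  .config("spark.sql.extensions",
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  // Source catalog: read-side credentials.
  .config("spark.sql.catalog.src", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.src.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
  .config("spark.sql.catalog.src.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
  .config("spark.sql.catalog.src.s3.access-key-id", "<SOURCE_ACCESS_KEY>")
  .config("spark.sql.catalog.src.s3.secret-access-key", "<SOURCE_SECRET_KEY>")
  // Target catalog: write-side credentials.
  .config("spark.sql.catalog.dst", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.dst.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
  .config("spark.sql.catalog.dst.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
  .config("spark.sql.catalog.dst.s3.access-key-id", "<TARGET_ACCESS_KEY>")
  .config("spark.sql.catalog.dst.s3.secret-access-key", "<TARGET_SECRET_KEY>")
  .getOrCreate()

// Read through one catalog and write through the other, e.g.:
// spark.table("src.db.events").writeTo("dst.db.events").append()
```

Each catalog gets its own S3 client, so reads and writes can carry different credentials without any per-bucket setting.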