netapp-acheng opened a new issue, #3617:
URL: https://github.com/apache/polaris/issues/3617

   ### Describe the bug
   
   I’m opening this issue as requested in https://github.com/apache/polaris/issues/3440.
   This report documents:

   - The configuration that worked for me with the pr-3445 fix (data writes succeeded using credentials vended by Polaris, without setting a region).
   - The configuration that failed previously, where Spark tried to write table data using my static AWS access key instead of the temporary credentials vended by Polaris.

   The Polaris developers specifically asked to understand “why it wasn’t working before”, so this issue contains both sets of configs and the observed behavior for each.
   
   Environment

   - Spark: 3.5.x
   - Iceberg: 1.10.0
     - iceberg-spark-runtime-3.5_2.12:1.10.0
     - iceberg-aws-bundle:1.10.0
   - Hadoop AWS: 3.4.0
   - AWS SDK v2 bundle: 2.23.19
   - Polaris: local server at http://localhost:8181
   - Object storage: S3-compatible (StorageGRID)
     - endpoint = https://sgdemo.example.com
     - STS enabled, same endpoint
   
   
   1. The configuration that worked with the pr-3445 fix
   (no region set, STS configured, vended credentials enabled)

   Using the spark-shell config below, all of the following succeeded:

   - SHOW CATALOGS
   - CREATE TABLE
   - INSERT INTO (data files written using vended credentials)
   Working spark-shell config:

   ${SPARK_HOME}/bin/spark-shell \
     --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0,org.apache.hadoop:hadoop-aws:3.4.0,software.amazon.awssdk:bundle:2.23.19 \
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
     --conf spark.sql.catalog.sts_spark1=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.sts_spark1.type=rest \
     --conf spark.sql.catalog.sts_spark1.uri=http://localhost:8181/api/catalog \
     --conf spark.sql.catalog.sts_spark1.warehouse=sts_spark1 \
     --conf spark.sql.catalog.sts_spark1.rest.auth.type=OAUTH2 \
     --conf spark.sql.catalog.sts_spark1.oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens \
     --conf spark.sql.catalog.sts_spark1.credential=root:<password> \
     --conf spark.sql.catalog.sts_spark1.scope=PRINCIPAL_ROLE:ALL \
     --conf spark.sql.catalog.sts_spark1.header.Polaris-Realm=POLARIS \
     --conf spark.sql.catalog.sts_spark1.header.X-Iceberg-Access-Delegation=vended-credentials \
     --conf spark.sql.catalog.sts_spark1.token-refresh-enabled=true \
     --conf spark.hadoop.fs.s3a.endpoint=https://sgdemo.example.com \
     --conf spark.hadoop.fs.s3a.path.style.access=false
   Observed behavior (working version):

   - Data files were written with temporary credentials vended by Polaris
   - No region was provided
   - Insert queries completed successfully
   
   2. The configuration that failed (same catalog, no region set, with STS configuration)

   With the following config, Spark:

   - Successfully creates the table
   - Successfully writes metadata files
   - Fails when writing data files to object storage
   - The failing PUT is signed with my static AWS access key, not vended credentials
   - The S3 PUT request is denied by the object storage system
   
   Failing spark-shell config:

   ${SPARK_HOME}/bin/spark-shell \
     --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0,org.apache.hadoop:hadoop-aws:3.4.0,software.amazon.awssdk:bundle:2.23.19 \
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
     --conf spark.sql.catalog.sts_spark1=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.sts_spark1.type=rest \
     --conf spark.sql.catalog.sts_spark1.uri=http://localhost:8181/api/catalog \
     --conf spark.sql.catalog.sts_spark1.catalog=sts_spark1 \
     --conf spark.sql.catalog.sts_spark1.warehouse=sts_spark1 \
     --conf spark.sql.catalog.sts_spark1.rest.auth.type=OAUTH2 \
     --conf spark.sql.catalog.sts_spark1.oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens \
     --conf spark.sql.catalog.sts_spark1.credential=root:<password> \
     --conf spark.sql.catalog.sts_spark1.scope=PRINCIPAL_ROLE:ALL \
     --conf spark.sql.catalog.sts_spark1.header.Polaris-Realm=POLARIS \
     --conf spark.hadoop.fs.s3a.endpoint=https://sgdemo.example.com \
     --conf spark.hadoop.fs.s3a.path.style.access=false \
     --conf spark.sql.catalog.sts_spark1.io-impl=org.apache.iceberg.hadoop.HadoopFileIO \
     --conf spark.sql.catalog.sts_spark1.warehouse=s3a://sts-spark1/ \
     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
     --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \
     --conf spark.hadoop.fs.s3a.assumed.role.arn=${AWS_ROLE_ARN} \
     --conf spark.hadoop.fs.s3a.assumed.role.sts.endpoint=https://sgdemo.example.com \
     --conf spark.hadoop.fs.s3a.assumed.role.session.name=polaris-spark-test \
     --conf spark.hadoop.fs.s3a.assumed.role.session.duration=3600 \
     --conf spark.hadoop.fs.s3a.assumed.role.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
     --conf spark.hadoop.fs.s3a.access.key=${AWS_ACCESS_KEY_ID} \
     --conf spark.hadoop.fs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY}
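
   For quick comparison, here is the delta between the two commands above, extracted verbatim (nothing here is new configuration):

   # Present only in the failing run:
   --conf spark.sql.catalog.sts_spark1.catalog=sts_spark1
   --conf spark.sql.catalog.sts_spark1.io-impl=org.apache.iceberg.hadoop.HadoopFileIO
   --conf spark.sql.catalog.sts_spark1.warehouse=s3a://sts-spark1/   # second warehouse entry, alongside warehouse=sts_spark1
   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
   --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
   --conf spark.hadoop.fs.s3a.assumed.role.arn=${AWS_ROLE_ARN}
   --conf spark.hadoop.fs.s3a.assumed.role.sts.endpoint=https://sgdemo.example.com
   --conf spark.hadoop.fs.s3a.assumed.role.session.name=polaris-spark-test
   --conf spark.hadoop.fs.s3a.assumed.role.session.duration=3600
   --conf spark.hadoop.fs.s3a.assumed.role.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
   --conf spark.hadoop.fs.s3a.access.key=${AWS_ACCESS_KEY_ID}
   --conf spark.hadoop.fs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY}

   # Present only in the working run:
   --conf spark.sql.catalog.sts_spark1.header.X-Iceberg-Access-Delegation=vended-credentials
   --conf spark.sql.catalog.sts_spark1.token-refresh-enabled=true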
   Observed behavior (failing version):

   Spark executor error:

   java.io.UncheckedIOException: Failed to close current writer
     at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(...)

   StorageGRID access logs:

   200 PUT /ns1/table1/metadata/...metadata.json
   403 PUT /ns1/table1/data/...parquet

   The metadata PUT succeeds, but the data PUT returns 403.

   Critical observation for Polaris: the failing data PUT is signed with my static access key and carries no session token, which means delegated access / vended credentials are not being used for data writes with this configuration. Note that this config also omits the X-Iceberg-Access-Delegation header and switches the catalog to HadoopFileIO, so data files are written through S3AFileSystem (configured with the static key) rather than through Iceberg's S3FileIO, which is what would consume credentials vended by Polaris.
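
   To check whether Polaris itself vends credentials for this table, the loadTable response can be inspected directly. A minimal sketch follows, assuming the standard Iceberg REST loadTable path with the catalog name as prefix and the usual Iceberg s3.* property names (exact response keys may vary by Polaris version); everything else is taken from the configs above, and jq is used for readability:

   # Obtain a bearer token from Polaris (client_credentials grant, same
   # endpoint and credential as in the spark-shell configs above).
   TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
     -d 'grant_type=client_credentials' \
     -d 'client_id=root' \
     -d 'client_secret=<password>' \
     -d 'scope=PRINCIPAL_ROLE:ALL' | jq -r .access_token)

   # Load the table with credential vending requested. If vending works, the
   # response "config" should contain temporary credentials such as
   # s3.access-key-id / s3.secret-access-key / s3.session-token.
   curl -s http://localhost:8181/api/catalog/v1/sts_spark1/namespaces/ns1/tables/table1 \
     -H "Authorization: Bearer $TOKEN" \
     -H "X-Iceberg-Access-Delegation: vended-credentials" \
     -H "Polaris-Realm: POLARIS" | jq .config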
   
   ### To Reproduce
   
   1. Create the sts_spark1 catalog with this configuration (a curl sketch of a roughly equivalent creation call follows after these steps):
      catalog: class PolarisCatalog {
           class Catalog {
               type: INTERNAL
               name: sts_spark1
               properties: class CatalogProperties {
                   {}
                   defaultBaseLocation: s3://sts-spark1
               }
               createTimestamp: null
               lastUpdateTimestamp: null
               entityVersion: null
               storageConfigInfo: class AwsStorageConfigInfo {
                   class StorageConfigInfo {
                       storageType: S3
                       allowedLocations: [s3://sts-spark1]
                   }
                   roleArn: arn:aws:iam::123456789101112:role/assumerole
                   externalId: null
                   userArn: null
                   currentKmsKey: null
                   allowedKmsKeys: []
                   region: null
                   endpoint: https://sgdemo.example.com
                   stsEndpoint: https://sgdemo.example.com
                   stsUnavailable: false
                   endpointInternal: null
                   pathStyleAccess: false
               }
           }
       }
   2. Start spark-shell with the failing config above.
   3. Run this SQL to create the namespace and table; both statements complete successfully:

      CREATE NAMESPACE IF NOT EXISTS sts_spark1.ns1;
      CREATE TABLE sts_spark1.ns1.table1 (id INT, data STRING) USING iceberg;

   4. Insert data into the table; this fails:

      INSERT INTO sts_spark1.ns1.table1 VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');

   This reliably reproduces the 403 on the data file write.
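
   For reference, here is a creation call roughly equivalent to the dump in step 1, sketched against the Polaris management API. This is an approximation: the /api/management/v1/catalogs path, the request shape, and the storageConfigInfo field names are inferred from the dump above and may differ by Polaris version; $TOKEN is a bearer token obtained as in the earlier curl sketch.

   curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "catalog": {
         "type": "INTERNAL",
         "name": "sts_spark1",
         "properties": { "default-base-location": "s3://sts-spark1" },
         "storageConfigInfo": {
           "storageType": "S3",
           "allowedLocations": ["s3://sts-spark1"],
           "roleArn": "arn:aws:iam::123456789101112:role/assumerole",
           "endpoint": "https://sgdemo.example.com",
           "stsEndpoint": "https://sgdemo.example.com",
           "pathStyleAccess": false
         }
       }
     }'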
   
   
   ### Actual Behavior
   
   _No response_
   
   ### Expected Behavior
   
   _No response_
   
   ### Additional context
   
   _No response_
   
   ### System information
   
   _No response_

