lastranget opened a new issue, #2207:
URL: https://github.com/apache/polaris/issues/2207

   ### Describe the bug
   
   I'm trying to set up a Polaris catalog that points to our on-premises Pure FlashBlade S3 storage instance.
   
   I'm getting the following error when I try to create a table via the spark-sql shell:
   
   ```
   org.apache.iceberg.exceptions.RESTException: Unable to process: Failed to get subscoped credentials: (Service: Sts, Status Code: 400, Request ID: null) (SDK Attempt Count: 1)
   ```
   This is similar to the errors reported in #1146, but those errors include additional detail about the actual problem after "subscoped credentials:", whereas in this case the error message is left incomplete.
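   
   For what it's worth, the subscoping step is presumably an STS AssumeRole call against the configured endpoint, so the raw 400 from FlashBlade can be probed directly. A minimal sketch with the AWS CLI, using the placeholder endpoint and dummy role ARN from the setup below:
   
   ```bash
   # Replay the AssumeRole call Polaris presumably makes when subscoping credentials;
   # <pure-s3-endpoint-url> and the role ARN are the placeholders from the catalog
   # configuration shown further down.
   aws sts assume-role \
     --endpoint-url <pure-s3-endpoint-url> \
     --role-arn arn:aws:iam::000000000000:role/dummy-polaris-role \
     --role-session-name polaris-debug
   ```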
   
   I believe, based on https://github.com/apache/polaris/pull/1913, that external S3 providers should currently be supported.
   
   The full stack trace is as follows:
   
   ```
   spark-sql ()> CREATE NAMESPACE ICE_NS;
   Time taken: 1.11 seconds
   spark-sql ()> USE NAMESPACE ICE_NS;
   Time taken: 0.071 seconds
   spark-sql (ICE_NS)> CREATE TABLE PEOPLE (id int, name string) USING iceberg;
   25/07/29 17:57:42 ERROR SparkSQLDriver: Failed in [CREATE TABLE PEOPLE (id int, name string) USING iceberg]
   org.apache.iceberg.exceptions.RESTException: Unable to process: Failed to get subscoped credentials: (Service: Sts, Status Code: 400, Request ID: null) (SDK Attempt Count: 1)
       at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:248)
       at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:123)
       at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:107)
       at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:215)
       at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:299)
       at org.apache.iceberg.rest.BaseHTTPClient.post(BaseHTTPClient.java:88)
       at org.apache.iceberg.rest.RESTSessionCatalog$Builder.create(RESTSessionCatalog.java:771)
       at org.apache.iceberg.CachingCatalog$CachingTableBuilder.lambda$create$0(CachingCatalog.java:264)
       at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
       at java.base/java.util.concurrent.ConcurrentHashMap.compute(Unknown Source)
       at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404)
       at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387)
       at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
       at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
       at org.apache.iceberg.CachingCatalog$CachingTableBuilder.create(CachingCatalog.java:260)
       at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:246)
       at org.apache.polaris.spark.SparkCatalog.createTable(SparkCatalog.java:153)
       at org.apache.spark.sql.connector.catalog.TableCatalog.createTable(TableCatalog.java:223)
       at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:44)
       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
       at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
       at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
       at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
       at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
       at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
       at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
       at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
       at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
       at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
       at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
       at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
       at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
       at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:68)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:501)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:619)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:613)
       at scala.collection.Iterator.foreach(Iterator.scala:943)
       at scala.collection.Iterator.foreach$(Iterator.scala:943)
       at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
       at scala.collection.IterableLike.foreach(IterableLike.scala:74)
       at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
       at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:613)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:310)
       at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
       at java.base/java.lang.reflect.Method.invoke(Unknown Source)
       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1034)
       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199)
       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222)
       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1125)
       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1134)
       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
   
   ### To Reproduce
   
   I have a Docker Compose file to start the Polaris server:
   
   ```yaml
   services:
   
     polaris:
       image: apache/polaris:latest
       platform: linux/amd64
       ports:
         - "8181:8181"
         - "8182:8182"
       environment:
         AWS_ACCESS_KEY_ID: <pure-s3-access-key>
         AWS_SECRET_ACCESS_KEY: <pure-s3-secret-key>
         AWS_REGION: us-east-2
         AWS_ENDPOINT_URL_S3: <pure-s3-endpoint-url>
         AWS_ENDPOINT_URL_STS: <same pure s3 endpoint url as immediately above>
         POLARIS_BOOTSTRAP_CREDENTIALS: default-realm,root,secret
         # polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES": "[\"FILE\",\"S3\",\"GCS\",\"AZURE\"]"
         polaris.features.DROP_WITH_PURGE_ENABLED: true # allow dropping tables from the SQL client
         polaris.realm-context.realms: default-realm
         polaris.features."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION": false
         polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES": "[\"S3\"]"
   ```
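   
   Before running the setup scripts below, I check that the container is up (a quick sketch, assuming the stock Quarkus health endpoint is exposed on the 8182 management port mapped above):
   
   ```bash
   # Assumption: the Polaris image exposes the default Quarkus /q/health endpoint
   # on the management port.
   curl http://localhost:8182/q/health
   ```
   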
   FYI, I get the same malformed error without setting SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION to false, and I also get the error when I pin the Docker image version to 1.0.1-incubating-rc0 (I've tried 1.0.0-incubating as well).
   
   I then have two scripts to set up the catalog and roles on the Polaris server:
   
   ```bash
   ACCESS_TOKEN=$(curl -X POST \
     http://localhost:8181/api/catalog/v1/oauth/tokens \
     -d 'grant_type=client_credentials&client_id=root&client_secret=secret&scope=PRINCIPAL_ROLE:ALL' \
     | jq -r '.access_token')
   
   curl -i -X POST \
     -H "Authorization: Bearer $ACCESS_TOKEN" \
     http://localhost:8181/api/management/v1/catalogs \
     -H "Content-Type: application/json" \
     --data '{
       "name": "polariscatalog",
       "type": "INTERNAL",
       "properties": {
         "default-base-location": "s3://polaris-txl25-1",
         "s3.endpoint": "<pure-s3-endpoint>",
         "s3.path-style-access": "true",
         "s3.access-key-id": "<pure-s3-access-key>",
         "s3.secret-access-key": "<pure-s3-secret-key>",
         "s3.region": "us-east-2"
       },
       "storageConfigInfo": {
         "roleArn": "arn:aws:iam::000000000000:role/dummy-polaris-role",
         "storageType": "S3",
         "allowedLocations": [
           "s3://polaris-txl25-1/*"
         ]
       }
     }'
   ```
   
   and
   
   ```bash
   ACCESS_TOKEN=$(curl -X POST \
     http://localhost:8181/api/catalog/v1/oauth/tokens \
     -d 'grant_type=client_credentials&client_id=root&client_secret=secret&scope=PRINCIPAL_ROLE:ALL' \
     | jq -r '.access_token')
   
   # Grant the catalog admin role full content management on the catalog
   curl -X PUT http://localhost:8181/api/management/v1/catalogs/polariscatalog/catalog-roles/catalog_admin/grants \
     -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     --data '{"grant":{"type":"catalog", "privilege":"CATALOG_MANAGE_CONTENT"}}'
   
   # Create a data engineer role
   curl -X POST http://localhost:8181/api/management/v1/principal-roles \
     -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     --data '{"principalRole":{"name":"data_engineer"}}'
   
   # Connect the roles
   curl -X PUT http://localhost:8181/api/management/v1/principal-roles/data_engineer/catalog-roles/polariscatalog \
     -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     --data '{"catalogRole":{"name":"catalog_admin"}}'
   
   # Give root the data engineer role
   curl -X PUT http://localhost:8181/api/management/v1/principals/root/principal-roles \
     -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     --data '{"principalRole": {"name":"data_engineer"}}'
   ```
   These scripts create the catalog and then set up a role for the user.
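   
   To double-check what the server actually stored, the catalog definition can be read back (a sketch against the same management API used above):
   
   ```bash
   # Read the catalog back to verify the endpoint, region, and storageConfigInfo
   # were stored as intended.
   curl -H "Authorization: Bearer $ACCESS_TOKEN" \
     http://localhost:8181/api/management/v1/catalogs/polariscatalog
   ```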
   
   Then I use the following Dockerfile to create a Spark environment that contains my spark-submit script:
   
   ```dockerfile
   FROM docker-hub/spark:3.5.6
   
   ENV AWS_ACCESS_KEY_ID=<pure-s3-access-key>
   ENV AWS_SECRET_ACCESS_KEY=<pure-s3-secret-key>
   ENV AWS_ENDPOINT_URL=<pure-s3-endpoint>
   ENV AWS_REGION=us-east-2
   
   COPY txl25-polaris-sql.sh /opt/spark/bin/txl25-polaris-sql.sh
   
   USER root
   RUN apt-get update && apt-get install -y libcap2-bin libcap-dev less
   USER spark
   ```
   
   And then, from within the running Spark Docker container (on host networking so that it can talk to Polaris), I execute the following:
   
   ```bash
   ./spark-sql \
     --packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0-incubating,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
     --conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
     --conf spark.sql.catalog.lorna.warehouse=polariscatalog \
     --conf spark.sql.catalog.lorna.header.X-Iceberg-Access-Delegation=vended-credentials \
     --conf spark.sql.catalog.lorna=org.apache.polaris.spark.SparkCatalog \
     --conf spark.sql.catalog.lorna.uri=http://localhost:8181/api/catalog \
     --conf spark.sql.catalog.lorna.credential='root:secret' \
     --conf spark.sql.catalog.lorna.scope='PRINCIPAL_ROLE:ALL' \
     --conf spark.sql.catalog.lorna.token-refresh-enabled=true \
     --conf spark.sql.catalog.lorna.io-impl=org.apache.iceberg.io.ResolvingFileIO \
     --conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
     --conf spark.sql.catalog.lorna.s3.region=us-east-2 \
     --conf spark.sql.catalog.lorna.s3.endpoint=<pure-s3-endpoint>
   ```
   
   From within the spark-sql shell, I execute the following:
   
   ```sql
   USE lorna;
   CREATE NAMESPACE ICE_NS;
   USE NAMESPACE ICE_NS;
   CREATE TABLE PERSON (id int, name string) USING iceberg;
   ```
   
   The error reported above then occurs.
   
   ### Actual Behavior
   
   _No response_
   
   ### Expected Behavior
   
   I would expect the system to be able to interact with my custom S3 endpoint, but at the very least I think the error message should be fully formed, so that it gives a better clue as to why this is failing.
   
   ### Additional context
   
   I do get this warning when selecting my Spark catalog, so maybe it's relevant:
   
   ```
   spark-sql (default)> USE lorna;
   25/07/29 18:23:08 WARN AuthManagers: Inferring rest.auth.type=oauth2 since property credential was provided. Please explicitly set rest.auth.type to avoid this warning.
   25/07/29 18:23:08 WARN OAuth2Manager: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://localhost:8181/api/catalog/v1/oauth/tokens. This automatic fallback will be removed in a future Iceberg release. It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537
   25/07/29 18:23:10 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
   Time taken: 2.29
   ```
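   
   As an aside, the second warning suggests pinning the OAuth2 endpoint explicitly; a sketch of the extra conf for the spark-sql invocation above, using the fallback URI the warning itself prints:
   
   ```bash
   # Explicitly configure the OAuth2 token endpoint, per the OAuth2Manager warning.
   --conf spark.sql.catalog.lorna.oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens
   ```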
   
   Here's the last bit of my Polaris Docker logs:
   
   ```
   2025-07-29 18:47:05,312 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-1) 172.19.0.1 - - [29/Jul/2025:18:47:05 +0000] "POST /api/catalog/v1/oauth/tokens HTTP/1.1" 200 753
   2025-07-29 18:47:05,451 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-1) 172.19.0.1 - root [29/Jul/2025:18:47:05 +0000] "GET /api/catalog/v1/config?warehouse=polariscatalog HTTP/1.1" 200 2351
   2025-07-29 18:47:05,532 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-1) 172.19.0.1 - root [29/Jul/2025:18:47:05 +0000] "GET /api/catalog/v1/config?warehouse=polariscatalog HTTP/1.1" 200 2351
   2025-07-29 18:47:15,943 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [,default-realm] [,,,] (executor-thread-1) Handling runtimeException Namespace does not exist: ice_ns
   2025-07-29 18:47:15,961 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-1) 172.19.0.1 - root [29/Jul/2025:18:47:15 +0000] "GET /api/catalog/v1/polariscatalog/namespaces/ice_ns HTTP/1.1" 404 101
   2025-07-29 18:47:16,066 INFO  [org.apa.pol.ser.cat.ice.IcebergCatalogHandler] [,default-realm] [,,,] (executor-thread-1) Initializing non-federated catalog
   2025-07-29 18:47:16,109 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-1) 172.19.0.1 - root [29/Jul/2025:18:47:16 +0000] "POST /api/catalog/v1/polariscatalog/namespaces HTTP/1.1" 200 96
   2025-07-29 18:48:00,501 INFO  [org.apa.pol.ser.cat.ice.IcebergCatalogHandler] [,default-realm] [,,,] (executor-thread-1) Initializing non-federated catalog
   2025-07-29 18:48:00,525 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-1) 172.19.0.1 - root [29/Jul/2025:18:48:00 +0000] "GET /api/catalog/v1/polariscatalog/namespaces?pageToken= HTTP/1.1" 200 50
   2025-07-29 18:48:06,053 INFO  [org.apa.pol.ser.cat.ice.IcebergCatalogHandler] [,default-realm] [,,,] (executor-thread-1) Initializing non-federated catalog
   2025-07-29 18:48:06,060 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-1) 172.19.0.1 - root [29/Jul/2025:18:48:06 +0000] "GET /api/catalog/v1/polariscatalog/namespaces/ice_ns HTTP/1.1" 200 96
   2025-07-29 18:48:26,456 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [,default-realm] [,,,] (executor-thread-1) Handling runtimeException Table does not exist: ice_ns.person
   2025-07-29 18:48:26,459 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-1) 172.19.0.1 - root [29/Jul/2025:18:48:26 +0000] "GET /api/catalog/v1/polariscatalog/namespaces/ice_ns/tables/person?snapshots=all HTTP/1.1" 404 100
   2025-07-29 18:48:26,485 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [,default-realm] [,,,] (executor-thread-4) Handling runtimeException Generic table does not exist: ice_ns.person
   2025-07-29 18:48:26,487 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-4) 172.19.0.1 - root [29/Jul/2025:18:48:26 +0000] "GET /api/catalog/polaris/v1/polariscatalog/namespaces/ice_ns/generic-tables/person HTTP/1.1" 404 108
   2025-07-29 18:48:26,614 INFO  [org.apa.pol.ser.cat.ice.IcebergCatalogHandler] [,default-realm] [,,,] (executor-thread-4) Initializing non-federated catalog
   2025-07-29 18:48:26,640 INFO  [org.apa.ice.BaseMetastoreCatalog] [,default-realm] [,,,] (executor-thread-4) Table properties set at catalog level through catalog properties: {}
   2025-07-29 18:48:26,650 INFO  [org.apa.ice.BaseMetastoreCatalog] [,default-realm] [,,,] (executor-thread-4) Table properties enforced at catalog level through catalog properties: {}
   2025-07-29 18:48:27,087 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [,default-realm] [,,,] (executor-thread-4) Handling runtimeException Failed to get subscoped credentials: (Service: Sts, Status Code: 400, Request ID: null) (SDK Attempt Count: 1)
   2025-07-29 18:48:27,088 INFO  [io.qua.htt.access-log] [,default-realm] [,,,] (executor-thread-4) 172.19.0.1 - root [29/Jul/2025:18:48:27 +0000] "POST /api/catalog/v1/polariscatalog/namespaces/ice_ns/tables HTTP/1.1" 422 183
   ```
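   
   Since the 422 response carries none of the underlying STS detail, it might help to raise the log level for the Polaris code in the Compose file; a sketch, assuming the image honors standard Quarkus logging-category properties (the category name here is a guess):
   
   ```yaml
       environment:
         # Assumption: standard Quarkus category logging applies; DEBUG output
         # may reveal the STS error body behind the bare 400.
         quarkus.log.category."org.apache.polaris".level: DEBUG
   ```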
   
   ### System information
   
   Tested with Polaris Docker images 1.0.1-incubating-rc0 and 1.0.0-incubating, using a Spark Docker image at version 3.5.6.

