sagarlakshmipathy commented on issue #504:
URL:
https://github.com/apache/incubator-xtable/issues/504#issuecomment-2365019590
I ran into an issue while using Snowflake's Polaris catalog. Documenting it here.
```
java -cp /Users/sagarl/Downloads/iceberg-spark-runtime-3.4_2.12-1.4.1.jar:/Users/sagarl/latest/incubator-xtable/xtable-utilities/target/xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:/Users/sagarl/Downloads/bundle-2.20.160.jar:/Users/sagarl/Downloads/url-connection-client-2.20.160.jar \
  org.apache.xtable.utilities.RunSync --datasetConfig config.yaml --icebergCatalogConfig catalog.yaml
```
### Error
```
2024-09-20 22:55:30 INFO  org.apache.iceberg.RemoveSnapshots:328 - Cleaning up expired files (local, incremental)
2024-09-20 22:55:31 ERROR org.apache.xtable.spi.sync.TableFormatSync:78 - Failed to sync snapshot
org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Delegate access to table with user-specified write location is temporarily not supported.
	at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:157) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:88) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:71) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:183) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:292) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:226) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.HTTPClient.post(HTTPClient.java:337) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.RESTClient.post(RESTClient.java:112) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.rest.RESTTableOperations.commit(RESTTableOperations.java:152) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$3(BaseTransaction.java:416) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:412) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:307) ~[iceberg-spark-runtime-3.4_2.12-1.4.1.jar:?]
	at org.apache.xtable.iceberg.IcebergConversionTarget.completeSync(IcebergConversionTarget.java:221) ~[xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.spi.sync.TableFormatSync.getSyncResult(TableFormatSync.java:165) ~[xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.spi.sync.TableFormatSync.syncSnapshot(TableFormatSync.java:70) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:182) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:118) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
	at org.apache.xtable.utilities.RunSync.main(RunSync.java:191) [xtable-utilities-0.2.0-SNAPSHOT-bundled.jar:0.2.0-SNAPSHOT]
```
The sync only partially completed at this point: the table gets created in the target format in the catalog, but it contains no data.
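The rejection comes from Polaris itself: with access delegation requested (`header.X-Iceberg-Access-Delegation: vended-credentials`), Polaris currently refuses commits to a table whose write location is user-specified, and since XTable performs an in-place metadata translation, the Iceberg table is registered at the Hudi `tableBasePath`. If that reading is right, the same error should be reproducible directly from the PySpark session configured later in this comment. A minimal, untested sketch, using a hypothetical table name `location_test`:
```
# Hypothetical reproduction (untested): creating an Iceberg table with an
# explicit, user-specified LOCATION through the same Polaris REST catalog
# should hit the same ForbiddenException while vended credentials are in use.
spark.sql("""
    CREATE TABLE polaris.spark_demo.location_test (id INT)
    USING iceberg
    LOCATION 's3://xtable-demo-bucket/spark_demo/location_test'
""")
```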
### config.yaml
```
sourceFormat: HUDI
targetFormats:
- ICEBERG
datasets:
  - tableBasePath: s3://xtable-demo-bucket/spark_demo/people
    tableName: people
    partitionSpec: city:VALUE
    namespace: spark_demo
```
### catalog.yaml
```
catalogImpl: org.apache.iceberg.rest.RESTCatalog
catalogName: iceberg_catalog
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  warehouse: iceberg_catalog
  uri: https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog
  credential: <client-id>:<client-secret>
  header.X-Iceberg-Access-Delegation: vended-credentials
  scope: PRINCIPAL_ROLE:ALL
  client.region: us-west-2
```
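Since the rejection is specifically about access delegation, one possible workaround (untested, an assumption on my part) may be to drop the vended-credentials header and let S3FileIO fall back to the default AWS credential chain on the machine running the sync:
```
# Hypothetical variant of catalog.yaml (untested): no access delegation is
# requested, so S3 access would use local AWS credentials instead of
# credentials vended by Polaris.
catalogImpl: org.apache.iceberg.rest.RESTCatalog
catalogName: iceberg_catalog
catalogOptions:
  io-impl: org.apache.iceberg.aws.s3.S3FileIO
  warehouse: iceberg_catalog
  uri: https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog
  credential: <client-id>:<client-secret>
  scope: PRINCIPAL_ROLE:ALL
  client.region: us-west-2
```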
I could access the table from PySpark using the following command:
```
pyspark --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.defaultCatalog=polaris \
  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.type=rest \
  --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
  --conf spark.sql.catalog.polaris.uri=https://<polaris-id>.snowflakecomputing.com/polaris/api/catalog \
  --conf spark.sql.catalog.polaris.credential=<client-id>:<client-secret> \
  --conf spark.sql.catalog.polaris.warehouse=iceberg_catalog \
  --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:my_spark_admin_role \
  --conf spark.sql.catalog.polaris.client.region=us-west-2
```
```
>>> spark.sql("USE spark_demo")
DataFrame[]
>>> spark.sql("SHOW TABLES").show()
+----------+----------+-----------+
| namespace| tableName|isTemporary|
+----------+----------+-----------+
|spark_demo| people| false|
|spark_demo|test_table| false|
+----------+----------+-----------+
>>> spark.sql("SELECT * FROM people").show()
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name| id|name|age|city|create_ts|
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+----+---------+
>>>
```
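The empty result is consistent with the metadata-only sync above. To check the root-cause hypothesis, the table's registered location could be inspected from the same session; a hypothetical check (untested), assuming `SHOW CREATE TABLE` includes the LOCATION clause for Iceberg tables:
```
# Hypothetical check (untested): if the reported LOCATION is the Hudi base
# path (s3://xtable-demo-bucket/spark_demo/people), the table was registered
# with a user-specified write location, which is what Polaris rejects.
spark.sql("SHOW CREATE TABLE spark_demo.people").show(truncate=False)
```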