kevinjqliu commented on issue #1449: URL: https://github.com/apache/iceberg-python/issues/1449#issuecomment-2557247096
I have not tested this personally, but from reading the AWS blog post on connecting Spark to the AWS Glue Iceberg REST catalog, some of the configurations are different from what I would expect.

https://aws.amazon.com/blogs/big-data/read-and-write-s3-iceberg-table-using-aws-glue-iceberg-rest-catalog-from-open-source-apache-spark/

```
import sys
import os
import time
from pyspark.sql import SparkSession

# Replace <aws_region> with AWS region name.
# Replace <aws_account_id> with AWS account ID.
spark = SparkSession.builder.appName('osspark') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'spark_catalog') \
    .config('spark.sql.catalog.spark_catalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.spark_catalog.type', 'rest') \
    .config('spark.sql.catalog.spark_catalog.uri', 'https://glue.<aws_region>.amazonaws.com/iceberg') \
    .config('spark.sql.catalog.spark_catalog.warehouse', '<aws_account_id>') \
    .config('spark.sql.catalog.spark_catalog.rest.sigv4-enabled', 'true') \
    .config('spark.sql.catalog.spark_catalog.rest.signing-name', 'glue') \
    .config('spark.sql.catalog.spark_catalog.rest.signing-region', '<aws_region>') \
    .config('spark.sql.catalog.spark_catalog.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO') \
    .config('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.SimpleAWSCredentialProvider') \
    .config('spark.sql.catalog.spark_catalog.rest-metrics-reporting-enabled', 'false') \
    .getOrCreate()
```

Specifically, notice the `warehouse` parameter, which is set to the AWS account ID. It might help to try replicating what this Spark config is doing.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
