aalopatin commented on issue #15908:
URL: https://github.com/apache/iceberg/issues/15908#issuecomment-4237732209

   Guys, I can explain what's happening here 😄 
   
   We create schemas in the HMS catalog like this:
   
   ```
   CREATE SCHEMA ale_lopatin
   LOCATION 's3a://schema_bucket/'
   ```
   
   Note that we set the schema location to the root of a bucket.
   
   Then we create a table like this:
   
   ```
   CREATE TABLE ale_lopatin.ale_lopatin_table (
       id BIGINT,
       name STRING
   )
   USING ICEBERG;
   ```
   
   In this case we get this result:
   
   ```
   CREATE TABLE iceberg.ale_lopatin.ale_lopatin_table (
     id BIGINT,
     name STRING)
   USING iceberg
   LOCATION 's3a://schema_bucket//table'
   TBLPROPERTIES (
     'current-snapshot-id' = 'none',
     'format' = 'iceberg/parquet',
     'format-version' = '2',
     'write.parquet.compression-codec' = 'zstd')
   ```
   
   Here are our Spark configs for the Iceberg catalog:
   
   ```
   spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   
   spark.sql.catalog.iceberg: org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.iceberg.type: hive
   spark.sql.catalog.iceberg.uri: thrift://hms-dev.dmp.x5.ru:9083
   spark.sql.catalog.iceberg.io-impl: org.apache.iceberg.aws.s3.S3FileIO
   spark.sql.catalog.iceberg.s3.endpoint: https://s3dh-dev.dmp.x5.ru:9000
   spark.sql.catalog.iceberg.s3.path-style-access: true
   spark.sql.catalog.iceberg.s3.separator: /
   spark.sql.catalog.iceberg.client.region: dummy
   
   spark.sql.defaultCatalog: iceberg
   ```
   
   As you can see, we don't set the `spark.sql.catalog.iceberg.warehouse` property. I tried setting it to `s3a://`, but that fails with an error saying the bucket isn't set.
   
   If I create the schema like this:
   ```
   CREATE SCHEMA ale_lopatin
   LOCATION 's3a://schema_bucket/schema'
   ```
   
   then the table gets the location `s3a://schema_bucket/schema/table`.
   
   So the problem has two causes:
   1. We don't use the warehouse config and instead rely on an explicitly specified schema location.
   2. The schema location is an S3 bucket root whose path ends with `/`.
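   
   The double slash can be reproduced with a naive path join, which is roughly what happens when a table name is appended to the schema location (a simplified sketch for illustration, not Iceberg's actual code; the function name is made up):

```python
# Hypothetical sketch: the table location is derived by appending the
# table name to the schema location with a "/" separator, without
# normalizing a trailing slash in the schema location.
def table_location(schema_location: str, table_name: str) -> str:
    return schema_location + "/" + table_name

# Schema located at a bucket root (trailing "/") yields a double slash:
print(table_location("s3a://schema_bucket/", "table"))
# s3a://schema_bucket//table

# Schema located at a sub-path (no trailing "/") joins cleanly:
print(table_location("s3a://schema_bucket/schema", "table"))
# s3a://schema_bucket/schema/table
```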
   
   We use `iceberg-spark-runtime` and `iceberg-aws-bundle` version 1.10.1 with both Spark 3.5.8 and Spark 4.0.1.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

