borderlayout opened a new issue, #11488:
URL: https://github.com/apache/iceberg/issues/11488
### Feature Request / Improvement
Hi all,
When using Iceberg with Amazon S3 object storage, writing many files under
the same path can trigger request throttling. Setting the table property
write.object-storage.enabled=true hashes files that would otherwise share a
file path into different paths, which avoids the S3 throttling issue.
(see: https://iceberg.apache.org/docs/nightly/docs/configuration/?h=write.object+storage.enabled#write-properties)
However, I encountered a problem: for partitioned tables, the hash value is
inserted into the path before the partition name. This makes it difficult to
gather information about an individual partition from the storage layout,
such as the number of files or the total file size of one partition.
Is there a reason for designing it this way? Would placing the hash value
after the partition fields be a better approach?
- one partition column (parCol):
bucket/iceberg_test1/data/_44Xmw/parCol=2024-01-10/00295-2798-63356e4e-b4ec-4a80-ae3f-6888f2f7eac9-0-00003.parquet
bucket/iceberg_test1/data/_5l5dQ/parCol=2024-01-09/00063-2566-63356e4e-b4ec-4a80-ae3f-6888f2f7eac9-0-00006.parquet
== changed ==>
bucket/iceberg_test1/data/parCol=2024-01-10/_44Xmw/00295-2798-63356e4e-b4ec-4a80-ae3f-6888f2f7eac9-0-00003.parquet
bucket/iceberg_test1/data/parCol=2024-01-09/_5l5dQ/00063-2566-63356e4e-b4ec-4a80-ae3f-6888f2f7eac9-0-00006.parquet
- two partition columns (parCol, gender):
bucket/iceberg_test3/data/APigWw/parCol=2024-01-01/gender=male/00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00003.parquet
bucket/iceberg_test3/data/4Z-_sw/parCol=2024-01-01/gender=male/00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00001.parquet
== changed ==>
bucket/iceberg_test3/data/parCol=2024-01-01/gender=male/APigWw/00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00003.parquet
bucket/iceberg_test3/data/parCol=2024-01-01/gender=male/4Z-_sw/00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00001.parquet
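The proposed change can be sketched in a few lines of Python. This is only an illustration of the path rearrangement, not Iceberg's location provider code. The helper names (`is_partition_segment`, `move_hash_after_partitions`, `count_files_by_partition`) are hypothetical, and the sketch assumes that the hash is the single segment right after `data/` and that partition directories always look like `column=value`.

```python
from collections import Counter


def is_partition_segment(segment: str) -> bool:
    # Assumption: partition directories have the form "column=value",
    # and the hash segment never contains "=".
    return "=" in segment


def move_hash_after_partitions(key: str) -> str:
    """Rewrite .../data/<hash>/<col=val>/.../<file>
    as      .../data/<col=val>/.../<hash>/<file>."""
    parts = key.split("/")
    data_idx = parts.index("data")       # assumes a single "data" segment
    hash_segment = parts[data_idx + 1]   # assumes the hash follows "data"
    partitions = [p for p in parts[data_idx + 2:-1] if is_partition_segment(p)]
    return "/".join(parts[:data_idx + 1] + partitions + [hash_segment, parts[-1]])


def count_files_by_partition(keys):
    """Count files per partition, wherever the hash segment sits."""
    counts = Counter()
    for key in keys:
        dirs = key.split("/")[:-1]       # drop the file name
        partition = "/".join(p for p in dirs if is_partition_segment(p))
        counts[partition] += 1
    return counts


# Current layout, taken from the two-column example above.
keys = [
    "bucket/iceberg_test3/data/APigWw/parCol=2024-01-01/gender=male/"
    "00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00003.parquet",
    "bucket/iceberg_test3/data/4Z-_sw/parCol=2024-01-01/gender=male/"
    "00001-7234-7e44c302-a716-4da8-9ea0-0c44caf9a249-0-00001.parquet",
]

for key in keys:
    print(move_hash_after_partitions(key))
print(count_files_by_partition(keys))
```

With the current layout, per-partition statistics need this kind of segment parsing over every key under data/. With the proposed layout, a single prefix listing such as data/parCol=2024-01-01/gender=male/ would be enough.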
### Query engine
Spark
### Willingness to contribute
- [ ] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with
guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]