adutra commented on issue #3621:
URL: https://github.com/apache/polaris/issues/3621#issuecomment-3826506144
To be honest, I'm not 100% convinced of the usefulness of Polaris's object layout.

S3 is documented to support at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per "partitioned prefix":

https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html

But we don't know exactly what a "partitioned prefix" is.

- For data file hotspots: Iceberg's layout is likely sufficient.
- For metadata file hotspots: Polaris's feature might provide additional value, since Iceberg's layout doesn't apply to these files, but metadata operations are typically lower volume than data writes.

The following questions could help clarify the feature's value:

1. Was this feature introduced based on empirical evidence of improved S3 performance, or based on theoretical assumptions about prefix distribution?
2. Are there benchmarks or case studies showing that per-table entropy prefixes reduce hotspots or improve throughput?
3. Does AWS documentation or support confirm that prefix diversity helps S3 partition more effectively?
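
For a rough sense of scale, here is a minimal sketch of what the documented per-prefix numbers would imply if every distinct prefix really did become its own "partitioned prefix". That is exactly the unverified assumption under discussion, so treat the output as a theoretical ceiling, not a measurement. The helper name `theoretical_ceiling` and the prefix counts are made up for illustration.

```python
# Per-"partitioned prefix" request rates quoted from the AWS page linked above.
WRITE_RPS_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500   # GET/HEAD


def theoretical_ceiling(num_prefixes: int) -> tuple[int, int]:
    """Return (write_rps, read_rps) if each prefix were its own partition.

    Hypothetical helper: it assumes S3 partitions on every distinct prefix,
    which is the open question here.
    """
    if num_prefixes < 1:
        raise ValueError("num_prefixes must be at least 1")
    return (
        num_prefixes * WRITE_RPS_PER_PREFIX,
        num_prefixes * READ_RPS_PER_PREFIX,
    )


# Illustrative prefix counts only; not taken from any real deployment.
for n in (1, 10, 100):
    writes, reads = theoretical_ceiling(n)
    print(f"{n:>3} prefixes: up to {writes:,} writes/s, {reads:,} reads/s")
```

Whether per-table entropy prefixes get anywhere near this ceiling depends on how S3 actually splits partitions, which is why benchmarks would be more convincing than the arithmetic.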
