nickdelnano opened a new issue, #10226: URL: https://github.com/apache/iceberg/issues/10226
### Apache Iceberg version

1.4.3

### Query engine

Flink

### Please describe the bug 🐞

Hi, I am seeing failures when creating an Iceberg table with a Glue catalog configured to use Lake Formation. An existing integration test covers the failure I am seeing, so I will use it to explain the issue. Details about my Flink use case follow afterwards.

This issue is very similar to https://github.com/apache/iceberg/issues/6523; however, I observe that it is not fixed.

### TestLakeFormationMetadataOperations.java: testCreateTableSuccess

[Link to test](https://github.com/apache/iceberg/blob/main/aws/src/integration/java/org/apache/iceberg/aws/lakeformation/TestLakeFormationMetadataOperations.java#L167)

This test fails in my AWS account. I have stepped through the code line by line in a debugger and believe it would fail in any environment, for the reasons below.

The test fails on this [line](https://github.com/apache/iceberg/blob/main/aws/src/integration/java/org/apache/iceberg/aws/lakeformation/TestLakeFormationMetadataOperations.java#L182) because Lake Formation permissions cannot be granted on a table that does not exist. The call to `glueCatalogPrivilegedRole.createTable` first throws an exception, and execution then proceeds to the `finally` block.

As far as I can tell, the AWS integration tests are not run on opened PRs, so I cannot easily demonstrate this in an issue or PR. If there is a way to do this, please let me know and I will create a PR that shows it.

Previous work was done to create an initial or "dummy" Glue table when Lake Formation is enabled and the table requested for creation does not exist yet ([1] https://github.com/apache/iceberg/pull/4423/files).
However, if Lake Formation is enabled, [2] [GlueCatalog sets `put(S3FileIOProperties.PRELOAD_CLIENT_ENABLED, String.valueOf(true))`](https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L224-L232), which triggers the code path below and results in a Glue `GetTable` call (`aws glue get-table`) before any table exists. This causes an uncaught exception, and table creation fails.

- https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java#L374-L376
- https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/lakeformation/LakeFormationAwsClientFactory.java#L78-L79
- https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/lakeformation/LakeFormationAwsClientFactory.java#L107

Please share any advice or workaround for creating a table in a Glue catalog with Lake Formation enabled without hitting this issue.

### Error in my Flink environment

I am using Flink on EC2 (not EMR) with Iceberg, Glue, and Lake Formation.

Iceberg catalog configuration:

```
CREATE CATALOG glue_catalog WITH (
  'type'='iceberg',
  'warehouse'='s3://bucket',
  'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
  'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
  'client.factory'='org.apache.iceberg.aws.lakeformation.LakeFormationAwsClientFactory',
  'client.assume-role.arn'='arn:aws:iam::<redacted>:role/<redacted>',
  'glue.lakeformation-enabled'='true',
  'client.assume-role.tags.LakeFormationAuthorizedCaller'='<redacted>',
  'client.assume-role.region'='us-east-1',
  'glue.account-id'='<redacted>'
);
```

The stack trace confirms the behavior described for the integration test: while a table is being created, `S3FileIO` is initialized and `LakeFormationAwsClientFactory.isTableRegisteredWithLakeFormation` is called before any Glue table exists.
Stacktrace:

```
Caused by: software.amazon.awssdk.services.glue.model.EntityNotFoundException: Entity Not Found (Service: Glue, Status Code: 400, Request ID: efa126e5-e9d5-41f8-bb5a-c8d30bd166eb)
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125) ~[?:?]
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82) ~[?:?]
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60) ~[?:?]
    at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[?:?]
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[?:?]
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[?:?]
    at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[?:?]
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[?:?]
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171) ~[?:?]
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82) ~[?:?]
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179) ~[?:?]
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[?:?]
    at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[?:?]
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[?:?]
    at software.amazon.awssdk.services.glue.DefaultGlueClient.getTable(DefaultGlueClient.java:8903) ~[?:?]
    at org.apache.iceberg.aws.lakeformation.LakeFormationAwsClientFactory.isTableRegisteredWithLakeFormation(LakeFormationAwsClientFactory.java:115) ~[?:?]
    at org.apache.iceberg.aws.lakeformation.LakeFormationAwsClientFactory.s3(LakeFormationAwsClientFactory.java:79) ~[?:?]
    at org.apache.iceberg.aws.s3.S3FileIO.client(S3FileIO.java:327) ~[?:?]
    at org.apache.iceberg.aws.s3.S3FileIO.initialize(S3FileIO.java:375) ~[?:?]
    at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:325) ~[?:?]
    at org.apache.iceberg.aws.glue.GlueTableOperations.initializeFileIO(GlueTableOperations.java:223) ~[?:?]
    at org.apache.iceberg.aws.glue.GlueTableOperations.io(GlueTableOperations.java:115) ~[?:?]
    at org.apache.iceberg.aws.glue.GlueCatalog.newTableOps(GlueCatalog.java:246) ~[?:?]
    at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:188) ~[?:?]
    at org.apache.iceberg.CachingCatalog$CachingTableBuilder.lambda$create$0(CachingCatalog.java:261) ~[?:?]
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406) ~[?:?]
    at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908) ~[?:?]
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404) ~[?:?]
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387) ~[?:?]
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108) ~[?:?]
    at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62) ~[?:?]
    at org.apache.iceberg.CachingCatalog$CachingTableBuilder.create(CachingCatalog.java:257) ~[?:?]
    at org.apache.iceberg.catalog.Catalog.createTable(Catalog.java:75) ~[?:?]
    at org.apache.iceberg.flink.FlinkCatalog.createIcebergTable(FlinkCatalog.java:415) ~[?:?]
    at org.apache.iceberg.flink.FlinkCatalog.createTable(FlinkCatalog.java:394) ~[?:?]
    at org.apache.flink.table.catalog.CatalogManager.lambda$createTable$11(CatalogManager.java:663) ~[flink-table-api-java-uber-1.17.0.jar:1.17.0]
    at org.apache.flink.table.catalog.CatalogManager.execute(CatalogManager.java:909) ~[flink-table-api-java-uber-1.17.0.jar:1.17.0]
    at org.apache.flink.table.catalog.CatalogManager.createTable(CatalogManager.java:652) ~[flink-table-api-java-uber-1.17.0.jar:1.17.0]
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.createTable(TableEnvironmentImpl.java:532) ~[flink-table-api-java-uber-1.17.0.jar:1.17.0]
    ...
```

### Potential fix

Instead of `GlueCatalog` setting `.put(S3FileIOProperties.PRELOAD_CLIENT_ENABLED, String.valueOf(true));` in all cases when Lake Formation is enabled, perhaps this could be made user-configurable. See the [`PRELOAD_CLIENT_ENABLED` documentation](https://github.com/apache/iceberg/blob/10ffc606219d34c801c2109a9d19d0848a63d2dc/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L371-L375).
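To make the proposed fix concrete, here is a minimal, self-contained Java sketch that models the call order from the stack trace in this issue. It contrasts the current behavior, where the preload flag is always overwritten with `true`, with a user-overridable default based on `putIfAbsent`. Everything in it is hypothetical: `PreloadSketch`, its methods, and the `EntityNotFoundException` stand-in are not Iceberg or AWS SDK classes. The property key `s3.preload-client-enabled` is assumed to match `S3FileIOProperties.PRELOAD_CLIENT_ENABLED`. The sketch only shows that an opt-out would skip the eager `GetTable` check during `S3FileIO` initialization. Whether a lazily created client then succeeds depends on the dummy-table logic from [1], which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the failure described in this issue. None of these
 * members are Iceberg APIs; they only mimic the call order seen in the stack trace.
 */
public class PreloadSketch {

  // Assumption: this matches the value of S3FileIOProperties.PRELOAD_CLIENT_ENABLED.
  static final String PRELOAD_CLIENT_ENABLED = "s3.preload-client-enabled";

  /** Stand-in for Glue's EntityNotFoundException (not the AWS SDK class). */
  static class EntityNotFoundException extends RuntimeException {
    EntityNotFoundException(String table) {
      super("Entity Not Found: " + table);
    }
  }

  /** Stand-in for LakeFormationAwsClientFactory.s3(): it looks the table up in Glue first. */
  static Object createS3Client(Set<String> glueTables, String table) {
    if (!glueTables.contains(table)) {
      throw new EntityNotFoundException(table);
    }
    return new Object(); // placeholder for a real S3 client
  }

  /** Stand-in for S3FileIO.initialize(): builds the client eagerly only when preload is on. */
  static Object initializeFileIO(Map<String, String> props, Set<String> glueTables, String table) {
    boolean preload = Boolean.parseBoolean(props.getOrDefault(PRELOAD_CLIENT_ENABLED, "false"));
    // When preload is off, no client is built here; the model stops at this point.
    return preload ? createS3Client(glueTables, table) : null;
  }

  /** Current behavior described in the issue: preload is forced on, ignoring the user. */
  static Map<String, String> forcedProperties(Map<String, String> userProps) {
    Map<String, String> props = new HashMap<>(userProps);
    props.put(PRELOAD_CLIENT_ENABLED, "true");
    return props;
  }

  /** Proposed behavior: preload stays the default, but an explicit user value wins. */
  static Map<String, String> configurableProperties(Map<String, String> userProps) {
    Map<String, String> props = new HashMap<>(userProps);
    props.putIfAbsent(PRELOAD_CLIENT_ENABLED, "true");
    return props;
  }

  /** Returns true if table creation would get past FileIO initialization in this model. */
  static boolean canInitialize(Map<String, String> props, Set<String> glueTables, String table) {
    try {
      initializeFileIO(props, glueTables, table);
      return true;
    } catch (EntityNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Set<String> glueTables = Set.of(); // the table does not exist in Glue yet
    Map<String, String> userProps = Map.of(PRELOAD_CLIENT_ENABLED, "false");
    System.out.println(canInitialize(forcedProperties(userProps), glueTables, "db.new_table")); // false
    System.out.println(canInitialize(configurableProperties(userProps), glueTables, "db.new_table")); // true
  }
}
```

With `putIfAbsent`, Lake Formation users would still get preloading by default; only users who explicitly set the property to `false` would see different behavior.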