nickdelnano opened a new issue, #10226:
URL: https://github.com/apache/iceberg/issues/10226

   ### Apache Iceberg version
   
   1.4.3
   
   ### Query engine
   
   Flink
   
   ### Please describe the bug 🐞
   
   Hi,
   
   I am observing issues when creating an Iceberg table with a Glue catalog 
configured to use Lake Formation.
   
   An existing integration test covers the case I am hitting, so I will explain the issue through that test first and give the details of my Flink use case afterwards.
   
   This issue is very similar to https://github.com/apache/iceberg/issues/6523; however, I observe that the issue is not fixed.
   
   ### TestLakeFormationMetadataOperations.java test testCreateTableSuccess
   [Link to test](https://github.com/apache/iceberg/blob/main/aws/src/integration/java/org/apache/iceberg/aws/lakeformation/TestLakeFormationMetadataOperations.java#L167)
   
   This test fails in my AWS account. I have walked through the code line by line in a debugger and believe that it would fail in any environment, for the reasons below.
   
   The test fails on this [line](https://github.com/apache/iceberg/blob/main/aws/src/integration/java/org/apache/iceberg/aws/lakeformation/TestLakeFormationMetadataOperations.java#L182) because Lake Formation permissions cannot be granted on a table that does not exist. The call to `glueCatalogPrivilegedRole.createTable` throws an exception first, but execution then proceeds to the `finally` block.
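
A minimal sketch of why the failure surfaces on a later line rather than at `createTable`, assuming the failing line sits in the `finally` block as described above. The two methods are hypothetical stand-ins, not the test's real helpers. In Java, an exception thrown from `finally` replaces the one thrown from `try`:

```java
public class FinallyMasking {
    // Hypothetical stand-in for glueCatalogPrivilegedRole.createTable(...).
    static void createTable() {
        throw new IllegalStateException("createTable failed");
    }

    // Hypothetical stand-in for granting Lake Formation permissions on the table.
    static void grantPermissions() {
        throw new IllegalArgumentException("cannot grant on a missing table");
    }

    // Returns the message of the exception that actually escapes the try/finally.
    static String run() {
        try {
            try {
                createTable();
            } finally {
                // Still runs after createTable() throws; its own exception
                // discards the original one.
                grantPermissions();
            }
        } catch (RuntimeException e) {
            return e.getMessage();
        }
        return "no exception";
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "cannot grant on a missing table"
    }
}
```

The original `createTable` exception is lost unless it is caught or logged before the `finally` block runs, which may be why the test report points at the permission grant.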
   
   As far as I can tell, the AWS integration tests are not run on opened PRs so 
I cannot easily demonstrate this in an issue or PR. If it is possible to do 
this please let me know how and I will create a PR that shows it.
   
   Previous work ([#4423](https://github.com/apache/iceberg/pull/4423/files)) creates an initial or "dummy" Glue table when Lake Formation is enabled and the table requested for creation does not exist yet. However, when Lake Formation is enabled, [GlueCatalog also sets `put(S3FileIOProperties.PRELOAD_CLIENT_ENABLED, String.valueOf(true))`](https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L224-L232), which triggers the code path below and results in a call to the `aws glue get-table` API before any table exists. The call throws an uncaught exception, and table creation fails.
   - https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java#L374-L376
   - https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/lakeformation/LakeFormationAwsClientFactory.java#L78-L79
   - https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/lakeformation/LakeFormationAwsClientFactory.java#L107
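
The ordering problem can be illustrated with a small self-contained model. This is only a sketch: `FileIo`, `lookupTable`, and `TABLES` are hypothetical stand-ins for `S3FileIO`, the Glue `get-table` lookup, and the catalog, and none of it is Iceberg code. With preloading on, the lookup runs during initialization, before the table exists; with preloading off, it is deferred until the client is first needed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

public class PreloadSketch {
    // Hypothetical in-memory stand-in for the tables known to the catalog.
    static final Set<String> TABLES = new HashSet<>();

    // Stand-in for the get-table lookup: fails if the table is not there yet.
    static String lookupTable(String name) {
        if (!TABLES.contains(name)) {
            throw new IllegalStateException("Entity Not Found: " + name);
        }
        return "client-for-" + name;
    }

    // Stand-in for a file IO object that builds its client eagerly or lazily.
    static class FileIo {
        private final Supplier<String> factory;
        private String client;

        FileIo(String table, boolean preloadClient) {
            this.factory = () -> lookupTable(table);
            if (preloadClient) {
                client(); // eager: the lookup happens during initialization
            }
        }

        String client() {
            if (client == null) {
                client = factory.get(); // lazy: the lookup happens on first use
            }
            return client;
        }
    }

    public static void main(String[] args) {
        try {
            new FileIo("db.new_table", true);
        } catch (IllegalStateException e) {
            System.out.println("eager: " + e.getMessage());
        }
        FileIo lazy = new FileIo("db.new_table", false);
        TABLES.add("db.new_table"); // the table is created afterwards
        System.out.println("lazy: " + lazy.client());
    }
}
```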
   
   Please provide any advice or workaround for how a table can be created in a 
Glue catalog with Lake Formation enabled without encountering this issue.
   
   ### Error in my Flink environment
   I am running Flink on EC2 (not EMR) with Iceberg, Glue, and Lake Formation.
   
   Iceberg catalog configuration:
   ```
   CREATE CATALOG glue_catalog WITH (
     'type'='iceberg',
     'warehouse'='s3://bucket',
     'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
     'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
     'client.factory'='org.apache.iceberg.aws.lakeformation.LakeFormationAwsClientFactory',
     'client.assume-role.arn'='arn:aws:iam::<redacted>:role/<redacted>',
     'glue.lakeformation-enabled'='true',
     'client.assume-role.tags.LakeFormationAuthorizedCaller'='<redacted>',
     'client.assume-role.region'='us-east-1',
     'glue.account-id'='<redacted>'
   );
   ```
   
   The stack trace confirms the behavior described for the integration test: while the table is being created, `S3FileIO` is initialized and `LakeFormationAwsClientFactory.isTableRegisteredWithLakeFormation` is called before any Glue table exists.
   
   Stacktrace:
   ```
   Caused by: software.amazon.awssdk.services.glue.model.EntityNotFoundException: Entity Not Found (Service: Glue, Status Code: 400, Request ID: efa126e5-e9d5-41f8-bb5a-c8d30bd166eb)
           at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125) ~[?:?]
           at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82) ~[?:?]
           at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60) ~[?:?]
           at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
           at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[?:?]
           at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[?:?]
           at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[?:?]
           at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[?:?]
           at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[?:?]
           at software.amazon.awssdk.services.glue.DefaultGlueClient.getTable(DefaultGlueClient.java:8903) ~[?:?]
           at org.apache.iceberg.aws.lakeformation.LakeFormationAwsClientFactory.isTableRegisteredWithLakeFormation(LakeFormationAwsClientFactory.java:115) ~[?:?]
           at org.apache.iceberg.aws.lakeformation.LakeFormationAwsClientFactory.s3(LakeFormationAwsClientFactory.java:79) ~[?:?]
           at org.apache.iceberg.aws.s3.S3FileIO.client(S3FileIO.java:327) ~[?:?]
           at org.apache.iceberg.aws.s3.S3FileIO.initialize(S3FileIO.java:375) ~[?:?]
           at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:325) ~[?:?]
           at org.apache.iceberg.aws.glue.GlueTableOperations.initializeFileIO(GlueTableOperations.java:223) ~[?:?]
           at org.apache.iceberg.aws.glue.GlueTableOperations.io(GlueTableOperations.java:115) ~[?:?]
           at org.apache.iceberg.aws.glue.GlueCatalog.newTableOps(GlueCatalog.java:246) ~[?:?]
           at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:188) ~[?:?]
           at org.apache.iceberg.CachingCatalog$CachingTableBuilder.lambda$create$0(CachingCatalog.java:261) ~[?:?]
           at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406) ~[?:?]
           at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908) ~[?:?]
           at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404) ~[?:?]
           at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387) ~[?:?]
           at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108) ~[?:?]
           at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62) ~[?:?]
           at org.apache.iceberg.CachingCatalog$CachingTableBuilder.create(CachingCatalog.java:257) ~[?:?]
           at org.apache.iceberg.catalog.Catalog.createTable(Catalog.java:75) ~[?:?]
           at org.apache.iceberg.flink.FlinkCatalog.createIcebergTable(FlinkCatalog.java:415) ~[?:?]
           at org.apache.iceberg.flink.FlinkCatalog.createTable(FlinkCatalog.java:394) ~[?:?]
           at org.apache.flink.table.catalog.CatalogManager.lambda$createTable$11(CatalogManager.java:663) ~[flink-table-api-java-uber-1.17.0.jar:1.17.0]
           at org.apache.flink.table.catalog.CatalogManager.execute(CatalogManager.java:909) ~[flink-table-api-java-uber-1.17.0.jar:1.17.0]
           at org.apache.flink.table.catalog.CatalogManager.createTable(CatalogManager.java:652) ~[flink-table-api-java-uber-1.17.0.jar:1.17.0]
           at org.apache.flink.table.api.internal.TableEnvironmentImpl.createTable(TableEnvironmentImpl.java:532) ~[flink-table-api-java-uber-1.17.0.jar:1.17.0]
           at 
   ```
   
   ### Potential fix
   Instead of GlueCatalog setting 
`.put(S3FileIOProperties.PRELOAD_CLIENT_ENABLED, String.valueOf(true));` in all 
cases when Lake Formation is enabled, perhaps it could be user configurable.
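
One way this could look, as a rough sketch rather than a patch against `GlueCatalog`: treat `true` as a default that an explicit user setting overrides. The method name `fileIoProperties` and its parameters are hypothetical, and the key string `s3.preload-client-enabled` is assumed to match `S3FileIOProperties.PRELOAD_CLIENT_ENABLED`; please verify it against the source before relying on it.

```java
import java.util.HashMap;
import java.util.Map;

public class PreloadConfigSketch {
    // Assumed value of S3FileIOProperties.PRELOAD_CLIENT_ENABLED (not verified here).
    static final String PRELOAD_CLIENT_ENABLED = "s3.preload-client-enabled";

    // Hypothetical helper: builds the FileIO properties from the catalog properties.
    static Map<String, String> fileIoProperties(
            Map<String, String> catalogProps, boolean lakeFormationEnabled) {
        Map<String, String> props = new HashMap<>(catalogProps);
        if (lakeFormationEnabled) {
            // Default to preloading, but keep an explicit user setting if present.
            props.putIfAbsent(PRELOAD_CLIENT_ENABLED, String.valueOf(true));
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> user = new HashMap<>();
        user.put(PRELOAD_CLIENT_ENABLED, "false");
        System.out.println(fileIoProperties(user, true).get(PRELOAD_CLIENT_ENABLED));
        System.out.println(fileIoProperties(new HashMap<>(), true).get(PRELOAD_CLIENT_ENABLED));
    }
}
```

With `putIfAbsent`, existing deployments that never set the property keep today's behavior, while a user who hits this issue could opt out for table creation.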
   
   See the [`PRELOAD_CLIENT_ENABLED` documentation](https://github.com/apache/iceberg/blob/10ffc606219d34c801c2109a9d19d0848a63d2dc/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java#L371-L375).


