brightwon opened a new issue, #9907:
URL: https://github.com/apache/hudi/issues/9907

   I'm using Hudi 0.14.0 with Flink 1.16.1 to store data from Kafka to S3,
   but Athena (engine version 3) queries against the MOR table fail with this error:
   
   ```
   Error running query: HIVE_UNKNOWN_ERROR:
   io.trino.plugin.hive.s3.TrinoS3FileSystem$UnrecoverableS3OperationException:
   com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist.
   (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: ***;
   S3 Extended Request ID: ***; Proxy: null), S3 Extended Request ID: ***
   (Bucket: mybucket, Key: mytable/.hoodie/.aux/20231014095517882.compaction.requested)
   ```
   
   The error occurs whenever a compaction is scheduled (pending).
   Once the compaction completes, the query works again.
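
   For debugging, the pending compaction instant is visible on the timeline. Here is a minimal sketch of the check I run (the bucket/table are taken from the error above; the class name is just for illustration):

   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.timeline.HoodieTimeline;

   public class PendingCompactionCheck {
     public static void main(String[] args) throws Exception {
       String basePath = "s3a://mybucket/mytable"; // table base path from the error above
       HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
           .setConf(new Configuration())
           .setBasePath(basePath)
           .build();

       // Compactions that are scheduled but not yet completed on the active timeline
       HoodieTimeline pending = metaClient.getActiveTimeline().filterPendingCompactionTimeline();
       pending.getInstants().forEach(instant -> {
         // The key Athena 404s on: <base>/.hoodie/.aux/<instant>.compaction.requested
         Path auxPlan = new Path(basePath,
             ".hoodie/.aux/" + instant.getTimestamp() + ".compaction.requested");
         try {
           System.out.printf("instant=%s auxPlanExists=%s%n",
               instant.getTimestamp(), metaClient.getFs().exists(auxPlan));
         } catch (Exception e) {
           throw new RuntimeException(e);
         }
       });
     }
   }
   ```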
   
   Here are the Flink Hudi options (Java):
   ```java
   import java.util.HashMap;
   import java.util.Map;

   import org.apache.hudi.common.config.HoodieMetadataConfig;
   import org.apache.hudi.common.model.HoodieCleaningPolicy;
   import org.apache.hudi.common.model.HoodieTableType;
   import org.apache.hudi.common.model.WriteConcurrencyMode;
   import org.apache.hudi.common.model.WriteOperationType;
   import org.apache.hudi.config.HoodieLockConfig;
   import org.apache.hudi.config.HoodieWriteConfig;
   import org.apache.hudi.configuration.FlinkOptions;

   Map<String, String> flinkHudiOptions = new HashMap<>();

   // Table location and write mode
   flinkHudiOptions.put(FlinkOptions.PATH.key(), basePath); // e.g. "s3a://mybucket/mytable"
   flinkHudiOptions.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.MERGE_ON_READ.name());
   flinkHudiOptions.put(FlinkOptions.OPERATION.key(), WriteOperationType.UPSERT.name());
   flinkHudiOptions.put(FlinkOptions.PRECOMBINE_FIELD.key(), "event_time");
   flinkHudiOptions.put(FlinkOptions.KEYGEN_CLASS_NAME.key(), "org.apache.hudi.keygen.ComplexKeyGenerator");

   // Async compaction, triggered every 5 delta commits
   flinkHudiOptions.put(FlinkOptions.COMPACTION_ASYNC_ENABLED.key(), "true");
   flinkHudiOptions.put(FlinkOptions.COMPACTION_TRIGGER_STRATEGY.key(), FlinkOptions.NUM_COMMITS);
   flinkHudiOptions.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), "5");
   flinkHudiOptions.put(FlinkOptions.COMPACTION_MAX_MEMORY.key(), "1024");

   // Metadata table with async column-stats index
   flinkHudiOptions.put(FlinkOptions.METADATA_ENABLED.key(), "true");
   flinkHudiOptions.put(HoodieMetadataConfig.ASYNC_INDEX_ENABLE.key(), "true");
   flinkHudiOptions.put(HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key(), "true");

   // Optimistic concurrency control with an in-process lock provider
   flinkHudiOptions.put(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name());
   flinkHudiOptions.put(HoodieLockConfig.LOCK_PROVIDER_CLASS_NAME.key(), "org.apache.hudi.client.transaction.lock.InProcessLockProvider");

   // Async cleaning, retaining the last 24 hours
   flinkHudiOptions.put(FlinkOptions.CLEAN_ASYNC_ENABLED.key(), "true");
   flinkHudiOptions.put(FlinkOptions.CLEAN_POLICY.key(), HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS.name());
   flinkHudiOptions.put(FlinkOptions.CLEAN_RETAIN_HOURS.key(), "24");
   ```
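
   For reference, these options are applied to the stream roughly as below (a minimal sketch; the table name, column list, and `dataStream` variable are placeholders, not my real schema):

   ```java
   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.table.data.RowData;
   import org.apache.hudi.util.HoodiePipeline;

   // Placeholder schema and table name; the real job builds DataStream<RowData> from Kafka.
   void writeToHudi(DataStream<RowData> dataStream, Map<String, String> flinkHudiOptions) {
     HoodiePipeline.Builder builder = HoodiePipeline.builder("mytable")
         .column("id VARCHAR(64)")
         .column("event_time TIMESTAMP(3)")
         .pk("id")
         .options(flinkHudiOptions);

     builder.sink(dataStream, false); // false = the input stream is unbounded
   }
   ```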
   
   My Flink application runs as a FlinkDeployment managed by the Flink Kubernetes operator (on AWS EKS).
   I ran the hive-sync command once on EMR 6.10.0 (Hudi 0.12.2-amzn-0) to register the table in the Glue metastore.
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run the Flink application with the options above
   2. Run hive-sync once on EMR to register the table in the Glue metastore
   3. Run an Athena query while a compaction is scheduled
   
   **Expected behavior**
   
   Athena queries against the MOR table should succeed while a compaction is scheduled (pending), just as they do after the compaction completes.
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Flink version : 1.16.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   

