echisan opened a new issue, #10601:
URL: https://github.com/apache/hudi/issues/10601
**_Tips before filing an issue_**
- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at
dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
**Describe the problem you faced**
I am not sure why the parquet file is missing; the Flink job did not restart. I
would like to know how to handle this issue. Is it possible to ignore the
missing file?
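For triage, note that the instant time embedded in the missing base file's name can be compared with the instant of the pending clustering plan: the failing file was committed at `20240131164655396`, while the clustering plan (`20240131195932968`) was still pending hours later, so a cleaner with a short retention window could have deleted the input file before clustering executed. A small sketch of that check, assuming Hudi's standard base-file naming `<fileId>_<writeToken>_<instantTime>.parquet` (the values below are copied from the stacktrace):

```python
import re

def base_file_instant(file_name: str) -> str:
    """Extract the commit instant time from a Hudi base-file name.

    Assumes the standard naming <fileId>_<writeToken>_<instantTime>.parquet,
    where the write token has the form <taskId>-<attempt>-<seq>.
    """
    m = re.match(r".+_\d+-\d+-\d+_(\d+)\.parquet$", file_name)
    if m is None:
        raise ValueError(f"not a Hudi base file name: {file_name}")
    return m.group(1)

# File name and pending replacecommit instant taken from the log below.
missing = "85040fcd-3f42-4b37-865f-616fc0ad3df8-0_1-4-0_20240131164655396.parquet"
clustering_instant = "20240131195932968"

file_instant = base_file_instant(missing)
# Instant times are sortable as strings; the file predates the clustering plan,
# so if the plan stayed pending longer than the cleaner's retention window,
# the cleaner may have already deleted this input file.
print(file_instant, file_instant < clustering_instant)
```

If the file's instant is older than what the cleaner retains, the missing file is expected behavior of the cleaner rather than data corruption, and the fix is to reconcile the clean retention with the clustering cadence.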
**To Reproduce**
Steps to reproduce the behavior:
1. Set up a Flink SQL job with Kafka as the data source.
2. Configure the job to write data into a Hudi COW table with online clustering enabled.
3. Execute the job.
**Expected behavior**
The job keeps running and the scheduled clustering completes without errors about missing parquet files.
**Environment Description**
* Hudi version : 0.13.1-rc1
* Spark version :
* Hive version : 3.1.3
* Hadoop version : 2.9.2
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : flink on k8s
**Additional context**
```sql
CREATE TABLE ods_mqtt_msg(
  -- key columns declared so the PRIMARY KEY resolves; types assumed STRING,
  -- remaining columns omitted
  `field1` STRING,
  `field2` STRING,
  `field3` STRING,
  dt STRING,
  PRIMARY KEY (`field1`, `field2`, `field3`) NOT ENFORCED
)
PARTITIONED BY (`dt`)
WITH (
'connector' = 'hudi',
'table.type' = 'COPY_ON_WRITE',
'path' = 's3a:///lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg',
'write.operation' = 'INSERT',
'clustering.async.enabled' = 'true',
'clustering.schedule.enabled' = 'true',
'hive_sync.enable' = 'true',
'hive_sync.mode' = 'hms',
'hive_sync.metastore.uris' = 'thrift://hive-metastore-svc.hms.svc:9083',
'read.streaming.enabled' = 'true',
'write.tasks' = '4'
);
```
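One possible cause (a hypothesis, not confirmed by the log alone): the async cleaner reclaimed old file slices while the replacecommit for clustering was still pending, so the clustering executor could no longer read its input. If that is the case, widening the cleaner's retention relative to the clustering cadence may avoid the race. A sketch of the relevant Flink options (the values are illustrative, not recommendations):

```sql
-- Illustrative only: retain more commits so files referenced by a pending
-- clustering plan are not cleaned before the plan executes.
'clean.retain_commits' = '60',      -- tune relative to your commit rate
'clustering.delta_commits' = '4'    -- schedule clustering often enough that
                                    -- plans do not outlive the retention window
```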
**Stacktrace**
```
2024-02-01 03:56:01,519 INFO  org.apache.hudi.client.HoodieFlinkWriteClient [] - Cleaner has been spawned already. Waiting for it to finish
2024-02-01 03:56:01,519 INFO  org.apache.hudi.async.AsyncCleanerService [] - Waiting for async clean service to finish
2024-02-01 03:56:01,627 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Loaded instants upto : Option{val=[==>20240201035304158__commit__INFLIGHT]}
2024-02-01 03:56:02,333 INFO  org.apache.hudi.common.util.ClusteringUtils [] - Found 658 files in pending clustering operations
2024-02-01 03:56:02,333 INFO  org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView [] - Sending request : (http://169.122.153.67:46386/v1/hoodie/view/compactions/pending/?basepath=s3a%3A%2Flakehouse%2Fhudi%2Fdevice_mqtt_msg%2Fods_mqtt_msg=20240201035302997=350fb15b2282717446dd396f06ebaf80257ed284589ba906e5c3ccf6701cc223)
2024-02-01 03:56:02,427 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Checking for file exists ?s3a:/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/.hoodie/20240131195932968.replacecommit.requested
2024-02-01 03:56:02,564 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Create new file for toInstant ?s3a:/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/.hoodie/20240131195932968.replacecommit.inflight
2024-02-01 03:56:02,677 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Loaded instants upto : Option{val=[20240201035304158__commit__COMPLETED]}
2024-02-01 03:56:02,677 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - Execute clustering plan for instant 20240131195932968 as 17 file slices
2024-02-01 03:56:02,937 ERROR org.apache.hudi.sink.clustering.ClusteringOperator [] - Executor executes action [Execute clustering for instant 20240131195932968 from task 2] error
org.apache.hudi.exception.HoodieClusteringException: Error reading input data for s3a://xxx-bucket/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/2024-01-31/85040fcd-3f42-4b37-865f-616fc0ad3df8-0_1-4-0_20240131164655396.parquet and []
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$null$4(ClusteringOperator.java:332) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
	at java.lang.Iterable.spliterator(Unknown Source) ~[?:?]
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$readRecordsForGroupBaseFiles$5(ClusteringOperator.java:336) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
	at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source) ~[?:?]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(Unknown Source) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown
```