koldic opened a new issue, #7209:
URL: https://github.com/apache/hudi/issues/7209

   **Describe the problem you faced**
   
   Hudi DeltaStreamer fails with this exception: `Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata`
   
   **To Reproduce**
   
   Unknown
   
   **Expected behavior**
   
   Hudi finishes cleaning and runs correctly again.
   
   **Environment Description**
   
   * Hudi version : 0.12.1
   
   * Spark version : 2.4.8
   
   * Hadoop version : 3.1.4.0-315
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : NO
   
   **Additional context**
   
   DeltaStreamer ran for a few days and worked correctly. It sometimes failed due to insufficient memory, was rolled back, and then worked fine again. Over the weekend it suddenly failed, and every re-run since then has ended with this exception.
   ```
   Settings:
   hoodie.upsert.shuffle.parallelism=200
   hoodie.clean.automatic=true
   hoodie.clean.async=true
   hoodie.metadata.enable=false
   hoodie.write.markers.type=direct
   
   # Key fields
   hoodie.datasource.write.keygenerator.class=cz.seznam.datacollect.hit.app.sync.keygen.AugmentedCustomKeyGenerator
   hoodie.datasource.write.recordkey.field=random
   hoodie.datasource.write.partitionpath.field=create_tst:timestamp,app:hivestyle,os:hivestyle
   hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
   hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyyMMdd'T'HH:mm:ss.SSS
   hoodie.deltastreamer.keygen.timebased.timezone=UTC
   hoodie.deltastreamer.keygen.timebased.output.dateformat='year='yyyy/'month='MM/'week='ww/'day='dd/'hour='HH
   
   hoodie.deltastreamer.schemaprovider.source.schema.file={{ env ("HITS_APP_SCHEMA_PATH") }}
   hoodie.deltastreamer.kafka.source.maxEvents=1000000
   ```
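
For readers unfamiliar with the time-based key generator settings above, here is a rough Python sketch of the partition path they would appear to produce. It is only an illustration under several assumptions: `partition_path` is a hypothetical helper, the behaviour of the custom `AugmentedCustomKeyGenerator` (including how it renders `hivestyle` fields as `app=...`/`os=...`) is guessed rather than known, and Java's `ww` week number is locale-dependent, so the ISO week used here may not match what Hudi writes.

```python
from datetime import datetime, timezone


def partition_path(create_tst: str, app: str, os_name: str) -> str:
    """Approximate the partition path implied by the settings (hypothetical helper)."""
    # Input pattern yyyyMMdd'T'HH:mm:ss.SSS, interpreted in UTC as configured.
    ts = datetime.strptime(create_tst, "%Y%m%dT%H:%M:%S.%f").replace(tzinfo=timezone.utc)
    # ASSUMPTION: ISO week number stands in for Java's locale-dependent 'ww'.
    iso_week = ts.isocalendar()[1]
    time_part = (
        f"year={ts:%Y}/month={ts:%m}/week={iso_week:02d}/day={ts:%d}/hour={ts:%H}"
    )
    # ASSUMPTION: 'hivestyle' fields are rendered as name=value path segments.
    return f"{time_part}/app={app}/os={os_name}"


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(partition_path("20221115T12:32:28.267", "mail", "android"))
```

The partitioning itself does not seem to be involved in the failure; the sketch is only meant to make the table layout easier to picture.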
   
   **Stacktrace**
   
   ```
   2022-11-15 12:32:28,267 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 32.0 (TID 14326, cider81.ng.seznam.cz, executor 44, partition 0, PROCESS_LOCAL, 7791 bytes)
   2022-11-15 12:32:28,274 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on cider81.ng.seznam.cz:8496 (size: 36.5 KB, free: 5.2 GB)
   2022-11-15 12:32:28,300 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 32.0 (TID 14326) in 34 ms on cider81.ng.seznam.cz (executor 44) (1/1)
   2022-11-15 12:32:28,300 INFO cluster.YarnClusterScheduler: Removed TaskSet 32.0, whose tasks have all completed, from pool 
   2022-11-15 12:32:28,300 INFO scheduler.DAGScheduler: ResultStage 32 (collectAsMap at HoodieSparkEngineContext.java:151) finished in 0.052 s
   2022-11-15 12:32:28,300 INFO scheduler.DAGScheduler: Job 12 finished: collectAsMap at HoodieSparkEngineContext.java:151, took 0.053427 s
   2022-11-15 12:32:28,321 INFO fs.FSUtils: Removed directory at /hits/app/hudi_cileni/.hoodie/.temp/20221115122551524
   2022-11-15 12:32:28,321 INFO client.BaseHoodieWriteClient: Async cleaner has been spawned. Waiting for it to finish
   2022-11-15 12:32:28,321 INFO async.AsyncCleanerService: Waiting for async clean service to finish
   2022-11-15 12:32:28,326 ERROR async.HoodieAsyncService: Service shutdown with error
   java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
        at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
        at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:75)
        at org.apache.hudi.client.BaseHoodieWriteClient.autoCleanOnCommit(BaseHoodieWriteClient.java:609)
        at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:533)
        at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:237)
        at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:626)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:336)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:704)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.IllegalArgumentException: Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata
        at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
        at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:205)
        at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:170)
        at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.getCommitsSinceLastCleaning(CleanPlanActionExecutor.java:72)
        at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.needsCleaning(CleanPlanActionExecutor.java:87)
        at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:169)
        at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:204)
        at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1354)
        at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:865)
        at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:827)
        at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55)
        ... 4 more
   2022-11-15 12:32:28,333 ERROR deltastreamer.HoodieDeltaStreamer: Shutting down delta-sync due to exception
   org.apache.hudi.exception.HoodieException: Error waiting for async clean service to finish
        at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:77)
        at org.apache.hudi.client.BaseHoodieWriteClient.autoCleanOnCommit(BaseHoodieWriteClient.java:609)
        at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:533)
        at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:237)
        at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:626)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:336)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:704)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
        at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
        at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:75)
        ... 11 more
   Caused by: java.lang.IllegalArgumentException: Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata
        at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
        at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:205)
        at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:170)
        at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.getCommitsSinceLastCleaning(CleanPlanActionExecutor.java:72)
        at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.needsCleaning(CleanPlanActionExecutor.java:87)
        at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:169)
        at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:204)
        at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1354)
        at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:865)
        at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:827)
        at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55)
        ... 4 more
   2022-11-15 12:32:28,333 INFO deltastreamer.HoodieDeltaStreamer: Delta Sync shutdown. Error ?true
   2022-11-15 12:32:28,333 WARN deltastreamer.HoodieDeltaStreamer: Gracefully shutting down compactor
   2022-11-15 12:32:30,964 INFO async.AsyncCompactService: Compactor shutting down properly!!
   2022-11-15 12:32:30,966 INFO deltastreamer.HoodieDeltaStreamer: DeltaSync shutdown. Closing write client. Error?true
   ```
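
The trace fails in `getCommitsSinceLastCleaning` while deserializing clean metadata from the timeline, so one thing that may be worth checking is whether any completed clean instant file is empty or truncated. The Python sketch below is only a diagnostic aid under assumptions: it expects a local copy of the table's `.hoodie` directory (for example fetched with `hdfs dfs -get`), it relies on the usual `.clean`, `.clean.requested`, and `.clean.inflight` file suffixes, and the helper names are hypothetical. Whether an empty file is actually the cause here is not confirmed.

```python
from pathlib import Path

# Timeline file suffixes used for clean actions.
CLEAN_SUFFIXES = (".clean", ".clean.requested", ".clean.inflight")


def list_clean_instants(timeline_dir):
    """Return (file name, size in bytes) for clean-related files, sorted by name.

    Instant file names start with a timestamp, so sorting by name
    roughly orders them from oldest to newest.
    """
    entries = []
    for path in Path(timeline_dir).iterdir():
        if path.is_file() and path.name.endswith(CLEAN_SUFFIXES):
            entries.append((path.name, path.stat().st_size))
    return sorted(entries)


def empty_completed_cleans(entries):
    """Pick out completed clean instants (*.clean) whose file has zero length."""
    return [name for name, size in entries if name.endswith(".clean") and size == 0]


if __name__ == "__main__":
    import sys

    # Usage (path is an example): python check_cleans.py ./hoodie_copy
    found = list_clean_instants(sys.argv[1])
    for name, size in found:
        print(f"{size:>10}  {name}")
    print("empty completed cleans:", empty_completed_cleans(found))
```

If the newest completed clean file turns out to be zero bytes, that would at least be consistent with the error message, but back up the timeline and confirm with the maintainers before changing anything in it.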


