[ https://issues.apache.org/jira/browse/HUDI-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-4072:
--------------------------------------
    Fix Version/s: 0.12.0

> Clustering fails when there is an empty SCHEMA entry in commit metadata with deltastreamer
> ------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4072
>                 URL: https://issues.apache.org/jira/browse/HUDI-4072
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.12.0
>
>
> When deltastreamer produces an empty commit (there are no records to write, 
> but the commit must still happen because the checkpoint has changed), we add 
> NULL_SCHEMA or an empty string as the schema in the commit's extra metadata. 
> This causes failures in subsequent commits when the write client is 
> instantiated from this commit.
>  
> stacktrace1:
> {code:java}
> Caused by: org.apache.avro.AvroRuntimeException: Not a record: "null"
>       at org.apache.avro.Schema.getFields(Schema.java:279)
>       at org.apache.hudi.avro.HoodieAvroUtils.addMetadataFields(HoodieAvroUtils.java:208)
>       at org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:115)
>       at org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:104)
>       at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:124)
>       at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:117)
>       at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:376)
>       at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:347)
>       at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:80)
>       at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:321)
> {code}
>  
> stacktrace2: 
> {code:java}
> Exception in thread "main" org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Async clustering failed.  Shutting down Delta Sync...
>       at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:184)
>       at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
>       at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:179)
>       at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:530)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Async clustering failed.  Shutting down Delta Sync...
>       at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>       at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>       at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
>       at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:182)
>       ... 15 more
> Caused by: org.apache.hudi.exception.HoodieException: Async clustering failed.  Shutting down Delta Sync...
>       at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:690)
>       at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hudi.exception.HoodieException: Async clustering failed.  Shutting down Delta Sync...
>       at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:675)
>       ... 4 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
