[ https://issues.apache.org/jira/browse/HUDI-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan reassigned HUDI-4072:
-----------------------------------------

    Assignee: sivabalan narayanan

> Clustering fails when there is an empty SCHEMA entry in commit metadata with deltastreamer
> ------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4072
>                 URL: https://issues.apache.org/jira/browse/HUDI-4072
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>
> When DeltaStreamer produces an empty commit (there are no records to write, but a commit must still happen because the checkpoint has changed), we add NULL_SCHEMA or an empty string as the schema in the commit's extra metadata. This causes failures in subsequent commits when the write client is instantiated from this commit.
>
> stacktrace1:
> {code:java}
> Caused by: org.apache.avro.AvroRuntimeException: Not a record: "null"
> 	at org.apache.avro.Schema.getFields(Schema.java:279)
> 	at org.apache.hudi.avro.HoodieAvroUtils.addMetadataFields(HoodieAvroUtils.java:208)
> 	at org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:115)
> 	at org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:104)
> 	at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:124)
> 	at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:117)
> 	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:376)
> 	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:347)
> 	at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:80)
> 	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:321)
> {code}
>
> stacktrace2:
> {code:java}
> Exception in thread "main" org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Async clustering failed. Shutting down Delta Sync...
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:184)
> 	at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:179)
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:530)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
> 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
> 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
> 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Async clustering failed. Shutting down Delta Sync...
> 	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 	at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:182)
> 	... 15 more
> Caused by: org.apache.hudi.exception.HoodieException: Async clustering failed. Shutting down Delta Sync...
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:690)
> 	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hudi.exception.HoodieException: Async clustering failed. Shutting down Delta Sync...
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:675)
> 	... 4 more
> {code}
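The first stack trace fails in `Schema.getFields` because the schema read back from the empty commit is Avro's null type, not a record. A minimal sketch of one possible guard, assuming the write client resolves its schema by scanning commit extra metadata from newest to oldest: skip entries that are missing, blank, or a serialized null schema, and fall back to the next older commit. This is plain Java with no Hudi or Avro dependency. The class `SchemaResolver`, its methods, the `"schema"` metadata key, and the assumption that `NULL_SCHEMA` is stored as the JSON string `"null"` are all hypothetical illustrations, not Hudi's actual API or fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper (not part of Hudi): picks the newest usable writer
 * schema from commit extra metadata, ignoring entries left by empty commits.
 */
public class SchemaResolver {

    // Assumed metadata key under which each commit records its writer schema.
    static final String SCHEMA_KEY = "schema";

    /** True if the entry is missing, blank, or a serialized Avro null schema. */
    static boolean isEmptySchema(String schemaStr) {
        if (schemaStr == null) {
            return true;
        }
        String trimmed = schemaStr.trim();
        // Assumption: a null-type Avro schema is serialized as the JSON string "null".
        return trimmed.isEmpty() || trimmed.equals("\"null\"") || trimmed.equals("null");
    }

    /** Scans extra metadata of completed commits (newest first) for a usable schema. */
    static Optional<String> latestValidSchema(List<Map<String, String>> commitsNewestFirst) {
        for (Map<String, String> extraMetadata : commitsNewestFirst) {
            String candidate = extraMetadata.get(SCHEMA_KEY);
            if (!isEmptySchema(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Checkpoint-only commit that recorded a null schema.
        Map<String, String> emptyCommit = new HashMap<>();
        emptyCommit.put(SCHEMA_KEY, "\"null\"");
        // Checkpoint-only commit that recorded an empty string.
        Map<String, String> blankCommit = new HashMap<>();
        blankCommit.put(SCHEMA_KEY, "");
        // Older commit that actually wrote records.
        Map<String, String> dataCommit = new HashMap<>();
        dataCommit.put(SCHEMA_KEY, "{\"type\":\"record\",\"name\":\"trip\",\"fields\":[]}");

        List<Map<String, String>> timeline = new ArrayList<>();
        timeline.add(emptyCommit);
        timeline.add(blankCommit);
        timeline.add(dataCommit);

        // prints {"type":"record","name":"trip","fields":[]}
        System.out.println(latestValidSchema(timeline).orElse("<none>"));
    }
}
```

Falling back to an older commit is only one design choice; another would be to omit the schema entry entirely for empty commits so that readers never encounter it.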