[ 
https://issues.apache.org/jira/browse/HUDI-6627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinish Reddy updated HUDI-6627:
-------------------------------
    Description: 
When the source returns an empty Option in DeltaStreamer, the writer schema is 
null. Table schema validation in the Spark write client then throws an NPE, 
producing the exception below. We should skip this validation when the writer 
schema is null. 


{code:java}
org.apache.hudi.exception.HoodieInsertException: Failed insert schema compability check.
        at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:851)
        at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:185)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:690)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:396)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.ingestOnce(HoodieDeltaStreamer.java:876)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at com.onehouse.hudi.OnehouseDeltaStreamer$MultiTableSyncService.lambda$null$1(OnehouseDeltaStreamer.java:319)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Failed to read schema/check compatibility for base path s3a://onehouse-customer-bucket-2451e78f/data-lake/chandra_data_lake_default/xml_flatten_struct_test
        at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:830)
        at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:849)
        ... 10 more
Caused by: java.lang.NullPointerException
        at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1158)
        at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
        at org.apache.hudi.avro.HoodieAvroUtils.createHoodieWriteSchema(HoodieAvroUtils.java:302)
        at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:826)
        ... 11 more
{code}
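
The proposed fix (skip validation when the writer schema is null) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the actual Hudi change: the class SchemaValidationSketch, the methods parseSchema and validateSchema, and the string-equality "compatibility" check are all hypothetical stand-ins. The only behavior taken from the report is that parsing a null schema string throws an NPE, and that a null writer schema should bypass validation.

```java
import java.util.Objects;

public class SchemaValidationSketch {

    /**
     * Hypothetical stand-in for schema parsing. Like the parser in the
     * stack trace, it throws a NullPointerException on null input.
     */
    static String parseSchema(String schemaJson) {
        return Objects.requireNonNull(schemaJson, "schema string is null").trim();
    }

    /**
     * Hypothetical validation entry point.
     * Returns true if validation ran, false if it was skipped.
     */
    static boolean validateSchema(String writerSchemaJson, String tableSchemaJson) {
        // Guard: the source produced no data, so there is no writer schema to check.
        if (writerSchemaJson == null) {
            return false;
        }
        String writer = parseSchema(writerSchemaJson);
        String table = parseSchema(tableSchemaJson);
        // Placeholder check (assumption): real compatibility rules are more involved.
        if (!writer.equals(table)) {
            throw new IllegalStateException("Failed schema compatibility check");
        }
        return true;
    }

    public static void main(String[] args) {
        String tableSchema = "{\"type\":\"string\"}";
        // Null writer schema: validation is skipped instead of throwing an NPE.
        System.out.println(validateSchema(null, tableSchema));
        // Non-null writer schema: validation runs as before.
        System.out.println(validateSchema(tableSchema, tableSchema));
    }
}
```

Returning a boolean here is only a convenience for the example, so a caller can tell whether the check ran; a real fix might simply return early.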

 

  was:
When the source returns an empty Option in DeltaStreamer, the writer schema is 
null. Table schema validation in the Spark write client then throws an NPE, 
producing the exception below. We should skip this validation when the writer 
schema is null.

{quote}org.apache.hudi.exception.HoodieInsertException: Failed insert schema compability check.
        at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:851)
        at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:185)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:690)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:396)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.ingestOnce(HoodieDeltaStreamer.java:876)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at com.onehouse.hudi.OnehouseDeltaStreamer$MultiTableSyncService.lambda$null$1(OnehouseDeltaStreamer.java:319)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Failed to read schema/check compatibility for base path s3a://onehouse-customer-bucket-2451e78f/data-lake/chandra_data_lake_default/xml_flatten_struct_test
        at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:830)
        at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:849)
        ... 10 more
Caused by: java.lang.NullPointerException
        at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1158)
        at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
        at org.apache.hudi.avro.HoodieAvroUtils.createHoodieWriteSchema(HoodieAvroUtils.java:302)
        at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:826)
        ... 11 more{quote}

 


> Spark write client fails when write schema is null
> --------------------------------------------------
>
>                 Key: HUDI-6627
>                 URL: https://issues.apache.org/jira/browse/HUDI-6627
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Vinish Reddy
>            Priority: Minor
>
> When the source returns an empty Option in DeltaStreamer, the writer schema is 
> null. Table schema validation in the Spark write client then throws an NPE, 
> producing the exception below. We should skip this validation when the 
> writer schema is null. 
> {code:java}
> org.apache.hudi.exception.HoodieInsertException: Failed insert schema compability check.
>       at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:851)
>       at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:185)
>       at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:690)
>       at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:396)
>       at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.ingestOnce(HoodieDeltaStreamer.java:876)
>       at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
>       at com.onehouse.hudi.OnehouseDeltaStreamer$MultiTableSyncService.lambda$null$1(OnehouseDeltaStreamer.java:319)
>       at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hudi.exception.HoodieException: Failed to read schema/check compatibility for base path s3a://onehouse-customer-bucket-2451e78f/data-lake/chandra_data_lake_default/xml_flatten_struct_test
>       at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:830)
>       at org.apache.hudi.table.HoodieTable.validateInsertSchema(HoodieTable.java:849)
>       ... 10 more
> Caused by: java.lang.NullPointerException
>       at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1158)
>       at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
>       at org.apache.hudi.avro.HoodieAvroUtils.createHoodieWriteSchema(HoodieAvroUtils.java:302)
>       at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:826)
>       ... 11 more
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
