[ 
https://issues.apache.org/jira/browse/HUDI-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang updated HUDI-5986:
----------------------------
    Fix Version/s: 0.13.1
                       (was: 0.14.0)

> empty preCombineKey should never be stored in hoodie.properties
> ---------------------------------------------------------------
>
>                 Key: HUDI-5986
>                 URL: https://issues.apache.org/jira/browse/HUDI-5986
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: hudi-utilities
>            Reporter: Wechar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.1
>
>
> *Overview:*
> We found that {{hoodie.properties}} keeps an empty preCombineKey when the table 
> does not define one, and this empty preCombineKey causes an exception on a 
> subsequent insert:
> {code:bash}
> Caused by: org.apache.hudi.exception.HoodieException: (Part -) field not found in record. Acceptable fields were :[id, name, price]
>       at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:557)
>       at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1134)
>       at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1127)
>       at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>       at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>       at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
>       at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>       at org.apache.spark.scheduler.Task.run(Task.scala:123)
>       at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> *Steps to Reproduce:*
> {code:sql}
> -- 1. create a table without preCombineKey
> CREATE TABLE default.test_hudi_default_cm (
>   uuid int,
>   name string,
>   price double
> ) USING hudi
> options (
>  primaryKey='uuid');
> -- 2. config write operation to insert
> set hoodie.datasource.write.operation=insert;
> set hoodie.merge.allow.duplicate.on.inserts=true;
> -- 3. insert data
> insert into default.test_hudi_default_cm select 1, 'name1', 1.1;
> -- 4. insert overwrite
> insert overwrite table default.test_hudi_default_cm select 2, 'name3', 1.1;
> -- 5. insert data will occur exception
> insert into default.test_hudi_default_cm select 1, 'name3', 1.1;
> {code}
> *Root Cause:*
> On a SQL *insert overwrite table*, Hudi re-creates the table config even though 
> the configured write operation is insert, and the re-created config stores the 
> default empty preCombineKey in {{hoodie.properties}}.
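> The fix direction can be sketched as a guard that never persists an empty value. 
> Below is a minimal, hypothetical illustration using plain {{java.util.Properties}}; 
> the property name and helper method are assumptions for illustration, not Hudi's 
> actual API:

```java
import java.util.Properties;

public class PreCombineGuard {
    // Hypothetical property name mirroring the report; the actual key
    // stored in hoodie.properties may differ.
    static final String PRECOMBINE_KEY = "hoodie.table.precombine.field";

    // Only persist the preCombine field when it is non-null and non-blank,
    // so a table without a preCombineKey never gets an empty entry stored.
    static void setPreCombineField(Properties props, String preCombineField) {
        if (preCombineField != null && !preCombineField.trim().isEmpty()) {
            props.setProperty(PRECOMBINE_KEY, preCombineField);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        setPreCombineField(props, "");    // empty: skipped
        setPreCombineField(props, null);  // null: skipped
        System.out.println(props.containsKey(PRECOMBINE_KEY)); // false
        setPreCombineField(props, "ts");  // real field: stored
        System.out.println(props.getProperty(PRECOMBINE_KEY)); // ts
    }
}
```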



--
This message was sent by Atlassian Jira
(v8.20.10#820010)