[ https://issues.apache.org/jira/browse/HUDI-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yue Zhang updated HUDI-5986:
----------------------------
    Fix Version/s: 0.13.1
                       (was: 0.14.0)

> empty preCombineKey should never be stored in hoodie.properties
> ---------------------------------------------------------------
>
>                 Key: HUDI-5986
>                 URL: https://issues.apache.org/jira/browse/HUDI-5986
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: hudi-utilities
>            Reporter: Wechar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.1
>
>
> *Overview:*
> We found that {{hoodie.properties}} keeps an empty preCombineKey if the table
> does not have a preCombineKey. The empty preCombineKey then causes an
> exception when inserting data:
> {code:bash}
> Caused by: org.apache.hudi.exception.HoodieException: (Part -) field not found in record. Acceptable fields were :[id, name, price]
>     at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:557)
>     at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1134)
>     at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1127)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>     at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
>     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> *Steps to Reproduce:*
> {code:sql}
> -- 1. create a table without preCombineKey
> CREATE TABLE default.test_hudi_default_cm (
>   uuid int,
>   name string,
>   price double
> ) USING hudi
> options (
>   primaryKey='uuid');
> -- 2. configure the write operation as insert
> set hoodie.datasource.write.operation=insert;
> set hoodie.merge.allow.duplicate.on.inserts=true;
> -- 3. insert data
> insert into default.test_hudi_default_cm select 1, 'name1', 1.1;
> -- 4. insert overwrite
> insert overwrite table default.test_hudi_default_cm select 2, 'name3', 1.1;
> -- 5. inserting data now throws the exception
> insert into default.test_hudi_default_cm select 1, 'name3', 1.1;
> {code}
> *Root Cause:*
> Hudi re-constructs the table on *insert overwrite table* in SQL, but the
> configured operation is not insert overwrite, so it stores the default empty
> preCombineKey in {{hoodie.properties}}.
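
To check whether a table is affected, one option is to inspect its {{hoodie.properties}} for a preCombine entry with an empty value. The sketch below is only an illustration and is not part of the fix: it assumes the entry is stored under the key hoodie.table.precombine.field, and the helper names (parse_properties, has_empty_precombine) are hypothetical. It works on the file's text, so reading the file from the table's .hoodie directory is left to the caller.

```python
# Assumed key name for the table-level preCombine field in hoodie.properties.
PRECOMBINE_KEY = "hoodie.table.precombine.field"


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#'/'!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only; a line without '=' is a key with no value.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def has_empty_precombine(text, key=PRECOMBINE_KEY):
    """Return True if the key is present but its value is empty."""
    props = parse_properties(text)
    return key in props and props[key] == ""


if __name__ == "__main__":
    sample = (
        "#Properties saved by a writer\n"
        "hoodie.table.name=test_hudi_default_cm\n"
        "hoodie.table.precombine.field=\n"
    )
    print(has_empty_precombine(sample))
```

A table whose properties file omits the key altogether is not flagged, since only the present-but-empty case matches the behaviour described in this issue.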