Sarfaraz-214 opened a new issue, #10233: URL: https://github.com/apache/hudi/issues/10233
I am using HoodieStreamer with **Hudi 0.14** and trying to leverage [autogenerated keys](https://hudi.apache.org/releases/release-0.14.0/#support-for-hudi-tables-with-autogenerated-keys). Hence I am not passing **hoodie.datasource.write.recordkey.field** or **hoodie.datasource.write.precombine.field**. Additionally, I am passing **hoodie.spark.sql.insert.into.operation = insert** (instead of `--op insert`), which the release notes say works without a pre-combine key for the bulk_insert and insert modes.

With the above, the **.hoodie** directory gets created, but the data write to GCS fails with this error:

```
org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were :[c1, c2, c3, c4, c5]
	at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:601)
```

I also see in the **hoodie.properties** file that the pre-combine key is getting set to **ts** (`hoodie.table.precombine.field=ts`). This seems to come from the default value of **--source-ordering-field**. How can we skip the pre-combine field in this case? This happens for both CoW and MoR tables. The same workload runs fine via Spark SQL; I only face the issue when using HoodieStreamer.
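For reference, the only workaround I could think of (untested sketch; `c1` is just one of the fields listed in the error message above, not a real ordering column) would be to point the ordering field at a column that actually exists:

```shell
# Untested sketch: explicitly set --source-ordering-field to an existing
# column (c1 is taken from the error message above) so the default "ts"
# is never looked up. Whether this truly skips pre-combine semantics,
# rather than just silencing the error, is exactly my question.
spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  ... \
  gs://<fullPath>/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
  --source-ordering-field c1 \
  --props gs://<fullPath>/configfolder/es_user_profile_config.properties
```

But ideally I would like a way to not set any ordering/pre-combine field at all, matching the autogenerated-keys behavior in Spark SQL.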
Sharing the configurations used:

**hudi-table.properties**
```
hoodie.datasource.write.partitionpath.field=job_id
hoodie.spark.sql.insert.into.operation=insert
bootstrap.servers=***
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='***' password='***';
auto.offset.reset=earliest
hoodie.deltastreamer.source.kafka.topic=<topicName>
hoodie.deltastreamer.schemaprovider.source.schema.file=gs://<fullPath>/<schemaName>.avsc
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
```

**spark-submit command**
```
spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.14.0, \
  --properties-file /home/sarfaraz_h/spark-config.properties \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 12G \
  --driver-cores 3 \
  --executor-memory 12G \
  --executor-cores 3 \
  --num-executors 3 \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.sql.shuffle.partitions=18 \
  gs://<fullPath>/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
  --continuous \
  --source-limit 1000000 \
  --min-sync-interval-seconds 600 \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --target-base-path gs://<fullPath>/<tableName> \
  --target-table <tableName> \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --props gs://<fullPath>/configfolder/es_user_profile_config.properties
```

Spark version used is 3.3.2.
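For completeness, this is how I inspected the table config on GCS (same path placeholders as above) and found the pre-combine field set even though I never configured it:

```shell
# Read the table's hoodie.properties directly from the GCS base path;
# this is where hoodie.table.precombine.field=ts shows up despite
# no pre-combine field being passed to HoodieStreamer.
gsutil cat 'gs://<fullPath>/<tableName>/.hoodie/hoodie.properties' | grep precombine
```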