prashant462 opened a new issue, #10626: URL: https://github.com/apache/hudi/issues/10626
### Issue Summary

When using dbt-Spark with Hudi to create a Hudi-format table, the Hudi table configuration is inconsistent between the initial insert and subsequent merge operations. The properties provided in the `options` of the dbt model are correctly fetched and applied during the first run. However, during the second run, when executing the merge operation, Hudi fetches only a subset of the properties from the Hudi catalog table, which leads to default properties being added and the configuration changing.

### Steps to Reproduce

- Execute the dbt model with Hudi options for the initial insert. Sample model:

```jinja
{{ config(
    materialized = 'incremental',
    file_format = 'hudi',
    pre_hook = "SET spark.sql.legacy.allowNonEmptyLocationInCTAS = true",
    location_root = "file:///Users/B0279627/Downloads/Hudi",
    unique_key = "id",
    incremental_strategy = "merge",
    options = {
        'preCombineField': 'id2',
        'hoodie.index.type': 'GLOBAL_SIMPLE',
        'hoodie.simple.index.update.partition.path': 'true',
        'hoodie.keep.min.commits': '145',
        'hoodie.keep.max.commits': '288',
        'hoodie.cleaner.policy': 'KEEP_LATEST_BY_HOURS',
        'hoodie.cleaner.hours.retained': '72',
        'hoodie.cleaner.fileversions.retained': '144',
        'hoodie.cleaner.commits.retained': '144',
        'hoodie.upsert.shuffle.parallelism': '200',
        'hoodie.insert.shuffle.parallelism': '200',
        'hoodie.bulkinsert.shuffle.parallelism': '200',
        'hoodie.delete.shuffle.parallelism': '200',
        'hoodie.parquet.compression.codec': 'zstd',
        'hoodie.datasource.hive_sync.support_timestamp': 'true',
        'hoodie.datasource.write.reconcile.schema': 'true',
        'hoodie.enable.data.skipping': 'true',
        'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
    }
) }}
```

- Observe that all specified properties are correctly applied during the first run. To verify, check a sample property such as `hoodie.index.type=GLOBAL_SIMPLE`.
- Execute the dbt model with Hudi options again to trigger a subsequent merge operation.
- Observe that Hudi table properties change, with defaults applied for certain configurations; for example, `hoodie.index.type` changes to `SIMPLE` (the target table appears to be created with `hoodie.index.type=SIMPLE`).

### Expected Behavior

Hudi should consistently apply all specified properties on every run, irrespective of whether it is the initial insert or a subsequent merge operation. The properties passed in the `options` of the dbt model should be retained and applied consistently across all operations.

### Environment Description

* Hudi version : 0.12.1
* Spark version : 3.3.1
* Hive version : 3.1.3
* Hadoop version : 3.1.1
* DBT version : 1.7.1
* Storage (HDFS/S3/GCS..) : Checked with S3, HDFS, and the local file system.
* Running on Docker? (yes/no) : no

### **Additional context**

On the second run, `MergeIntoHoodieTableCommand.scala` executes `InsertIntoHoodieTableCommand.run()`. In this case Hudi fetches the properties from the Hudi catalog table, where it reads the table configs and catalog properties. These are not the complete set of properties that I passed in the first run via dbt options. Because of this, Hudi adds default values for the properties that are missing from the catalog props, and it seems this is why many properties change. Below I have attached some screenshots of the properties fetched during subsequent merge operations:

<img width="1440" alt="MicrosoftTeams-image (21)" src="https://github.com/apache/hudi/assets/31952894/46126281-b95a-47a4-9116-66a093a97506"> <img width="1120" alt="Screenshot 2024-02-05 at 10 00 20 PM" src="https://github.com/apache/hudi/assets/31952894/80ba4206-77d0-4852-aaf1-fd0e19c91025">
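The mechanism described above can be sketched in miniature. This is not Hudi's actual code; the `DEFAULTS` dict and `persisted_keys` set below are hypothetical stand-ins for Hudi's built-in defaults and for whatever subset of properties the catalog entry actually retains. The point is only the merge order: any user property that is not read back from the catalog gets silently replaced by a default.

```python
# Hypothetical sketch of the property-resolution behavior on the merge path.
# DEFAULTS and persisted_keys are illustrative, not Hudi's real values.
DEFAULTS = {
    "hoodie.index.type": "SIMPLE",
    "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
}

# Options passed via the dbt model on the first run.
first_run_options = {
    "hoodie.index.type": "GLOBAL_SIMPLE",
    "hoodie.cleaner.policy": "KEEP_LATEST_BY_HOURS",
    "hoodie.cleaner.hours.retained": "72",
}

# Suppose only this key survives in the catalog entry (hypothetical subset).
persisted_keys = {"hoodie.cleaner.hours.retained"}
catalog_props = {k: v for k, v in first_run_options.items() if k in persisted_keys}

# Defaults fill every gap, overriding the user's first-run choices.
effective = {**DEFAULTS, **catalog_props}
print(effective["hoodie.index.type"])  # SIMPLE, not GLOBAL_SIMPLE
```

Under these assumptions, `hoodie.index.type` comes out as `SIMPLE` on the merge run even though the first run wrote it as `GLOBAL_SIMPLE`, matching the observed behavior.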
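To confirm which index type a run actually recorded, one way is to inspect the table's `.hoodie/hoodie.properties` file, which is written in Java properties format. Below is a minimal parser sketch over an inline sample (the sample contents are illustrative, not taken from the affected table):

```python
# Minimal sketch: parse Java-properties-style text such as the contents of
# a Hudi table's .hoodie/hoodie.properties file.
def parse_properties(text):
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# Illustrative sample of what the file might contain after the first run.
sample = """
# hoodie.properties written after the first run
hoodie.table.name=my_model
hoodie.index.type=GLOBAL_SIMPLE
"""
print(parse_properties(sample)["hoodie.index.type"])  # GLOBAL_SIMPLE
```

Comparing this value after the first run and after the merge run makes the configuration drift visible without going through Spark.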