KnightChess commented on code in PR #6824: URL: https://github.com/apache/hudi/pull/6824#discussion_r999401951
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##########
@@ -160,7 +167,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
       // column order changed after left anti join , we should keep column order of source dataframe
       val cols = removeMetaFields(sourceDF).columns
-      executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), parameters)
+      executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), writeParam)

Review Comment:
   Yes, setting `hoodie.combine.before.insert` will de-duplicate, but this is not user-friendly. When a user creates a table with a precombine field and uses MERGE INTO SQL to upsert data, the write may produce duplicate records depending on how the merge statement is written. To avoid that, the user would have to set `hoodie.combine.before.insert` in the one case where the statement has only a not-matched branch. Users will be confused: for a table with a precombine key, a MERGE INTO statement sometimes behaves like `upsert` and sometimes like `insert`.
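
To make the concern concrete, here is a toy model in plain Scala. It is not Hudi code and does not call any Hudi or Spark API. It assumes `id` plays the role of the record key and `ts` the role of the precombine field. `Rec`, `insertOnly`, and `combineBeforeInsert` are hypothetical names introduced only for this illustration. It sketches the difference between appending source rows as-is and combining them on the precombine field first, which is roughly the effect the comment attributes to `hoodie.combine.before.insert`.

```scala
// Toy model only: illustrates the behavior discussed in the comment, not Hudi internals.
object CombineBeforeInsertSketch {
  // Hypothetical record shape: id = record key, ts = precombine field.
  final case class Rec(id: Int, name: String, ts: Long)

  // Models a plain insert: every incoming row is appended unchanged,
  // so two source rows with the same key both end up in the table.
  def insertOnly(table: Seq[Rec], incoming: Seq[Rec]): Seq[Rec] =
    table ++ incoming

  // Models combining before insert: keep one row per key,
  // choosing the row with the largest precombine value.
  def combineBeforeInsert(incoming: Seq[Rec]): Seq[Rec] =
    incoming.groupBy(_.id).values.map(_.maxBy(_.ts)).toSeq.sortBy(_.id)

  def main(args: Array[String]): Unit = {
    // Source of a merge that has only a not-matched branch; key 1 appears twice.
    val source = Seq(Rec(1, "a1", 10L), Rec(1, "a1_new", 20L), Rec(2, "b", 10L))

    val plain = insertOnly(Seq.empty, source)
    val combined = insertOnly(Seq.empty, combineBeforeInsert(source))

    println(s"plain insert rows: ${plain.size}")
    println(s"combined insert rows: ${combined.size}")
  }
}
```

In this model the plain insert keeps both rows for key 1, while the combined insert keeps only the row with the larger `ts`. The second result is what a user who declared a precombine field would likely expect from MERGE INTO without setting anything extra.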