pengzhiwei2018 commented on a change in pull request #3328: URL: https://github.com/apache/hudi/pull/3328#discussion_r674875373
########## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieOptionConfig.scala ##########

@@ -172,6 +178,15 @@ object HoodieOptionConfig {
     params.get(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key)
   }

+  /**
+   * Whether to enable bulk insert for a SQL insert statement when the table has no primaryKey.
+   */
+  def enableBulkInsert(options: Map[String, String]): Boolean = {

Review comment:
   I saw that ENABLE_ROW_WRITER_OPT_KEY is currently only used for bulk insert, so I reused this config.

########## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ##########

@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
     // Convert to RDD[HoodieRecord]
     val genericRecords: RDD[GenericRecord] = HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-    val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || operation.equals(WriteOperationType.UPSERT);
+    val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+      operation.equals(WriteOperationType.UPSERT) ||
+      parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   Yes, if COMBINE_BEFORE_INSERT_PROP is enabled for an insert, the precombine field value has not yet been computed, which produces incorrect results for inserts containing duplicate records.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
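The `shouldCombine` change above can be sketched as a standalone predicate. This is a simplified illustration, not Hudi's actual code: the config-key strings and the `Operation` stand-in below are assumptions made for the sketch, in place of `INSERT_DROP_DUPS_OPT_KEY`, `HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP`, and `WriteOperationType`.

```scala
// Minimal sketch of the combine decision discussed in the review.
// Records are precombined when drop-duplicates is on, the operation is
// an upsert, or combine-before-insert is enabled for an insert.
object ShouldCombineSketch {
  // Hypothetical stand-ins for Hudi's config keys.
  val InsertDropDupsKey = "hoodie.datasource.write.insert.drop.duplicates"
  val CombineBeforeInsertKey = "hoodie.combine.before.insert"

  sealed trait Operation
  case object Insert extends Operation
  case object Upsert extends Operation

  def shouldCombine(parameters: Map[String, String], operation: Operation): Boolean =
    parameters.getOrElse(InsertDropDupsKey, "false").toBoolean ||
      operation == Upsert ||
      (parameters.getOrElse(CombineBeforeInsertKey, "false").toBoolean &&
        operation == Insert)
}
```

Under this sketch, a plain insert with no options set skips combining, while enabling combine-before-insert forces the precombine field to be computed so duplicate records can be deduplicated correctly.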