pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674875373



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieOptionConfig.scala
##########
@@ -172,6 +178,15 @@ object HoodieOptionConfig {
     params.get(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key)
   }
 
+  /**
+   * Whether to enable bulk insert for the SQL insert statement when there is no primaryKey in the table.
+   */
+  def enableBulkInsert(options: Map[String, String]): Boolean = {

Review comment:
       I saw that currently ENABLE_ROW_WRITER_OPT_KEY is only used for bulk insert, so I reused this config.
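
For illustration, here is a minimal sketch of what reusing that config could look like. This is not the exact PR code: the `.key` accessor and the missing-option default are assumptions based on the ConfigProperty style used elsewhere in this PR.

```scala
import org.apache.hudi.DataSourceWriteOptions

// Illustrative sketch only: assumes ENABLE_ROW_WRITER_OPT_KEY is the
// reused switch described above, and treats a missing option as false.
def enableBulkInsert(options: Map[String, String]): Boolean = {
  options.get(DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY.key)
    .exists(_.toBoolean)
}
```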

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 
           // Convert to RDD[HoodieRecord]
           val genericRecords: RDD[GenericRecord] = HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-          val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || operation.equals(WriteOperationType.UPSERT);
+          val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+            operation.equals(WriteOperationType.UPSERT) ||
+            parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
       Yes, if we enable COMBINE_BEFORE_INSERT_PROP for insert, the precombine field value was not being computed, which produces incorrect results when inserting duplicate records.
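
For context, a sketch of the revised condition being discussed. The quoted diff is truncated after the `getOrElse` key, so the default-value argument and the trailing `.toBoolean` below are assumptions about how the expression completes, not the exact PR code; `parameters` and `operation` come from the surrounding method in HoodieSparkSqlWriter.

```scala
import org.apache.hudi.common.model.WriteOperationType
import org.apache.hudi.config.HoodieWriteConfig

// Sketch only: the default-value argument is an assumption, since the
// quoted diff cuts off mid-expression. The idea is that combining before
// insert should also trigger precombine-field computation.
val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
  operation.equals(WriteOperationType.UPSERT) ||
  parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),
    HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.defaultValue()).toBoolean
```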



