nsivabalan commented on code in PR #8107:
URL: https://github.com/apache/hudi/pull/8107#discussion_r1131832606


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieDatasetBulkInsertHelper.scala:
##########
@@ -82,9 +85,19 @@ object HoodieDatasetBulkInsertHelper
           val keyGenerator =
            ReflectionUtils.loadClass(keyGeneratorClassName, new TypedProperties(config.getProps))
               .asInstanceOf[SparkKeyGeneratorInterface]
+          val partitionId = TaskContext.getPartitionId()
+          var rowId = 0
 
           iter.map { row =>
-            val recordKey = keyGenerator.getRecordKey(row, schema)
+            // auto generate record keys if needed
+            val recordKey = if (autoGenerateRecordKeys) {
+              val recKey = HoodieRecord.generateSequenceId(instantTime, partitionId, rowId)
+              rowId += 1
+              UTF8String.fromString(recKey)
+            }
+            else { // else use key generator to fetch record key
+              keyGenerator.getRecordKey(row, schema)
+            }
             val partitionPath = keyGenerator.getPartitionPath(row, schema)
             val commitTimestamp = UTF8String.EMPTY_UTF8
             val commitSeqNo = UTF8String.EMPTY_UTF8

Review Comment:
   I guess we only focused on the record key and the partition path since they
   have to be generated upfront; the rest of the meta fields can be filled in
   or generated from within the write handle. Let's keep other optimizations
   and unrelated changes out of this patch, since it is already a large one.
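   As a rough illustration of the keying pattern in the diff above, here is a
   minimal, self-contained Scala sketch. The local generateSequenceId helper
   and the "<instantTime>_<partitionId>_<rowId>" layout are assumptions
   standing in for HoodieRecord.generateSequenceId, not necessarily Hudi's
   exact implementation.

   object AutoKeySketch {
     // Hypothetical stand-in for HoodieRecord.generateSequenceId: builds a key
     // unique per row within one commit (instantTime) and Spark partition.
     def generateSequenceId(instantTime: String, partitionId: Int, rowId: Long): String =
       s"${instantTime}_${partitionId}_${rowId}"

     def main(args: Array[String]): Unit = {
       val instantTime = "20230309101112"
       val partitionId = 7 // on an executor this would come from TaskContext.getPartitionId()
       var rowId = 0L

       // Each row drawn from the partition's iterator gets a monotonically
       // increasing rowId; uniqueness across partitions comes from partitionId.
       val rows = Seq("row-a", "row-b", "row-c")
       val keyed = rows.map { row =>
         val recordKey = generateSequenceId(instantTime, partitionId, rowId)
         rowId += 1
         (recordKey, row)
       }
       keyed.foreach(println)
       // (20230309101112_7_0,row-a)
       // (20230309101112_7_1,row-b)
       // (20230309101112_7_2,row-c)
     }
   }

   The mutable rowId counter is safe in the real code because each Spark task
   consumes its partition's iterator sequentially, so no two rows handled by
   the same task can share a counter value.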


