nsivabalan commented on code in PR #8107:
URL: https://github.com/apache/hudi/pull/8107#discussion_r1131832606
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieDatasetBulkInsertHelper.scala:
##########
@@ -82,9 +85,19 @@ object HoodieDatasetBulkInsertHelper
         val keyGenerator = ReflectionUtils.loadClass(keyGeneratorClassName, new TypedProperties(config.getProps))
           .asInstanceOf[SparkKeyGeneratorInterface]

+        val partitionId = TaskContext.getPartitionId()
+        var rowId = 0
         iter.map { row =>
-          val recordKey = keyGenerator.getRecordKey(row, schema)
+          // auto generate record keys if needed
+          val recordKey = if (autoGenerateRecordKeys) {
+            val recKey = HoodieRecord.generateSequenceId(instantTime, partitionId, rowId)
+            rowId += 1
+            UTF8String.fromString(recKey)
+          }
+          else { // else use key generator to fetch record key
+            keyGenerator.getRecordKey(row, schema)
+          }
           val partitionPath = keyGenerator.getPartitionPath(row, schema)
           val commitTimestamp = UTF8String.EMPTY_UTF8
           val commitSeqNo = UTF8String.EMPTY_UTF8

Review Comment:
   I guess we only focused on the record key and partition path, since those have to be generated upfront; the rest can be filled in or generated from within the write handle. Let's keep other optimizations and unrelated changes out of this patch. It's already a large patch.
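   For context on why the key must be generated upfront: the hunk derives a unique key from the commit instant, the Spark task partition id, and a per-partition row counter, with no cross-partition coordination. Below is a minimal standalone sketch of that scheme; the `instantTime + "_" + partitionId + "_" + rowId` format is an assumption mirroring the arguments passed to `HoodieRecord.generateSequenceId` in the hunk, not necessarily Hudi's exact output, and `AutoKeySketch` is a hypothetical name.

   // Hypothetical sketch only: approximates the auto-generated key scheme from the
   // hunk above. The exact format produced by HoodieRecord.generateSequenceId may differ.
   object AutoKeySketch {
     def generateSequenceId(instantTime: String, partitionId: Int, rowId: Long): String =
       s"${instantTime}_${partitionId}_${rowId}"

     def main(args: Array[String]): Unit = {
       val instantTime = "20230309101530000" // fixed for the whole commit
       // Simulate two Spark partitions, each keeping its own row counter,
       // as TaskContext.getPartitionId() plus a local var do in the real code.
       for (partitionId <- 0 until 2; rowId <- 0L until 3L) {
         println(generateSequenceId(instantTime, partitionId, rowId))
         // prints 20230309101530000_0_0 through 20230309101530000_1_2
       }
     }
   }

   Uniqueness holds because instantTime is constant per write, partitionId distinguishes concurrent tasks, and rowId distinguishes rows within a task; this is why the record key (and partition path) must exist before rows reach the write handle, while the remaining meta columns can be filled in later.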