nsivabalan commented on code in PR #12594:
URL: https://github.com/apache/hudi/pull/12594#discussion_r1932881698


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -674,26 +690,31 @@ private static Map<HeaderMetadataType, String> 
getUpdatedHeader(Map<HeaderMetada
       updatedHeader.put(
           HeaderMetadataType.IS_PARTIAL, Boolean.toString(true));
     }
+    if (shouldWriteRecordPositions && baseInstantTimeForPositions.isPresent()) 
{
+      updatedHeader.put(
+          HeaderMetadataType.BASE_FILE_INSTANT_TIME_OF_RECORD_POSITIONS,
+          baseInstantTimeForPositions.get());
+    }
     return updatedHeader;
   }
 
-  private static HoodieLogBlock getBlock(HoodieWriteConfig writeConfig,
-                                         HoodieLogBlock.HoodieLogBlockType 
logDataBlockFormat,
-                                         List<HoodieRecord> records,
-                                         boolean shouldWriteRecordPositions,
-                                         Map<HeaderMetadataType, String> 
header,
-                                         String keyField) {
+  private static HoodieLogBlock getDataBlock(HoodieWriteConfig writeConfig,
+                                             HoodieLogBlock.HoodieLogBlockType 
logDataBlockFormat,
+                                             List<HoodieRecord> records,
+                                             Map<HeaderMetadataType, String> 
header,
+                                             String keyField) {
     switch (logDataBlockFormat) {
       case AVRO_DATA_BLOCK:
-        return new HoodieAvroDataBlock(records, shouldWriteRecordPositions, 
header, keyField);
+        return new HoodieAvroDataBlock(records, header, keyField);
       case HFILE_DATA_BLOCK:
+        // Not supporting positions in HFile data blocks
+        
header.remove(HeaderMetadataType.BASE_FILE_INSTANT_TIME_OF_RECORD_POSITIONS);

Review Comment:
   we can directly set writeRecord positions to false for MDT



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSourceStorage.scala:
##########
@@ -311,8 +310,7 @@ class TestMORDataSourceStorage extends 
SparkClientFunctionalTestHarness {
       // since it happens before the compaction is executed.
       // The deltacommit from the fifth and sixth round should write record 
positions in the log
       // files since it happens after the completed compaction
-      validateRecordPositionsInLogFiles(
-        metaClient, shouldContainRecordPosition = !enableNBCC && i != 4)
+      validateRecordPositionsInLogFiles(metaClient, 
shouldContainRecordPosition = !enableNBCC)

Review Comment:
   can we also ensure the log header does not contain positions. 
   also, if possible can we validate that key based merging is used and not 
position based merge



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDataBlock.java:
##########
@@ -79,12 +78,11 @@ public abstract class HoodieDataBlock extends 
HoodieLogBlock {
    * NOTE: This ctor is used on the write-path (ie when records ought to be 
written into the log)
    */
   public HoodieDataBlock(List<HoodieRecord> records,
-                         boolean shouldWriteRecordPositions,
                          Map<HeaderMetadataType, String> header,
                          Map<FooterMetadataType, String> footer,
                          String keyFieldName) {
     super(header, footer, Option.empty(), Option.empty(), null, false);
-    if (shouldWriteRecordPositions) {
+    if (containsBaseFileInstantTimeOfPositions()) {

Review Comment:
   we could make `supportWritePosition" as a method to HoodieIndex and avoid 
sorting the records . 
   for bloom, simple etc we can blindly go ahead to write record positions. 
   but other indexes, we can skip.
   
   and so, we can avoid sorting the records too. 



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDataBlock.java:
##########
@@ -79,12 +78,11 @@ public abstract class HoodieDataBlock extends 
HoodieLogBlock {
    * NOTE: This ctor is used on the write-path (ie when records ought to be 
written into the log)
    */
   public HoodieDataBlock(List<HoodieRecord> records,
-                         boolean shouldWriteRecordPositions,
                          Map<HeaderMetadataType, String> header,
                          Map<FooterMetadataType, String> footer,
                          String keyFieldName) {
     super(header, footer, Option.empty(), Option.empty(), null, false);
-    if (shouldWriteRecordPositions) {
+    if (containsBaseFileInstantTimeOfPositions()) {

Review Comment:
   alternatively, we could just check first record for valid position. 
   if yes, we can sort and write the header. 
   if not, we can bail out. no need to incur the overhead due to sorting 



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -674,26 +690,31 @@ private static Map<HeaderMetadataType, String> 
getUpdatedHeader(Map<HeaderMetada
       updatedHeader.put(
           HeaderMetadataType.IS_PARTIAL, Boolean.toString(true));
     }
+    if (shouldWriteRecordPositions && baseInstantTimeForPositions.isPresent()) 
{

Review Comment:
   we could reduce 1 arg here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to