prashantwason commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r778656822



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -507,6 +519,255 @@ public void testMetadataTableWithPendingCompaction(boolean simulateFailedCompact
     }
   }
 
+  /**
+   * Test arguments - Table type, populate meta fields, exclude key from payload.
+   */
+  public static List<Arguments> testMetadataRecordKeyExcludeFromPayloadArgs() {
+    return asList(
+        Arguments.of(COPY_ON_WRITE, true),
+        Arguments.of(COPY_ON_WRITE, false),
+        Arguments.of(MERGE_ON_READ, true),
+        Arguments.of(MERGE_ON_READ, false)
+    );
+  }
+
+  /**

Review comment:
      These tests are not specific to the metadata table.
   
   Populate meta fields should be tested via virtual key support.
   The key field should be tested via HFileWriter / HFileBlock tests.
   
   Metadata tests are time-consuming, so maybe we can reduce these.
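For context on the cost concern: the argument list in the diff above is a cartesian product, so every extra flag doubles the number of full metadata-table runs. A minimal standalone sketch in plain Java (no Hudi or JUnit dependencies; class and method names here are illustrative, not actual Hudi code):

```java
import java.util.ArrayList;
import java.util.List;

public class TestMatrixSketch {
  // Build the cross product of table types and the populate-meta-fields flag,
  // mirroring the shape of testMetadataRecordKeyExcludeFromPayloadArgs().
  static List<String> buildMatrix(String[] tableTypes, boolean[] populateMetaFields) {
    List<String> combinations = new ArrayList<>();
    for (String type : tableTypes) {
      for (boolean flag : populateMetaFields) {
        combinations.add(type + ", populateMetaFields=" + flag);
      }
    }
    return combinations;
  }

  public static void main(String[] args) {
    List<String> matrix = buildMatrix(
        new String[] {"COPY_ON_WRITE", "MERGE_ON_READ"},
        new boolean[] {true, false});
    // 2 table types x 2 flag values = 4 full metadata-table test runs.
    System.out.println(matrix.size()); // prints 4
  }
}
```

Moving the key-field assertions down to writer-level unit tests removes one dimension from this matrix.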

##########
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -151,15 +154,15 @@ public BloomFilter readBloomFilter() {
   }
 
   public List<Pair<String, R>> readAllRecords(Schema writerSchema, Schema readerSchema) throws IOException {
+    final Option<Schema.Field> keySchemaField = Option.ofNullable(readerSchema.getField(keyField));
     List<Pair<String, R>> recordList = new LinkedList<>();
     try {
       final HFileScanner scanner = reader.getScanner(false, false);
       if (scanner.seekTo()) {
         do {
           Cell c = scanner.getKeyValue();
-          byte[] keyBytes = Arrays.copyOfRange(c.getRowArray(), c.getRowOffset(), c.getRowOffset() + c.getRowLength());
-          R record = getRecordFromCell(c, writerSchema, readerSchema);
-          recordList.add(new Pair<>(new String(keyBytes), record));
+          final Pair<String, R> keyAndRecordPair = getRecordFromCell(c, writerSchema, readerSchema, keySchemaField);
+          recordList.add(new Pair<>(keyAndRecordPair.getFirst(), keyAndRecordPair.getSecond()));

Review comment:
      Instead of allocating a new Pair, can't we simply add keyAndRecordPair to recordList?
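The suggestion can be sketched as follows; the nested Pair class is a hypothetical stand-in for Hudi's own Pair type so that the snippet runs standalone:

```java
import java.util.LinkedList;
import java.util.List;

public class PairReuseSketch {
  // Hypothetical stand-in for org.apache.hudi.common.util.collection.Pair.
  static final class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
    A getFirst() { return first; }
    B getSecond() { return second; }
  }

  public static void main(String[] args) {
    List<Pair<String, String>> recordList = new LinkedList<>();
    // Stands in for the pair returned by getRecordFromCell(...).
    Pair<String, String> keyAndRecordPair = new Pair<>("key1", "record1");

    // Before: copies the pair's contents into a second allocation.
    // recordList.add(new Pair<>(keyAndRecordPair.getFirst(), keyAndRecordPair.getSecond()));

    // After: reuse the already-allocated pair directly.
    recordList.add(keyAndRecordPair);

    System.out.println(recordList.get(0).getFirst()); // prints "key1"
  }
}
```

Since the element type of recordList matches the pair's type exactly, the extra allocation adds nothing.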

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileWriter.java
##########
@@ -77,6 +81,8 @@ public HoodieHFileWriter(String instantTime, Path file, HoodieHFileConfig hfileC
     this.file = HoodieWrapperFileSystem.convertToHoodiePath(file, conf);
     this.fs = (HoodieWrapperFileSystem) this.file.getFileSystem(conf);
     this.hfileConfig = hfileConfig;
+    this.schema = schema;
+    this.schemaRecordKeyField = Option.ofNullable(schema.getField(HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY));

Review comment:
      It would be even simpler to migrate HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY to HoodieHFileWriter.
   
   In other words, for now let's require HFile schemas to have a field called "key". In the future, if the need arises to use HFile as a real data file format for a HUDI dataset outside of the metadata table, we can customize this to allow configuring the key field.
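A rough sketch of what the suggested migration could look like. The class name, constant name, and validation helper below are illustrative only, not actual Hudi code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class KeyFieldSketch {
  // Would live in HoodieHFileWriter itself, removing the writer's dependency
  // on HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY.
  public static final String KEY_FIELD_NAME = "key";

  // For now, require every HFile schema to carry a "key" field; a configurable
  // key field could be added later if HFile becomes a general data file format.
  public static void validateSchemaHasKeyField(Set<String> schemaFieldNames) {
    if (!schemaFieldNames.contains(KEY_FIELD_NAME)) {
      throw new IllegalArgumentException(
          "HFile schema must contain a '" + KEY_FIELD_NAME + "' field");
    }
  }

  public static void main(String[] args) {
    validateSchemaHasKeyField(new HashSet<>(Arrays.asList("key", "value")));
    System.out.println("schema accepted"); // prints "schema accepted"
  }
}
```

Owning the constant in the writer keeps the file-format layer from depending on a metadata-table payload class.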




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

