prashantwason commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r778656822
##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -507,6 +519,255 @@ public void testMetadataTableWithPendingCompaction(boolean simulateFailedCompact
     }
   }
 
+  /**
+   * Test arguments - Table type, populate meta fields, exclude key from payload.
+   */
+  public static List<Arguments> testMetadataRecordKeyExcludeFromPayloadArgs() {
+    return asList(
+        Arguments.of(COPY_ON_WRITE, true),
+        Arguments.of(COPY_ON_WRITE, false),
+        Arguments.of(MERGE_ON_READ, true),
+        Arguments.of(MERGE_ON_READ, false)
+    );
+  }
+
+  /**

Review comment:
   These tests are not specific to the metadata table. "Populate meta fields" should be tested via virtual key support; the key field should be tested via HFileWriter / HFileBlock tests. Metadata tests are time consuming, so maybe we can reduce these.

##########
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -151,15 +154,15 @@ public BloomFilter readBloomFilter() {
   }
 
   public List<Pair<String, R>> readAllRecords(Schema writerSchema, Schema readerSchema) throws IOException {
+    final Option<Schema.Field> keySchemaField = Option.ofNullable(readerSchema.getField(keyField));
     List<Pair<String, R>> recordList = new LinkedList<>();
     try {
       final HFileScanner scanner = reader.getScanner(false, false);
       if (scanner.seekTo()) {
         do {
           Cell c = scanner.getKeyValue();
-          byte[] keyBytes = Arrays.copyOfRange(c.getRowArray(), c.getRowOffset(), c.getRowOffset() + c.getRowLength());
-          R record = getRecordFromCell(c, writerSchema, readerSchema);
-          recordList.add(new Pair<>(new String(keyBytes), record));
+          final Pair<String, R> keyAndRecordPair = getRecordFromCell(c, writerSchema, readerSchema, keySchemaField);
+          recordList.add(new Pair<>(keyAndRecordPair.getFirst(), keyAndRecordPair.getSecond()));

Review comment:
   Instead of allocating a new Pair, can't we simply add keyAndRecordPair to recordList?
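The simplification the reviewer is suggesting can be sketched as follows. This is a minimal, self-contained illustration: the nested `Pair` class, the `getKeyAndRecord()` helper, and `buildRecordList()` are hypothetical stand-ins for the PR's Pair type (which exposes `getFirst()`/`getSecond()`) and for `getRecordFromCell(...)`, which in the patch already returns the key and record together.

```java
import java.util.LinkedList;
import java.util.List;

public class PairReuseSketch {
    // Minimal stand-in for the Pair type used by HoodieHFileReader
    // (the real type exposes getFirst()/getSecond(); this sketch avoids the dependency).
    static class Pair<A, B> {
        private final A first;
        private final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
        A getFirst() { return first; }
        B getSecond() { return second; }
    }

    // Hypothetical stand-in for getRecordFromCell(...), which in the patch
    // already returns the key and record as one Pair.
    static Pair<String, String> getKeyAndRecord() {
        return new Pair<>("key1", "record1");
    }

    static List<Pair<String, String>> buildRecordList() {
        List<Pair<String, String>> recordList = new LinkedList<>();

        // Before: unwrap the returned pair only to allocate an identical one.
        Pair<String, String> keyAndRecordPair = getKeyAndRecord();
        recordList.add(new Pair<>(keyAndRecordPair.getFirst(), keyAndRecordPair.getSecond()));

        // After (the reviewer's suggestion): add the returned pair directly
        // and skip the redundant allocation.
        recordList.add(getKeyAndRecord());
        return recordList;
    }

    public static void main(String[] args) {
        for (Pair<String, String> p : buildRecordList()) {
            System.out.println(p.getFirst() + " -> " + p.getSecond());
        }
    }
}
```

Both calls produce an equivalent list entry; the second form just avoids one throwaway object per record in the scan loop.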
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileWriter.java
##########
@@ -77,6 +81,8 @@ public HoodieHFileWriter(String instantTime, Path file, HoodieHFileConfig hfileC
     this.file = HoodieWrapperFileSystem.convertToHoodiePath(file, conf);
     this.fs = (HoodieWrapperFileSystem) this.file.getFileSystem(conf);
     this.hfileConfig = hfileConfig;
+    this.schema = schema;
+    this.schemaRecordKeyField = Option.ofNullable(schema.getField(HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY));

Review comment:
   It would be even simpler to migrate HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY to HoodieHFileWriter. In other words, for now let's require HFile schemas to have a field called "key". In the future, if the need arises to use HFile as a real data file format for a HUDI dataset outside of the metadata table, we can customize this to allow configuring the key field.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
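A rough sketch of the ownership change the reviewer describes: the writer owns a `"key"` field-name constant and rejects schemas that lack it, instead of importing the constant from HoodieMetadataPayload. Everything here is hypothetical, assumed for illustration: the class name, the `List<String>` stand-in for Avro field names (the real code would call `schema.getField(...)` on an Avro `Schema`), and the choice of `IllegalArgumentException`.

```java
import java.util.Arrays;
import java.util.List;

public class HFileKeyFieldSketch {
    // The reviewer's suggestion: the HFile writer owns the key-field name
    // rather than importing HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY.
    static final String KEY_FIELD_NAME = "key";

    // Stand-in for the Avro schema's field names; the real writer would
    // hold an org.apache.avro.Schema and call schema.getField(KEY_FIELD_NAME).
    private final List<String> schemaFieldNames;

    HFileKeyFieldSketch(List<String> schemaFieldNames) {
        // Require the field up front: for now every HFile schema must
        // contain a "key" field, so fail fast at construction time.
        if (!schemaFieldNames.contains(KEY_FIELD_NAME)) {
            throw new IllegalArgumentException(
                "HFile schema must contain a '" + KEY_FIELD_NAME + "' field");
        }
        this.schemaFieldNames = schemaFieldNames;
    }

    public static void main(String[] args) {
        // A schema with the required "key" field is accepted.
        new HFileKeyFieldSketch(Arrays.asList("key", "type", "filesystemMetadata"));
        System.out.println("schema accepted");

        // A schema without it is rejected at construction time.
        try {
            new HFileKeyFieldSketch(Arrays.asList("type", "filesystemMetadata"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Making the key field configurable later would then only mean replacing the constant with a constructor parameter defaulting to `"key"`, without touching callers in the metadata table path.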