nsivabalan commented on a change in pull request #3306:
URL: https://github.com/apache/hudi/pull/3306#discussion_r672805478
########## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieClientTestHarness.java ##########

@@ -225,6 +229,22 @@ protected void initMetaClient(HoodieTableType tableType) throws IOException {
    metaClient = HoodieTestUtils.init(hadoopConf, basePath, tableType);
  }

+  protected Properties getPropertiesForKeyGen() {

Review comment:
   HoodieTestDataGenerator has these fields in the commonly used schema, hence they are hardcoded here so that all tests can call into this method to generate props.

########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java ##########

@@ -278,7 +291,8 @@ protected boolean writeRecord(HoodieRecord<T> hoodieRecord, Option<IndexedRecord
   * Go through an old record. Here if we detect a newer version shows up, we write the new one to the file.
   */
  public void write(GenericRecord oldRecord) {
-    String key = oldRecord.get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString();
+    String key = populateMetaFields ? oldRecord.get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString() :

Review comment:
   Not sure we need to abstract this out and keep it outside of MergeHandle itself. There are only two options: either use the meta columns, or use the keyGen to compute record keys. So I have decided to manage it here.

########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ##########

@@ -1588,6 +1588,11 @@ public Builder withWriteMetaKeyPrefixes(String writeMetaKeyPrefixes) {
    return this;
  }

+  public Builder withPopulateMetaFields(boolean populateMetaFields) {

Review comment:
   Can you please clarify something for me? If we wish to store the property in hoodie.properties and it is part of HoodieTableConfig, should we then set the config via HoodieWriteConfigBuilder.withHoodieTableConfig(new HoodieTableConfigBuilder().withPopulate ...) sort of thing, and is exposing a setter here in HoodieWriteConfig not the right way to go about it? It was easier for me in tests to set this param, but I wanted to know what's the right way to go about it.

########## File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java ##########

@@ -142,6 +143,43 @@
    return hoodieKeys;
  }

+  /**
+   * Fetch {@link HoodieKey}s from the given parquet file.
+   *
+   * @param filePath The parquet file path.
+   * @param configuration configuration to build fs object
+   * @return {@link List} of {@link HoodieKey}s fetched from the parquet file
+   */
+  @Override
+  public List<HoodieKey> fetchRecordKeyPartitionPath(Configuration configuration, Path filePath, BaseKeyGenerator keyGenerator) {

Review comment:
   Not sure about adding another argument to the existing API and generating/fetching record keys and partition paths based on that. Felt this is neater.

########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java ##########

@@ -58,16 +59,15 @@
  private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
      String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable,
-      TaskContextSupplier taskContextSupplier) throws IOException {
-    BloomFilter filter = createBloomFilter(config);
-    HoodieAvroWriteSupport writeSupport =
-        new HoodieAvroWriteSupport(new AvroSchemaConverter(hoodieTable.getHadoopConf()).convert(schema), schema, filter);
+      TaskContextSupplier taskContextSupplier, boolean populateMetaFields, boolean enableBloomFilter) throws IOException {
+    BloomFilter filter = enableBloomFilter ? createBloomFilter(config) : null;

Review comment:
   HoodieAvroWriteSupport already handles a null bloom filter, hence using null here. If you prefer, I can change that to Option and fix this.
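To illustrate the null-vs-Option trade-off raised in the comment above, here is a minimal, self-contained sketch. `SimpleBloomFilter` and `WriteSupportSketch` are hypothetical stand-ins for illustration only, not Hudi's `BloomFilter` or `HoodieAvroWriteSupport`:

```java
import java.util.Optional;

// Hypothetical stand-in for a bloom filter; NOT Hudi's BloomFilter interface.
class SimpleBloomFilter {
    void add(String key) { /* record key membership */ }
}

// Sketch of a write-support class that tolerates an absent bloom filter,
// making the "disabled" case explicit via Optional instead of a null contract.
class WriteSupportSketch {
    private final Optional<SimpleBloomFilter> bloomFilter;

    WriteSupportSketch(Optional<SimpleBloomFilter> bloomFilter) {
        this.bloomFilter = bloomFilter;
    }

    void add(String recordKey) {
        // With Optional, the absent case is visible at every call site,
        // rather than relying on a documented-but-invisible null check.
        bloomFilter.ifPresent(bf -> bf.add(recordKey));
    }
}

public class BloomOptionSketch {
    public static void main(String[] args) {
        boolean enableBloomFilter = false;
        Optional<SimpleBloomFilter> filter =
            enableBloomFilter ? Optional.of(new SimpleBloomFilter()) : Optional.empty();
        WriteSupportSketch support = new WriteSupportSketch(filter);
        support.add("key1");  // no-op when the filter is absent; no NPE
        System.out.println("filter present: " + filter.isPresent());  // prints "filter present: false"
    }
}
```

Either approach works since the write support already null-checks; the `Optional` variant simply moves that contract into the type signature.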
########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieParquetWriter.java ##########

@@ -69,14 +70,19 @@ public HoodieParquetWriter(String instantTime, Path file, HoodieAvroParquetConfi
    this.writeSupport = parquetConfig.getWriteSupport();
    this.instantTime = instantTime;
    this.taskContextSupplier = taskContextSupplier;
+    this.populateMetaFields = populateMetaFields;
  }

  @Override
  public void writeAvroWithMetadata(R avroRecord, HoodieRecord record) throws IOException {
-    prepRecordWithMetadata(avroRecord, record, instantTime,
-        taskContextSupplier.getPartitionIdSupplier().get(), recordIndex, file.getName());
-    super.write(avroRecord);
-    writeSupport.add(record.getRecordKey());
+    if (populateMetaFields) {
+      prepRecordWithMetadata(avroRecord, record, instantTime,
+          taskContextSupplier.getPartitionIdSupplier().get(), recordIndex, file.getName());
+      super.write(avroRecord);
+      writeSupport.add(record.getRecordKey());

Review comment:
   As of this patch, I assume the bloom filter goes hand in hand with the meta columns: if populateMetaFields is false, we are not adding the bloom index. We can add follow-up patches to de-couple these.
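The coupling described above (meta columns and bloom-filter entries written together, or both skipped) can be sketched as follows. The types here are simplified stand-ins for illustration, not Hudi's actual writer classes; only the `_hoodie_*` meta column names are real Hudi field names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified record; NOT Hudi's HoodieRecord.
class RecordSketch {
    final Map<String, String> fields = new HashMap<>();
    final String recordKey;
    RecordSketch(String recordKey) { this.recordKey = recordKey; }
}

// Sketch of the writeAvroWithMetadata branching: when populateMetaFields is
// false, neither the meta columns nor the bloom-filter entry is produced.
public class MetaFieldsWriterSketch {
    private final boolean populateMetaFields;
    final List<String> bloomKeys = new ArrayList<>();

    MetaFieldsWriterSketch(boolean populateMetaFields) {
        this.populateMetaFields = populateMetaFields;
    }

    void write(RecordSketch record, String instantTime) {
        if (populateMetaFields) {
            // Stamp the meta columns and index the key, as in the patch.
            record.fields.put("_hoodie_commit_time", instantTime);
            record.fields.put("_hoodie_record_key", record.recordKey);
            bloomKeys.add(record.recordKey);
        }
        // else: write the record as-is; no meta columns, no bloom entry
    }

    public static void main(String[] args) {
        MetaFieldsWriterSketch virtualKeys = new MetaFieldsWriterSketch(false);
        RecordSketch r = new RecordSketch("uuid-1");
        virtualKeys.write(r, "20210719");
        System.out.println(r.fields.isEmpty());          // true: no meta columns
        System.out.println(virtualKeys.bloomKeys.isEmpty()); // true: no bloom entry
    }
}
```

De-coupling the two later would amount to guarding the `bloomKeys.add(...)` line with its own flag, independent of the meta-column branch.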