nsivabalan commented on a change in pull request #3306:
URL: https://github.com/apache/hudi/pull/3306#discussion_r672805478



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieClientTestHarness.java
##########
@@ -225,6 +229,22 @@ protected void initMetaClient(HoodieTableType tableType) throws IOException {
     metaClient = HoodieTestUtils.init(hadoopConf, basePath, tableType);
   }
 
+  protected Properties getPropertiesForKeyGen() {

Review comment:
       HoodieTestDataGenerator has these fields in its commonly used schema, hence I have hardcoded them here so that all tests can call into this method to generate the props.
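
       For reference, a minimal sketch of what I have in mind (the field names _row_key and partition_path come from HoodieTestDataGenerator's commonly used schema; the exact config keys used in the patch may differ):

           protected Properties getPropertiesForKeyGen() {
             Properties properties = new Properties();
             // record key and partition path fields present in the test schema
             properties.put("hoodie.datasource.write.recordkey.field", "_row_key");
             properties.put("hoodie.datasource.write.partitionpath.field", "partition_path");
             return properties;
           }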

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -278,7 +291,8 @@ protected boolean writeRecord(HoodieRecord<T> hoodieRecord, Option<IndexedRecord
    * Go through an old record. Here if we detect a newer version shows up, we write the new one to the file.
    */
   public void write(GenericRecord oldRecord) {
-    String key = oldRecord.get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString();
+    String key = populateMetaFields ? oldRecord.get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString() :

Review comment:
       Not sure we need to abstract this out and keep it outside of MergeHandle itself. There are only two options: either use the meta columns or use the key generator to compute record keys. So I have decided to manage it here.
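
       In other words, the full conditional reads roughly like the sketch below (keyGeneratorOpt is an assumed field name; the actual name in the patch may differ):

           // Sketch: read the key from the meta column when it is populated,
           // otherwise compute it from the data columns via the key generator.
           String key = populateMetaFields
               ? oldRecord.get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString()
               : keyGeneratorOpt.get().getRecordKey(oldRecord);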

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -1588,6 +1588,11 @@ public Builder withWriteMetaKeyPrefixes(String writeMetaKeyPrefixes) {
       return this;
     }
 
+    public Builder withPopulateMetaFields(boolean populateMetaFields) {

Review comment:
       Can you please clarify something for me: if we wish to store the property in hoodie.properties and it is part of HoodieTableConfig, then should we set the config via HoodieWriteConfigBuilder.withHoodieTableConfig(new HoodieTableConfigBuilder().withPopulate ... sort of thing, and is exposing a setter here in HoodieWriteConfig not the right way to go about it?
   It was easier for me in tests to set this param, but I wanted to know what's the right way to go about it.
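
       To illustrate the two wirings I am weighing (the table-config builder below is hypothetical, per the question above; only withPopulateMetaFields on the write config builder is part of this patch):

           // Option A (this patch): plain setter on the write config builder.
           HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
               .withPath(basePath)
               .withPopulateMetaFields(false)
               .build();

           // Option B (hypothetical): route it through a table-config builder,
           // since the property ultimately lives in hoodie.properties.
           // HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
           //     .withHoodieTableConfig(new HoodieTableConfigBuilder()
           //         .withPopulateMetaFields(false).build())
           //     .build();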

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java
##########
@@ -142,6 +143,43 @@
     return hoodieKeys;
   }
 
+  /**
+   * Fetch {@link HoodieKey}s from the given parquet file.
+   *
+   * @param configuration configuration to build fs object
+   * @param filePath      The parquet file path.
+   * @param keyGenerator  the key generator used to derive the record key and partition path
+   * @return {@link List} of {@link HoodieKey}s fetched from the parquet file
+   */
+  @Override
+  public List<HoodieKey> fetchRecordKeyPartitionPath(Configuration configuration, Path filePath, BaseKeyGenerator keyGenerator) {

Review comment:
       I was not sure about adding another argument to the existing API and generating/fetching record keys and partition paths based on that, so I added this separate method. Felt this is neater.
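
       For reference, the two call shapes would look roughly like this (variable names are illustrative; the two-argument variant is the pre-existing API this overloads):

           ParquetUtils parquetUtils = new ParquetUtils();
           // Existing API: keys are read from the meta columns stored in the file.
           List<HoodieKey> fromMetaCols = parquetUtils.fetchRecordKeyPartitionPath(configuration, filePath);
           // New overload: keys are re-derived from the data columns via the key generator.
           List<HoodieKey> fromKeyGen = parquetUtils.fetchRecordKeyPartitionPath(configuration, filePath, keyGenerator);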

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java
##########
@@ -58,16 +59,15 @@
 
  private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
      String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable,
-      TaskContextSupplier taskContextSupplier) throws IOException {
-    BloomFilter filter = createBloomFilter(config);
-    HoodieAvroWriteSupport writeSupport =
-        new HoodieAvroWriteSupport(new AvroSchemaConverter(hoodieTable.getHadoopConf()).convert(schema), schema, filter);
+      TaskContextSupplier taskContextSupplier, boolean populateMetaFields, boolean enableBloomFilter) throws IOException {
+    BloomFilter filter = enableBloomFilter ? createBloomFilter(config) : null;

Review comment:
       HoodieAvroWriteSupport already handles a null bloom filter, hence I am passing null here. If you prefer, I can change that to an Option and fix this.
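
       The Option-based alternative would look something like this (sketch only; it assumes HoodieAvroWriteSupport is changed to accept an Option<BloomFilter>):

           Option<BloomFilter> filter = enableBloomFilter
               ? Option.of(createBloomFilter(config)) : Option.empty();
           HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(
               new AvroSchemaConverter(hoodieTable.getHadoopConf()).convert(schema), schema, filter);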

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieParquetWriter.java
##########
@@ -69,14 +70,19 @@ public HoodieParquetWriter(String instantTime, Path file, HoodieAvroParquetConfi
     this.writeSupport = parquetConfig.getWriteSupport();
     this.instantTime = instantTime;
     this.taskContextSupplier = taskContextSupplier;
+    this.populateMetaFields = populateMetaFields;
   }
 
   @Override
  public void writeAvroWithMetadata(R avroRecord, HoodieRecord record) throws IOException {
-    prepRecordWithMetadata(avroRecord, record, instantTime,
-        taskContextSupplier.getPartitionIdSupplier().get(), recordIndex, file.getName());
-    super.write(avroRecord);
-    writeSupport.add(record.getRecordKey());
+    if (populateMetaFields) {
+      prepRecordWithMetadata(avroRecord, record, instantTime,
+          taskContextSupplier.getPartitionIdSupplier().get(), recordIndex, file.getName());
+      super.write(avroRecord);
+      writeSupport.add(record.getRecordKey());

Review comment:
       As of this patch, I assume the bloom filter goes hand in hand w/ meta cols: if populateMetaFields is false, we are not adding the bloom index. We can add follow-up patches to de-couple these.
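
       If we de-couple them later, the write path could look roughly like this (sketch only; enableBloomFilter as a separate writer field is assumed and not part of this patch):

           if (populateMetaFields) {
             prepRecordWithMetadata(avroRecord, record, instantTime,
                 taskContextSupplier.getPartitionIdSupplier().get(), recordIndex, file.getName());
           }
           super.write(avroRecord);
           if (enableBloomFilter) {
             // bloom filter entries would no longer be tied to meta cols
             writeSupport.add(record.getRecordKey());
           }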



