[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-5323:
-----------------------------
    Priority: Blocker  (was: Critical)

> Decouple virtual key with writing bloom filters to parquet files
> ----------------------------------------------------------------
>
>                 Key: HUDI-5323
>                 URL: https://issues.apache.org/jira/browse/HUDI-5323
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: index, writer-core
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> When the virtual key feature is enabled by setting hoodie.populate.meta.fields to false, bloom filters are not written to the parquet base files during write transactions. The relevant logic is in the HoodieFileWriterFactory class:
> {code:java}
> private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
>     String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable,
>     TaskContextSupplier taskContextSupplier, boolean populateMetaFields) throws IOException {
>   return newParquetFileWriter(instantTime, path, config, schema, hoodieTable.getHadoopConf(),
>       taskContextSupplier, populateMetaFields, populateMetaFields);
> }
>
> private static <T extends HoodieRecordPayload, R extends IndexedRecord> HoodieFileWriter<R> newParquetFileWriter(
>     String instantTime, Path path, HoodieWriteConfig config, Schema schema, Configuration conf,
>     TaskContextSupplier taskContextSupplier, boolean populateMetaFields, boolean enableBloomFilter) throws IOException {
>   Option<BloomFilter> filter = enableBloomFilter ? Option.of(createBloomFilter(config)) : Option.empty();
>   HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(
>       new AvroSchemaConverter(conf).convert(schema), schema, filter);
>   HoodieParquetConfig<HoodieAvroWriteSupport> parquetConfig = new HoodieParquetConfig<>(
>       writeSupport, config.getParquetCompressionCodec(),
>       config.getParquetBlockSize(), config.getParquetPageSize(), config.getParquetMaxFileSize(),
>       conf, config.getParquetCompressionRatio(), config.parquetDictionaryEnabled());
>   return new HoodieAvroParquetWriter<>(path, parquetConfig, instantTime, taskContextSupplier, populateMetaFields);
> }
> {code}
> Note that the first overload passes populateMetaFields as the enableBloomFilter argument, so disabling meta fields also disables bloom filters. Given that the bloom filters are then absent, when the Bloom Index is used on the same table, the writer encounters an NPE (HUDI-5319).
> We should decouple the virtual key feature from the bloom filter and always write the bloom filters to the parquet files.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
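The coupling described in the issue, and the proposed decoupling, can be sketched with simplified stand-ins. Everything here is hypothetical for illustration: WriterSettings, the String-valued filter, and both factory methods are not Hudi's API; the real fix would instead stop threading populateMetaFields into the enableBloomFilter parameter.

```java
import java.util.Optional;

public class BloomFilterDecouplingSketch {

  // Hypothetical stand-in for the writer's configuration; in Hudi the filter
  // would be an Option<BloomFilter> inside HoodieAvroWriteSupport.
  record WriterSettings(boolean populateMetaFields, Optional<String> bloomFilter) {}

  // Before the fix: the bloom filter is created only when meta fields are
  // populated, mirroring newParquetFileWriter(..., populateMetaFields, populateMetaFields).
  static WriterSettings coupled(boolean populateMetaFields) {
    Optional<String> filter = populateMetaFields ? Optional.of("bloom") : Optional.empty();
    return new WriterSettings(populateMetaFields, filter);
  }

  // After the fix: the bloom filter is created unconditionally, so virtual
  // keys (populateMetaFields = false) no longer suppress it.
  static WriterSettings decoupled(boolean populateMetaFields) {
    Optional<String> filter = Optional.of("bloom"); // always written
    return new WriterSettings(populateMetaFields, filter);
  }

  public static void main(String[] args) {
    // Virtual keys enabled (hoodie.populate.meta.fields = false):
    System.out.println(coupled(false).bloomFilter().isPresent());   // false: Bloom Index later hits the NPE
    System.out.println(decoupled(false).bloomFilter().isPresent()); // true: filter still written
  }
}
```

Under this sketch, a table written with virtual keys still carries bloom filters in its parquet footers, which is what lets the Bloom Index work without the NPE reported in HUDI-5319.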