alexeykudinkin commented on code in PR #6132: URL: https://github.com/apache/hudi/pull/6132#discussion_r974723729
########## rfc/rfc-46/rfc-46.md:
##########
@@ -128,21 +173,88 @@

 Following major components will be refactored:
 1. `HoodieWriteHandle`s will be
    1. Accepting `HoodieRecord` instead of raw Avro payload (avoiding Avro conversion)
-   2. Using Combining API engine to merge records (when necessary)
+   2. Using Record Merge API to merge records (when necessary)
    3. Passes `HoodieRecord` as is to `FileWriter`
 2. `HoodieFileWriter`s will be
    1. Accepting `HoodieRecord`
    2. Will be engine-specific (so that they're able to handle internal record representation)
 3. `HoodieRealtimeRecordReader`s
    1. API will be returning opaque `HoodieRecord` instead of raw Avro payload
+
+### Config for Record Merge
+
+The `MERGE_CLASS_NAME` config is engine-aware. If `MERGE_CLASS_NAME` is not specified, a default will be chosen according to your engine type.
+
+### Public API in HoodieRecord
+
+Because we implement different types of records, we need to implement functionality similar to `AvroUtils` in `HoodieRecord` for the different data representations (Avro, `InternalRow`, `RowData`).
+Its public API will look like the following:
+
+```java
+import java.io.IOException;
+import java.util.Map;
+import java.util.Properties;
+
+import org.apache.avro.Schema;
+
+class HoodieRecord {
+
+  /**
+   * Get column values from the record, to support RDDCustomColumnsSortPartitioner.
+   */
+  Object getRecordColumnValues(Schema recordSchema, String[] columns,
+      boolean consistentLogicalTimestampEnabled);
+
+  /**
+   * Support bootstrap.
+   */
+  HoodieRecord mergeWith(HoodieRecord other, Schema targetSchema) throws IOException;
+
+  /**
+   * Rewrite the record into a new schema (add meta columns).
+   */
+  HoodieRecord rewriteRecord(Schema recordSchema, Properties props, Schema targetSchema)
+      throws IOException;
+
+  /**
+   * Support schema evolution.
+   */
+  HoodieRecord rewriteRecordWithNewSchema(Schema recordSchema, Properties props, Schema newSchema,
+      Map<String, String> renameCols) throws IOException;
+
+  HoodieRecord updateValues(Schema recordSchema, Properties props,

Review Comment:
   @wzx140 we should split these up:

   - The only legitimate use-case for us to update fields is Hudi's metadata.
   - `HoodieHFileDataBlock` shouldn't be modifying the existing payload; it should instead be _rewriting_ the record w/o the field it wants to omit. We will tackle that separately, and for the sake of RFC-46 we can create a temporary method `truncateRecordKey` which will overwrite the record-key value for now (we will deprecate and remove this method after we address this).

   We should not leave a loophole where a record is allowed to be modified, to make sure that nobody can start building against this API.
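The engine-aware defaulting described under "Config for Record Merge" could be sketched roughly as below. Note this is an illustrative sketch only: the config key string and the default merger class names are assumptions for this example, not Hudi's actual values.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of engine-aware defaulting for MERGE_CLASS_NAME.
// The key name and merger class names below are assumed, not Hudi's actual ones.
public class MergeClassDefaults {
  static final String MERGE_CLASS_NAME = "hoodie.merge.class";

  // One assumed default merge implementation per engine type.
  static final Map<String, String> ENGINE_DEFAULTS = Map.of(
      "SPARK", "org.apache.hudi.HoodieSparkRecordMerge",
      "FLINK", "org.apache.hudi.HoodieFlinkRecordMerge",
      "JAVA",  "org.apache.hudi.HoodieAvroRecordMerge");

  static String resolveMergeClass(Properties props, String engineType) {
    // An explicitly configured merge class always wins; otherwise fall
    // back to the default registered for the engine type.
    String configured = props.getProperty(MERGE_CLASS_NAME);
    return configured != null
        ? configured
        : ENGINE_DEFAULTS.get(engineType.toUpperCase());
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolveMergeClass(props, "spark"));
    props.setProperty(MERGE_CLASS_NAME, "com.example.CustomMerge");
    System.out.println(resolveMergeClass(props, "flink"));
  }
}
```

With no explicit config, the first call resolves to the assumed Spark default; after setting the property, the configured class is returned regardless of engine.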
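To make the intent of `getRecordColumnValues` concrete, here is a minimal, engine-agnostic sketch over a map-backed record. This is not Hudi code: a real engine-specific `HoodieRecord` would navigate the Avro/`InternalRow`/`RowData` schema rather than a `Map`, and the class name here is made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (not Hudi code): an engine-specific record extracting sort
// columns from its internal representation, as a custom-columns sort
// partitioner would require, without converting to Avro first.
public class MapBackedRecord {
  private final Map<String, Object> data;

  public MapBackedRecord(Map<String, ?> data) {
    this.data = new HashMap<>(data);
  }

  // Mirrors the spirit of getRecordColumnValues: pull the requested columns
  // straight out of the engine's native row format.
  public Object[] getRecordColumnValues(String[] columns) {
    return Arrays.stream(columns).map(data::get).toArray();
  }

  public static void main(String[] args) {
    MapBackedRecord record = new MapBackedRecord(
        Map.of("uuid", "id-1", "ts", 1700000000L, "rider", "rider-A"));
    System.out.println(
        Arrays.toString(record.getRecordColumnValues(new String[] {"uuid", "ts"})));
    // prints [id-1, 1700000000]
  }
}
```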
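The reviewer's proposed stop-gap (a narrow `truncateRecordKey` instead of a general `updateValues`) might look like the following sketch. The class, field name, and blank-out behavior here are assumptions for illustration only; the actual signature would be decided in the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged illustration of the reviewer's proposal: expose only a temporary,
// single-purpose truncateRecordKey hook so HoodieHFileDataBlock can blank
// the record-key, without opening a loophole for arbitrary field mutation.
// All names here are assumptions, not the actual Hudi implementation.
public class TruncatableRecord {
  private final Map<String, Object> fields;

  public TruncatableRecord(Map<String, Object> fields) {
    this.fields = new HashMap<>(fields);
  }

  /** Temporary: overwrite only the record-key value; to be deprecated and removed. */
  public TruncatableRecord truncateRecordKey(String keyField) {
    Map<String, Object> copy = new HashMap<>(fields);
    copy.put(keyField, "");  // blank the key rather than allow general updates
    return new TruncatableRecord(copy);
  }

  public Object get(String field) {
    return fields.get(field);
  }

  public static void main(String[] args) {
    Map<String, Object> data = new HashMap<>();
    data.put("_hoodie_record_key", "key-1");
    data.put("rider", "rider-A");

    TruncatableRecord original = new TruncatableRecord(data);
    TruncatableRecord truncated = original.truncateRecordKey("_hoodie_record_key");

    System.out.println(truncated.get("_hoodie_record_key"));  // blanked key
    System.out.println(original.get("_hoodie_record_key"));   // original untouched
  }
}
```

Returning a new record (rather than mutating in place) keeps the API side-effect free, which matches the review's concern that nobody should be able to build against a record-mutation loophole.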