GitHub user gudladona created a discussion: Native Protobuf Record Payload
Hello, I'd like to pitch the idea of making the Hudi record payload support native Protobuf bytes. Protobuf has become a very popular wire format because of its small per-message payload size, particularly in streaming. Protobuf data can also be represented as a Spark InternalRow without first converting it to an Avro record. Additionally, Parquet supports a native Protobuf writer (ProtoParquetWriter), which can take a DynamicMessage as its source, and a DynamicMessage can easily be built from proto bytes. Doing this would avoid the expensive conversion of Proto --> Avro --> HudiRecordPayload (which contains re-serialized Avro bytes), and would then use a Proto Avro writer during the file write process.

Kindly let me know your thoughts; I am happy to start a PR if the idea (improvement) sounds legit.

GitHub link: https://github.com/apache/hudi/discussions/13867
