GitHub user gudladona created a discussion: Native Protobuf Record Payload

Hello,

I'd like to pitch the idea of making Hudi record payloads support native
Protobuf bytes. Protobuf has become a very popular wire format because of its
small per-message payload size, particularly in streaming. Protobuf data can
also be represented as a Spark InternalRow without first being converted to an
Avro record. Additionally, Parquet provides a native Protobuf writer
(ProtoParquetWriter) that can take a DynamicMessage, which is easily built from
Protobuf bytes, as its source. Doing this would avoid the expensive
conversion of Proto --> Avro --> HudiRecordPayload (which contains re-serialized
Avro bytes) and the subsequent use of a Proto Avro writer during the file write
process.
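The following is a minimal, hypothetical Java sketch of the claim that Protobuf data can be mapped to a row without an Avro intermediate. It is not Hudi's or Spark's API. It walks the Protobuf wire format directly and places field values into an `Object[]` that stands in for Spark's `InternalRow`. Several details are assumptions made for illustration: the class name `ProtoToRowSketch`, the field-number-to-column map, and the restriction to varint and length-delimited (UTF-8 string) fields. A real implementation would presumably be driven by the message's `Descriptor`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Hypothetical sketch: decode Protobuf wire-format bytes straight into a
 * positional row (Object[] standing in for Spark's InternalRow), with no
 * Avro intermediate. Only varint (wire type 0) and length-delimited
 * (wire type 2, treated as UTF-8 strings) fields are supported.
 */
public class ProtoToRowSketch {

    // Reads a base-128 varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // fieldToColumn is a hypothetical mapping from Protobuf field number to row ordinal.
    static Object[] toRow(byte[] bytes, Map<Integer, Integer> fieldToColumn, int width) {
        Object[] row = new Object[width];
        int[] pos = {0};
        while (pos[0] < bytes.length) {
            long tag = readVarint(bytes, pos);
            int fieldNumber = (int) (tag >>> 3); // upper bits: field number
            int wireType = (int) (tag & 0x7);    // lower 3 bits: wire type
            Object value;
            if (wireType == 0) {
                value = readVarint(bytes, pos);
            } else if (wireType == 2) {
                int len = (int) readVarint(bytes, pos);
                value = new String(bytes, pos[0], len, StandardCharsets.UTF_8);
                pos[0] += len;
            } else {
                throw new IllegalArgumentException("Unsupported wire type " + wireType);
            }
            Integer column = fieldToColumn.get(fieldNumber);
            if (column != null) {
                row[column] = value; // fields without a mapped column are parsed but dropped
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // Hand-encoded message: field 1 = 150 (varint), field 2 = "hudi" (string).
        byte[] msg = {0x08, (byte) 0x96, 0x01, 0x12, 0x04, 'h', 'u', 'd', 'i'};
        Object[] row = toRow(msg, Map.of(1, 0, 2, 1), 2);
        System.out.println(row[0] + " " + row[1]);
    }
}
```

Running `main` decodes the standard wire-format example (field 1 = 150) plus a string field and prints `150 hudi`.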

Kindly let me know your thoughts; I am happy to start a PR if the idea
(improvement) sounds legit.

GitHub link: https://github.com/apache/hudi/discussions/13867

----
This is an automatically sent email for [email protected].