Sai Sharath Dandi created FLINK-33817:
-----------------------------------------
Summary: Allow ReadDefaultValues = False for non primitive types
on Proto3
Key: FLINK-33817
URL: https://issues.apache.org/jira/browse/FLINK-33817
Project: Flink
Issue Type: Improvement
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.18.0
Reporter: Sai Sharath Dandi
*Background*
The current Protobuf format
[implementation|https://github.com/apache/flink/blob/c3e2d163a637dca5f49522721109161bd7ebb723/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java]
always sets ReadDefaultValues=False when using Proto3 version. This can cause
severe performance degradation for large Protobuf schemas with OneOf fields as
the entire generated code needs to be executed during deserialization even when
certain fields are not present in the data to be deserialized and all the
subsequent nested Fields can be skipped. Proto3 supports hasXXX() methods for
checking field presence for non primitive types since Proto version
[3.15|https://github.com/protocolbuffers/protobuf/releases/tag/v3.15.0]. In
internal benchmarks in our company, we've seen almost 10x difference in
performance when allowing to set ReadDefaultValues=False with proto3 version
*Solution*
Support using ReadDefaultValues=False when using Proto3 version. We need to be
careful to check for field presence only on non-primitive types if
ReadDefaultValues is false and version used is Proto3
--
This message was sent by Atlassian Jira
(v8.20.10#820010)