Sai Sharath Dandi created FLINK-33611:
-----------------------------------------
Summary: Add the ability to reuse variable names across different
split method scopes
Key: FLINK-33611
URL: https://issues.apache.org/jira/browse/FLINK-33611
Project: Flink
Issue Type: Improvement
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.18.0
Reporter: Sai Sharath Dandi
h3. Background
Flink serializes and deserializes Protobuf data by calling the decode or
encode method of the codegen-generated GeneratedProtoToRow_XXX.java, which
parses byte[] data into Protobuf Java objects. FLINK-32650 introduced the
ability to split the generated code to improve performance for large Protobuf
schemas. However, this is still not sufficient for some larger Protobuf
schemas: the generated code exceeds the Java constant pool size
[limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool], and
compiling it fails with errors like "Too many constants".
h3. Solution
Since the code-splitting functionality is already in place, the main proposal
here is to reuse the same variable names across different split method scopes.
This will greatly reduce the constant pool size. A further optimization is to
split the last code segment only when its size exceeds the split threshold.
Currently, the last segment of the generated code is always split, which can
lead to too many split methods.
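To make the two ideas concrete, here is a minimal sketch of how they could look. It is not taken from Flink's codegen: the class SplitScopeSketch, its methods, and the "message" name prefix are all hypothetical, and the threshold is measured in characters purely for illustration. The first method counts how many distinct variable names are declared when numbering restarts in every split method scope (the reuse named in the summary) compared with one global counter. The second keeps a short last segment inline and extracts it only when it exceeds the threshold.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from Flink's codegen.
class SplitScopeSketch {

    /** Collects the variable names declared across all split method scopes. */
    static Set<String> distinctNames(int scopes, int varsPerScope, boolean reusePerScope) {
        Set<String> names = new LinkedHashSet<>();
        int counter = 0;
        for (int scope = 0; scope < scopes; scope++) {
            if (reusePerScope) {
                // Each split method is its own scope, so numbering can restart.
                counter = 0;
            }
            for (int v = 0; v < varsPerScope; v++) {
                names.add("message" + counter++);
            }
        }
        return names;
    }

    /** Segments extracted into split methods, plus code left inline in the caller. */
    static final class Result {
        final List<String> splitMethods = new ArrayList<>();
        String inlineTail = "";
    }

    /**
     * Assumed policy: every segment before the last is extracted into a split
     * method; the last one is extracted only if it is longer than the threshold.
     */
    static Result assemble(List<String> segments, int splitThreshold) {
        Result result = new Result();
        for (int i = 0; i < segments.size(); i++) {
            String segment = segments.get(i);
            boolean isLast = i == segments.size() - 1;
            if (!isLast || segment.length() > splitThreshold) {
                result.splitMethods.add(segment);
            } else {
                result.inlineTail = segment;
            }
        }
        return result;
    }
}
```

With 3 scopes of 4 variables each, distinctNames yields 12 names under a global counter and 4 names when numbering restarts per scope. How much of that saving carries over to the constant pool depends on how the generated class is compiled, which this sketch does not model.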