[ https://issues.apache.org/jira/browse/FLINK-33611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802335#comment-17802335 ]

Sai Sharath Dandi commented on FLINK-33611:
-------------------------------------------

[~libenchao] All identifier names in the code, including local variable names, are part of the constant pool. You can run the javap tool on a simple class file to examine the constant pool contents ([ref|https://blogs.oracle.com/javamagazine/post/java-class-file-constant-pool]). Here's an example class and its constant pool content obtained with javap:

{code:java}
public class Hello {
    public void sayHello1() {
        Integer a1;
        int b;
        String c;
    }

    public void sayHello2() {
        Integer a2;
        int b;
        String c;
    }
}
{code}

{code:java}
Constant pool:
   #1 = Methodref          #6.#25         // java/lang/Object."<init>":()V
   #2 = Methodref          #26.#27        // java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   #3 = String             #28            // hi
   #4 = String             #29            // hello
   #5 = Class              #30            // com/uber/athena/athenax/connector/kafka/formats/protobuf/deserialize/Hello
   #6 = Class              #31            // java/lang/Object
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               Lcom/uber/athena/athenax/connector/kafka/formats/protobuf/deserialize/Hello;
  #14 = Utf8               sayHello1
  #15 = Utf8               a1
  #16 = Utf8               Ljava/lang/Integer;
  #17 = Utf8               b
  #18 = Utf8               I
  #19 = Utf8               c
  #20 = Utf8               Ljava/lang/String;
  #21 = Utf8               sayHello2
  #22 = Utf8               a2
  #23 = Utf8               SourceFile
  #24 = Utf8               Hello.java
  #25 = NameAndType        #7:#8          // "<init>":()V
  #26 = Class              #32            // java/lang/Integer
  #27 = NameAndType        #33:#34        // valueOf:(I)Ljava/lang/Integer;
  #28 = Utf8               hi
  #29 = Utf8               hello
  #30 = Utf8               com/uber/athena/athenax/connector/kafka/formats/protobuf/deserialize/Hello
  #31 = Utf8               java/lang/Object
  #32 = Utf8               java/lang/Integer
  #33 = Utf8               valueOf
  #34 = Utf8               (I)Ljava/lang/Integer;
{code}

As we can see from the above example, local variable names are part of the constant pool. Note that the names shared by both methods (b and c) appear only once, while a1 and a2 each get their own entry.

> Support Large Protobuf Schemas
> ------------------------------
>
>                 Key: FLINK-33611
>                 URL: https://issues.apache.org/jira/browse/FLINK-33611
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.18.0
>            Reporter: Sai Sharath Dandi
>            Assignee: Sai Sharath Dandi
>            Priority: Major
>              Labels: pull-request-available
>
> h3. Background
> Flink serializes and deserializes protobuf format data by calling the decode
> or encode method in GeneratedProtoToRow_XXX.java, which is generated by
> codegen to parse byte[] data into Protobuf Java objects. FLINK-32650
> introduced the ability to split the generated code to improve performance for
> large Protobuf schemas. However, this is still not sufficient to support some
> larger Protobuf schemas, because the generated code exceeds the Java constant
> pool size [limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool]
> and we see errors like "Too many constants" when trying to compile the
> generated code.
> *Solution*
> Since the split-code functionality has already been introduced, the main
> proposal here is to reuse variable names across different split method
> scopes. This will greatly reduce the constant pool size. One more
> optimization is to split the last code segment only when its size exceeds the
> split threshold. Currently, the last segment of the generated code is always
> split, which can lead to too many split methods and thus exceed the constant
> pool size limit.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
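
To make the name-reuse proposal quoted above concrete, here is a minimal Java sketch. It is not Flink's codegen. The class `ConstantPoolNameSketch`, the method `distinctLocalNames`, the naming schemes (`v<m>_<i>` versus `v<i>`) and the figures 2000 and 40 are all hypothetical. It models only the Utf8 entries for local variable names, assuming those names end up in the constant pool as in the javap listing above, and that identical names share one entry. The 65535 bound follows from `constant_pool_count` being an unsigned 16-bit field in the class file format.

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantPoolNameSketch {
    // constant_pool_count is a u2 in the class file, so it cannot exceed 65535.
    static final int MAX_CONSTANT_POOL_COUNT = 65535;

    /**
     * Counts the distinct local-variable names that would need a Utf8 entry.
     * The Set mimics the constant pool storing each distinct string once.
     * Naming schemes are hypothetical, not the ones Flink's codegen uses.
     */
    static int distinctLocalNames(int splitMethods, int localsPerMethod, boolean reuseNames) {
        Set<String> utf8Entries = new HashSet<>();
        for (int m = 0; m < splitMethods; m++) {
            for (int i = 0; i < localsPerMethod; i++) {
                // Without reuse every split method gets its own unique names;
                // with reuse each method scope starts again from v0.
                String name = reuseNames ? "v" + i : "v" + m + "_" + i;
                utf8Entries.add(name);
            }
        }
        return utf8Entries.size();
    }

    public static void main(String[] args) {
        int unique = distinctLocalNames(2000, 40, false);
        int reused = distinctLocalNames(2000, 40, true);
        System.out.println("unique names per method: " + unique
                + " (over limit: " + (unique > MAX_CONSTANT_POOL_COUNT) + ")");
        System.out.println("names reused across methods: " + reused
                + " (over limit: " + (reused > MAX_CONSTANT_POOL_COUNT) + ")");
    }
}
```

In this toy model the name entries alone grow with the number of split methods when names are unique, but stay bounded by the largest method when names are reused. A real class also needs entries for method names, descriptors, and referenced classes, so the actual savings depend on the schema.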