[ 
https://issues.apache.org/jira/browse/FLINK-33611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802335#comment-17802335
 ] 

Sai Sharath Dandi commented on FLINK-33611:
-------------------------------------------

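[~libenchao] All identifier names in the code, including local variable 
names, are part of the constant pool (local variable names are stored there 
when the class is compiled with debug information, which is what produces 
the LocalVariableTable entry). You can use the javap tool on a simple class 
file to examine the constant pool contents: 
[ref|https://blogs.oracle.com/javamagazine/post/java-class-file-constant-pool].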

 

Here's an example class and its constant pool content obtained with javap:

 

 
{code:java}
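public class Hello {

  // Initializers are assumed: the constant pool below references
  // Integer.valueOf, "hi" and "hello", so the locals must have been assigned.
  // The integer values are placeholders.
  public void sayHello1() {
    Integer a1 = 1;
    int b = 2;
    String c = "hi";
  }

  public void sayHello2() {
    Integer a2 = 1;
    int b = 2;
    String c = "hello";
  }
}
{code}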
{code:java}
Constant pool:
   #1 = Methodref          #6.#25         // java/lang/Object."<init>":()V
   #2 = Methodref          #26.#27        // java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   #3 = String             #28            // hi
   #4 = String             #29            // hello
   #5 = Class              #30            // com/uber/athena/athenax/connector/kafka/formats/protobuf/deserialize/Hello
   #6 = Class              #31            // java/lang/Object
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               Lcom/uber/athena/athenax/connector/kafka/formats/protobuf/deserialize/Hello;
  #14 = Utf8               sayHello1
  #15 = Utf8               a1
  #16 = Utf8               Ljava/lang/Integer;
  #17 = Utf8               b
  #18 = Utf8               I
  #19 = Utf8               c
  #20 = Utf8               Ljava/lang/String;
  #21 = Utf8               sayHello2
  #22 = Utf8               a2
  #23 = Utf8               SourceFile
  #24 = Utf8               Hello.java
  #25 = NameAndType        #7:#8          // "<init>":()V
  #26 = Class              #32            // java/lang/Integer
  #27 = NameAndType        #33:#34        // valueOf:(I)Ljava/lang/Integer;
  #28 = Utf8               hi
  #29 = Utf8               hello
  #30 = Utf8               com/uber/athena/athenax/connector/kafka/formats/protobuf/deserialize/Hello
  #31 = Utf8               java/lang/Object
  #32 = Utf8               java/lang/Integer
  #33 = Utf8               valueOf
  #34 = Utf8               (I)Ljava/lang/Integer;
{code}
 

 

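As the example shows, local variable names are part of the constant pool: 
a1, a2, b and c each appear as a Utf8 entry. The names b and c are used in 
both methods but are stored only once (#17 and #19), whereas a1 and a2 each 
need their own entry (#15 and #22).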
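
For anyone who wants to run the same check without javap, here is a minimal sketch that lists the Utf8 entries of a class file by reading the constant pool directly. The class and method names (ConstantPoolNames, utf8Entries) are hypothetical and not part of Flink. It reads only the class file header and the constant pool, and I have not tried it against every class file version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: lists CONSTANT_Utf8 entries of a class file.
public class ConstantPoolNames {

    /** Returns every Utf8 string stored in the constant pool of the given class file bytes. */
    static List<String> utf8Entries(byte[] classFile) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort(); // minor_version
        in.readUnsignedShort(); // major_version
        int count = in.readUnsignedShort(); // constant_pool_count, valid indices are 1..count-1
        List<String> names = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: // Utf8: u2 length followed by modified UTF-8 bytes
                    names.add(in.readUTF());
                    break;
                case 3: case 4: // Integer, Float
                    in.skipBytes(4);
                    break;
                case 5: case 6: // Long, Double: 8 bytes, and they occupy two pool slots
                    in.skipBytes(8);
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    in.skipBytes(2);
                    break;
                case 15: // MethodHandle
                    in.skipBytes(3);
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18:
                    // Fieldref, Methodref, InterfaceMethodref, NameAndType, Dynamic, InvokeDynamic
                    in.skipBytes(4);
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ConstantPoolNames Hello.class
        for (String s : utf8Entries(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(s);
        }
    }
}
```

Running it on a class compiled with debug information should list the local variable names alongside the method names and descriptors.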

 

> Support Large Protobuf Schemas
> ------------------------------
>
>                 Key: FLINK-33611
>                 URL: https://issues.apache.org/jira/browse/FLINK-33611
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.18.0
>            Reporter: Sai Sharath Dandi
>            Assignee: Sai Sharath Dandi
>            Priority: Major
>              Labels: pull-request-available
>
> h3. Background
> Flink serializes and deserializes Protobuf data by calling the encode or 
> decode method in GeneratedProtoToRow_XXX.java, which is generated by codegen 
> to parse byte[] data into Protobuf Java objects. FLINK-32650 introduced the 
> ability to split the generated code to improve performance for large 
> Protobuf schemas. However, this is still not sufficient for some larger 
> Protobuf schemas: the generated code exceeds the Java constant pool size 
> [limit|https://en.wikipedia.org/wiki/Java_class_file#The_constant_pool], 
> and compiling it fails with errors like "Too many constants". 
> *Solution*
> Since the split code functionality already exists, the main proposal here 
> is to reuse variable names across the scopes of different split methods. 
> This will greatly reduce the constant pool size. A second optimization is to 
> split the last code segment only when its size exceeds the split threshold. 
> Currently, the last segment of the generated code is always split, which can 
> lead to too many split methods and thus exceed the constant pool size limit.
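
To illustrate why reusing variable names across split methods helps, here is a rough sketch that counts the distinct local variable names (each of which costs one Utf8 constant pool entry when debug information is emitted) under two naming schemes. The class name, method names and the "var" prefix are made up for illustration and do not reflect the actual Flink codegen. For scale, constant_pool_count is a 16-bit field, so a class can hold fewer than 65,536 constants.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not Flink code.
public class NameReuseSketch {

    /** Distinct names when every local in every split method gets a globally unique name. */
    static int distinctNamesUnique(int methods, int localsPerMethod) {
        Set<String> names = new HashSet<>();
        int counter = 0;
        for (int m = 0; m < methods; m++) {
            for (int v = 0; v < localsPerMethod; v++) {
                names.add("var" + counter++); // var0, var1, ... never repeats
            }
        }
        return names.size();
    }

    /** Distinct names when each split method restarts its numbering and reuses the same names. */
    static int distinctNamesReused(int methods, int localsPerMethod) {
        Set<String> names = new HashSet<>();
        for (int m = 0; m < methods; m++) {
            for (int v = 0; v < localsPerMethod; v++) {
                names.add("var" + v); // var0..var(k-1) in every method
            }
        }
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctNamesUnique(1000, 20)); // prints 20000
        System.out.println(distinctNamesReused(1000, 20)); // prints 20
    }
}
```

With unique names the number of name entries grows with the number of split methods, while with reused names it depends only on the number of locals per method.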



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
